Principal Data Processing Engineer

Mountain View, CA
In-Office
Expert/Leader
Software
DataPelago helps enterprises process data efficiently for AI and analytics with its Nucleus engine.
The Role
Lead the architecture, design, and implementation of a high-performance data processing engine focused on large-scale data processing.

About DataPelago:

DataPelago is at the forefront of revolutionizing data processing for traditional analytics and cutting-edge GenAI preprocessing. We are building an innovative data processing engine that is transforming how Apache Spark, Apache Flink, Ray, and others operate on diverse, large-scale data. Our team of engineers drives and adopts advances in hardware-accelerated computing, parallel processing of large-scale data, query optimization, distributed systems, compilers, machine learning, and cloud-native computing. We are looking for world-class engineers to join our team and shape the future of accelerated data processing.


The Role:

As a Principal Data Processing Engineer, you will be a key technical leader in the development of the core execution components of our data processing engine. You will lead architecture, design, and implementation that will enhance the functional breadth, performance, scale, and reliability of the engine to deliver a product that will redefine how users extract intelligence from their data. This is a unique opportunity to make a significant impact on a category-defining product and work with a talented team of engineers.


What You'll Do:

  • Drive the evolution of our parallel and distributed execution engine architecture, with a strong focus on leveraging accelerated computing technologies.
  • End-to-End Ownership: Lead the execution engine team in the complete lifecycle of design, implementation, and rollout of an enterprise-grade product.
  • Individually design, implement, test, and maintain critical components of the data processing execution engine.
  • Innovation and Differentiation: Analyze technology advances from industry and academia to identify opportunities for the engine to enhance technology and product leadership.
  • Collaboration: Partner effectively with engineering, product management, and customer success teams.
  • Guide and mentor engineers on the execution engine team.
  • Foster best practices in design and code reviews, testing, CI/CD, and issue resolution to maintain the highest product quality, security, efficiency, and productivity.


What You'll Bring:

  • BS/MS in Computer Science (or a related field) with 10+ years of relevant experience.
  • 7+ years of deep technical experience in developing core components of enterprise-grade database or analytics execution engines designed for large-scale data processing.
  • Proven expertise in developing high-performance parallel implementations of data processing operators and functions on rich data types.
  • Significant experience developing for platforms such as Apache Spark, Apache Flink, Apache Doris, Apache Gluten, Velox, or Apache DataFusion/Comet is preferred.
  • Previous experience as a technical lead or architect for teams of 10+ engineers in the design, development, and successful release of high-performance data processing engines for large production deployments.
  • Proficiency in C, C++, and Rust programming.
  • Extensive development experience in Linux environments.
  • Strong analytical and problem-solving skills with a passion for performance optimization.
  • Excellent communication and collaboration skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.


Why Join DataPelago?

  • Technical Leadership: Take a leadership role in shaping the architecture and development of how our core engine works with open-source data processing platforms.
  • Cutting-Edge Innovation: Work on challenging problems at the forefront of accelerated computing and data processing.
  • Significant Impact: Your contributions will directly impact the performance and scalability of our mission-critical platform.
  • Mentorship and Growth: Mentor and guide other talented engineers while expanding your own technical expertise.
  • Competitive compensation, stock options, comprehensive benefits package, and leadership development opportunities.

Top Skills

Apache DataFusion
Apache Doris
Apache Flink
Apache Gluten
Spark
C
C++
Linux
Rust
Velox

The Company
HQ: Mountain View, CA
60 Employees
Year Founded: 2025

What We Do

DataPelago is redefining how enterprises process data for AI and analytics at scale. As organizations race to operationalize artificial intelligence, they are discovering that the greatest barrier to progress isn’t a lack of models or talent – it’s the infrastructure beneath them. Data pipelines remain fragmented across specialized systems for analytics, AI, and data engineering, each optimized for specific workloads but incapable of operating as a cohesive whole. The result is inefficiency: duplicated data, stranded compute resources, and escalating costs that slow innovation.

DataPelago was founded to solve this challenge. Its flagship product, Nucleus, is the world’s first Universal Data Processing Engine (UDPE) – a new layer that sits between data lakes and query engines to unify data processing within a single, hardware-aware stack. Built from first principles for accelerated computing, Nucleus allows companies to process, move, and activate their data orders of magnitude more efficiently than existing systems.

At its core, Nucleus dynamically orchestrates workloads across heterogeneous compute environments – CPUs, GPUs, TPUs, and FPGAs – ensuring every job runs on the optimal hardware for maximum performance and efficiency. This unified approach eliminates the need to maintain separate infrastructure for different data workloads, reducing complexity and cutting total cost of ownership by up to 40%.

Nucleus supports structured, unstructured, and semi-structured data in a single environment, enabling AI and analytics workloads to coexist seamlessly. It integrates easily with existing data ecosystems and open-source frameworks, providing enterprises with flexibility and performance without requiring code changes or proprietary lock-in.

With Nucleus, data teams can accelerate queries, streamline pipelines, and scale AI initiatives faster, all while controlling infrastructure spend. Early adopters across industries are leveraging the platform to speed up data preparation, model training, and real-time analytics by up to 10x, turning data from a bottleneck into a competitive advantage.

DataPelago’s mission is to make high-performance, cost-efficient data processing achievable for every enterprise. By bridging the gap between data infrastructure and AI innovation, the company is helping organizations unlock the full potential of their data, laying the foundation for a new era of intelligence at scale.

Why Work With Us

DataPelago is pioneering the world’s first Universal Data Processing Engine, unifying AI and analytics in a single, hardware-aware platform. We’re solving one of the biggest challenges in enterprise AI – making data infrastructure faster, simpler, and more efficient. Join us to build the foundation for the next era of intelligent computing.
