Data Platform Engineer Lead (AI)

Posted 6 Months Ago
2 Locations
In-Office
Senior level
Artificial Intelligence • Software

EvolutionaryScale’s mission is to develop artificial intelligence to understand biology for the benefit of human health and society, through open, safe, and responsible research, and in partnership with the scientific community. Over the next ten years, AI will transform biological design, making molecules and entire cells programmable. We will develop the foundation models for biology that enable this.

To continue to move the field forward in this emerging area, we prioritize individuals who have shown excellence and creativity in their respective domains over specific domain expertise. Having both biology and AI expertise is great, but not a requirement.

Our team does both deep research and product development, not only building the frontier biological AI models in the field but also putting them in the hands of the researchers at the forefront of the life sciences. This fundamentally requires elite engineers and scientists working together to solve big research and product challenges. We are building a world-class multidisciplinary team spanning AI research, engineering, biology research, and business roles, which requires strong communication and collaboration across roles.

The EvolutionaryScale team is based in two locations: San Francisco and New York. We believe in flexibility around work schedules and locations, but we expect team members to work from one of our two offices at least half of the days in most weeks.

The Role 

As our Data Platform Lead Engineer, you'll own the architecture and execution of EvolutionaryScale's data platform - the backbone that powers training, evaluation, and discovery across our models. You'll build reliable, scalable, and transparent pipelines that process biological data at unprecedented scale and ensure every dataset - pre-training or post-training - is high-quality, reproducible, and traceable. You'll collaborate closely with bioinformatics, modeling, research, and infrastructure teams to design systems that enable our scientists and modelers to move faster, experiment more effectively, and generate insight from massive biological data.

  • Architect and operate large-scale data processing pipelines (batch + streaming) for pre- and post-training biology datasets - covering raw sequence, structure, and model-generated data.
  • Build and evolve our data platform: data lakes/lakehouses, metadata and lineage systems, feature stores, and orchestration frameworks.
  • Define and implement data cataloging, governance, and versioning practices that ensure full reproducibility and traceability across datasets.
  • Collaborate with researchers and ML engineers to translate modeling requirements into robust data systems that optimize throughput, reliability, and cost.
  • Establish best practices for data CI/CD, observability, infrastructure-as-code, fault tolerance, and data quality monitoring.
  • Continuously explore and integrate emerging technologies - Ray, Spark, Flink, modern data mesh approaches - to keep our stack state-of-the-art.

Preferred qualifications

  • Senior-level engineer with 3+ years (ideally 5+) of experience designing and scaling large-scale data infrastructure.
  • Deep experience with distributed data frameworks such as Spark, Ray, or Flink for large-volume, high-throughput processing.
  • Foundation in data platform concepts: metadata stores, lineage tracking, orchestration tools, schema evolution, dataset versioning.
  • Skilled in debugging, performance optimization, and building observability into data systems.
  • Collaborative mindset - you can partner effectively with scientists, ML engineers, and infrastructure teams.
  • Excited by the chance to define new standards for data infrastructure in a domain that could reshape medicine and biology.
  • Experience with major cloud providers (AWS, GCP, or Azure), including familiarity with data warehousing tools, is a plus.
  • Knowledge of large-scale distributed systems, machine learning, biology, and biology datasets is a plus.

What success looks like

  • Datasets for training and evaluation are reliable, reproducible, and lineage-aware.
  • Model and research teams can move faster thanks to self-service, high-throughput data pipelines.
  • Our data infrastructure scales efficiently with compute and storage demands as model size and scope grow.

The salary range for this position is $150,000 to $350,000 per year, plus a competitive equity package. The compensation package will vary based on job-related skills, experience, and knowledge. It also includes comprehensive medical, dental, and vision benefits.


Top Skills

AWS
Azure
Flink
GCP
Hadoop
Kafka Streams
Ray
Spark
Spark Streaming

The Company
18 Employees

What We Do

Company behind ESM3
