GenBio AI

Lead Data Engineer

Sorry, this job was removed at 08:10 p.m. (CST) on Wednesday, Aug 13, 2025

Palo Alto, CA

In-Office

Artificial Intelligence • Software

The Role

Headquartered in Silicon Valley, we are a newly established start-up where a collective of visionary scientists, engineers, and entrepreneurs is dedicated to transforming the landscape of biology and medicine through the power of Generative AI. Our team comprises leading minds and innovators in AI and Biological Science, pushing the boundaries of what is possible. We are dreamers who reimagine a new paradigm for biology and medicine.

We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBM), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our exceptionally strong R&D team and leadership in LLM and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and a branch office in Paris, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of Generative AI.

Key Responsibilities:

Lead the strategic design of a holistic solution to our large and diverse data usage needs.
Set the collaboration and reusability strategy for data consumption, including publicly available and partner-generated data.
Ensure the FAIR principles are followed in our data storage and retrieval strategy.
Build and maintain scalable, efficient, and reusable data products and codebases for large-scale foundation model training, adaptation, evaluation, and inference.
Collaborate closely with data engineers and research scientists to integrate models into production environments.
Ensure code quality, scalability, and performance through rigorous testing and code reviews.

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field. Experience in life sciences or healthcare is required.
Strong familiarity with at least some (the more the better) of the following biomedical data types: sequencing data, other high-throughput omics data, biological imaging data, and clinical and phenotypic data.
Experience using (developing is an advantage) large-scale data products and systems for biological or biomedical applications.
Strong programming skills in JavaScript, Python, and modern web development frameworks, and familiarity with GPU-accelerated tools (e.g., CUDA, cuDNN, Triton).
Knowledge of major deep learning frameworks such as PyTorch, HuggingFace Transformers & Accelerate, or Megatron-LM/DeepSpeed.
Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes).
Proficiency in back-end frameworks like Django, Flask, or Node.js, and database technologies (e.g., PostgreSQL, MongoDB).
Expertise in distributed systems, cloud computing (AWS, GCP), and containerization tools (Docker, Kubernetes).

Preferred Qualifications:

Prior experience pre-training or serving large language models or large-scale foundation models.
Experience with deep learning workflows.
Knowledge of the challenges of bioinformatics tools and experience using them.
Familiarity with version control systems like Git and CI/CD pipelines.
Strong understanding of RESTful APIs, authentication, and deployment pipelines.
Familiarity with machine learning workflows and biological datasets.

Join us as we embark on this journey to redefine the future of biology and medicine.

We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

The Company

HQ: Palo Alto, CA

29 Employees

Year Founded: 2024

What We Do

GenBio.AI, Inc. (GenBio AI) is an innovative global startup dedicated to developing the world's first AI-driven Digital Organism, an integrated system of multiscale foundation models for predicting, simulating, and programming biology at all levels.

Our goal is to achieve comprehensive, actionable empirical understandings of the mechanisms underlying all organismal physiologies and diseases. This will pave the way for a new paradigm in drug design, bio-engineering, personalized medicine, and fundamental biomedical research, all powered by Generative Biology.

Our founding team consists of world-renowned scientists and researchers in AI and Biology from prestigious institutions such as CMU, MBZUAI, and WIS, alongside prominent financial investors.

GenBio AI, a true global effort from day one, is establishing offices in Palo Alto, Paris, and Abu Dhabi.