Key Responsibilities:
- Lead the strategic design of a holistic solution to our large and diverse data needs.
- Set the collaboration and reusability strategy for data consumption, including publicly available and partner-generated data.
- Ensure that our data storage and retrieval strategy follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
- Build and maintain scalable, efficient, and reusable data products and codebases for large-scale foundation model training, adaptation, evaluation, and inference.
- Collaborate closely with data engineers and research scientists to integrate models into production environments.
- Ensure code quality, scalability, and performance through rigorous testing and code reviews.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field. Experience in life sciences or healthcare is required.
- Strong familiarity with several (ideally many) of the following biomedical data types: sequencing data, other high-throughput omics data, biological imaging data, and clinical and phenotypic data.
- Experience using large-scale data products and systems for biological or biomedical applications; experience developing them is an advantage.
- Strong programming skills in JavaScript, Python, and modern web development frameworks, and familiarity with GPU-accelerated tools (e.g., CUDA, cuDNN, Triton).
- Knowledge of major deep learning frameworks such as PyTorch, HuggingFace Transformers & Accelerate, or Megatron-LM/DeepSpeed.
- Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes).
- Proficiency in back-end frameworks like Django, Flask, or Node.js, and database technologies (e.g., PostgreSQL, MongoDB).
- Expertise in distributed systems, cloud computing (AWS, GCP), and containerization tools (Docker, Kubernetes).
Preferred Qualifications:
- Prior experience pre-training or serving large language models or large-scale foundation models.
- Experience with deep learning workflows.
- Experience with bioinformatics tools and knowledge of the challenges involved in using them.
- Familiarity with version control systems like Git and CI/CD pipelines.
- Strong understanding of RESTful APIs, authentication, and deployment pipelines.
- Familiarity with machine learning workflows and biological datasets.
What We Do
GenBio.AI, Inc. (GenBio AI) is an innovative global startup dedicated to developing the world's first AI-driven Digital Organism, an integrated system of multiscale foundation models for predicting, simulating, and programming biology at all levels.
Our goal is to achieve a comprehensive, actionable, empirical understanding of the mechanisms underlying all organismal physiology and disease. This will pave the way for a new paradigm in drug design, bioengineering, personalized medicine, and fundamental biomedical research, all powered by Generative Biology.
Our founding team consists of world-renowned AI and biology scientists and researchers from prestigious institutions such as CMU, MBZUAI, and WIS, alongside prominent financial investors.
GenBio AI, a true global effort from day one, is establishing offices in Palo Alto, Paris, and Abu Dhabi.