Software Engineer, ML Platform

Posted 7 Days Ago
Seattle, WA, USA
In-Office
140K-215K Annually
Senior level
Healthtech • Biotech
The Role
Build and operate ML infrastructure for large-scale biological foundation models: dispatch distributed training across multi-cloud GPU clusters, deploy high-throughput storage, implement evaluation and experiment-tracking tooling, and integrate training with telemetry and checkpointing systems.

About Xaira Therapeutics

Xaira is an innovative biotech startup focused on leveraging AI to transform drug discovery and development. The company is leading the development of generative AI models to design protein and antibody therapeutics, enabling the creation of medicines against historically hard-to-drug molecular targets. It is also developing foundation models for biology and disease to enable better target elucidation and patient stratification. Collectively, these technologies aim to continually enable the identification of novel therapies and to improve success in drug development. Xaira is headquartered in the San Francisco Bay Area, Seattle, and London.

About the Role
We are seeking a Software Engineer to join our Platform team to design, build, and deploy the AI infrastructure that powers our world-class research team. In this role, you’ll collaborate closely with AI Scientists and other engineers to enable the effective use of thousands of GPUs for training and inference of cutting-edge biological foundation models.

This role spans a range of problems and skill sets, from MLOps for cutting-edge GPU clusters to backend engineering of control plane APIs. Our ideal candidate has an opinion about Slurm or Kubernetes for model training, cares about maximizing bandwidth from the storage subsystems to the GPU, and can build the API paved path for submitting training jobs that can be dispatched to multiple clusters.

What You Will Do

  • Develop and improve our model training system, which dispatches distributed training jobs to clusters across multiple clouds.
  • Deploy storage subsystems that improve dataset management and throughput for training datasets.
  • Build evaluation infrastructure that makes model evaluations easy to run and track.
  • Build base tooling for integrating model training with other internal infrastructure, such as telemetry, experiment tracking, and checkpointing.
  • Prior experience with biology is not required; we will teach you what you need to know. You’ll get to go in the lab, and for our ideal candidate, this should be a perk, not a chore!

Preferred Skills and Qualifications

  • Degree in Computer Science, Machine Learning, Computational Biology, or a related field.
  • 5+ years of industry experience building and deploying ML systems in production environments.
  • Experience leading technical projects and driving cross-functional execution.
  • Strong programming skills in Python.
  • Experience with infrastructure/ops tools such as Terraform and Ansible.
  • Experience with deep learning frameworks such as PyTorch and JAX.
  • Solid understanding of machine learning.
  • Experience with the infrastructure needs of large-scale model training.
  • Strong problem-solving skills and ability to work in a collaborative, multidisciplinary environment.

Compensation

We offer a competitive compensation and benefits package that includes base salary, bonus, and equity, and we seek to provide an open, flexible, and friendly work environment that empowers employees and gives them a platform to develop their long-term careers. A Summary of Benefits is available for all applicants. The base pay range for this position is expected to be $140,000 - $215,000 annually; however, the base pay offered may vary depending on the market, job-related knowledge, skills and capabilities, and experience.
Xaira Therapeutics is an equal-opportunity employer. We believe that our strength is in our differences. Our goal to build a diverse and inclusive team began on day one, and it will never end.

TO ALL RECRUITMENT AGENCIES: Xaira Therapeutics does not accept agency resumes. Please do not forward resumes to our jobs alias or employees. Xaira Therapeutics is not responsible for any fees related to unsolicited resumes.

Skills Required

  • Degree in Computer Science, Machine Learning, Computational Biology, or related field
  • 5+ years industry experience building and deploying ML systems in production
  • Experience leading technical projects and driving cross-functional execution
  • Strong programming skills in Python
  • Experience with infrastructure/ops tools such as Terraform and Ansible
  • Experience with deep learning frameworks such as PyTorch and JAX
  • Solid understanding of machine learning concepts and large-scale model training needs
  • Experience with GPU clusters and distributed training systems
  • Familiarity with Slurm or Kubernetes for model training orchestration
  • Experience deploying storage subsystems for dataset management and high throughput
  • Experience integrating training with telemetry, experiment tracking, and checkpointing
The Company
Brisbane, CA
112 Employees
Year Founded: 2023

What We Do

Xaira Therapeutics is an integrated biotechnology company driving advances in artificial intelligence to learn the language of life and transform how we treat disease. The company seeks to rethink the drug discovery and development process from end to end by bringing together leading talent across three core areas: machine learning research to better understand biology, expansive data generation to power new models, and robust therapeutic product development to treat disease. Xaira is headquartered in the San Francisco Bay Area.
