Senior MLOps Engineer

Toronto, ON, CAN
Hybrid
175K-200K Annually
Senior level
Biotech
The Role
Own and evolve ML infrastructure: maintain GCP via Terraform, manage IAM/RBAC, run CI/CD (CircleCI, GitHub Actions), administer workflow orchestration (Nextflow/Seqera, Argo, Kubeflow), manage experiment tracking (W&B, MLflow), build containerized environments and Kubernetes clusters, provision and debug GPU resources, write Python tooling, and deploy and monitor ML models in production.
Summary Generated by Built In
About Us
Deep Genomics is at the forefront of using artificial intelligence to transform drug discovery. Our proprietary AI platform decodes the complexity of RNA biology to identify novel drug targets, mechanisms, and therapeutics inaccessible through traditional methods. With expertise spanning machine learning, bioinformatics, data science, engineering, and drug development, our multidisciplinary team in Toronto and Cambridge, MA is revolutionizing how new medicines are created.
 
Opportunity 
Join us in building the future of AI-driven drug discovery as a Senior MLOps Engineer. You will own and evolve the infrastructure that powers our ML pipelines – from cloud environments and CI/CD systems to workflow orchestration and model deployment. You will work closely with ML scientists, bioinformaticians, and software engineers to keep our platform reliable, reproducible, and scalable.
 
Ideal Candidate 
You are someone who enjoys keeping the infrastructure running smoothly so that scientists can focus on their research. You are comfortable working across cloud platforms, CI/CD systems, containers, and GPUs – and you take pride in making these systems reliable and easy for others to use. You have 4+ years of experience in production infrastructure or MLOps, you write solid Python, and you are curious about the ML and scientific workflows your work supports. Above all, you are a collaborative, kind team member who communicates clearly, adapts to evolving needs, and is happy to help colleagues grow their own infrastructure skills along the way. If this sounds like you, we would love to hear from you.

Key Responsibilities

  • Maintain and improve cloud infrastructure (GCP) using Infrastructure-as-Code tools (Terraform).
  • Manage IAM, RBAC, and permission policies across cloud environments.
  • Own and evolve CI/CD pipelines (CircleCI, GitHub Actions) and ensure best practices are followed across the engineering and ML teams.
  • Administer and support workflow orchestration platforms (e.g., Seqera/Nextflow, Argo, Kubeflow).
  • Operate and configure ML experiment tracking and registry tooling (e.g., W&B, MLflow).
  • Build and maintain containerized environments (Docker) and manage Kubernetes clusters.
  • Manage GPU resources – provisioning, scheduling, and debugging hardware and driver issues.
  • Write and maintain Python tooling, scripts, and integrations that support ML infrastructure.
  • Help deploy ML models to production environments and monitor their performance.

Basic Qualifications

  • 4+ years of experience operating production infrastructure.
  • Proficiency with cloud platforms (GCP preferred; AWS/Azure acceptable) and Infrastructure-as-Code (Terraform).
  • Extensive hands-on experience with Kubernetes and containerization (Docker).
  • Solid background in CI/CD systems (CircleCI, GitHub Actions, or similar).
  • Experience managing GPU compute (provisioning, debugging, driver management).
  • Familiarity with Python package and environment management (e.g., pip, conda, pixi).
  • Strong Python programming skills.
  • Self-motivated problem solver with excellent communication skills.

Preferred Qualifications

  • Understanding of ML frameworks (e.g., PyTorch, PyTorch Lightning), ML workflows (training, inference, evaluation), and the model lifecycle.
  • Familiarity with MLOps tooling (e.g., W&B, Ray, Vertex AI) and distributed compute patterns (e.g., DDP, real-time/batch inference, multi-node training).
  • Familiarity with Kubernetes CRDs and batch/gang schedulers (e.g., Volcano, Kueue).
  • Experience working with large-scale datasets (storage, versioning, efficient access patterns).
  • Experience working directly with scientists and researchers in an interdisciplinary setting.
  • Knowledge of biology and/or machine learning science.
  • Familiarity with data compliance and governance frameworks (e.g., HIPAA, SOC 2).
  • Previous startup experience.

What We Offer

  • A collaborative and innovative environment at the frontier of computational biology, machine learning, and drug discovery. 
  • Highly competitive compensation, including meaningful stock ownership.
  • Comprehensive benefits, including health, vision, and dental coverage for employees and families, and an employee and family assistance program.
  • Flexible work environment, including flexible hours, extended long weekends, a holiday shutdown, and unlimited personal days.
  • Maternity and parental leave top-up coverage, as well as new parent paid time off. 
  • A focus on learning and growth for all employees, including a learning and development budget and lunch-and-learns.
  • Facilities located in the heart of Toronto, the epicenter of machine learning and AI research and development, and in Kendall Square, Cambridge, MA, a global center of biotechnology and life sciences.

Deep Genomics encourages applications from people of all backgrounds who seek the opportunity to build the world's leading AI-driven genetic medicine company.
 
If you have a disability or special need, accommodation is available on request for candidates taking part in all aspects of the selection process.
 
 
*This posting reflects a current vacancy. 
 
We offer competitive compensation aligned with local market benchmarks. The salary range for this role is $175,000 - $200,000, and reflects Canada-based roles; compensation may differ for U.S.-based candidates.

The Company
HQ: Toronto, ON
107 Employees
Year Founded: 2014

What We Do

Deep Genomics is using artificial intelligence to build a new universe of life-saving genetic therapies. The future of medicine will rely on artificial intelligence, because biology is too complex for humans to understand. At Deep Genomics, our geneticists, molecular biologists and chemists develop new ways of detecting and treating disease using our biologically accurate artificial intelligence technology.
