Staff Engineer, Machine Learning Operations

Posted 8 Days Ago
Bangalore, Bengaluru Urban, Karnataka
In-Office
Senior level
Healthtech • Information Technology • Consulting
The Role
The Staff Engineer, Machine Learning Operations will lead technical efforts in AI platform architecture and CI/CD strategies, ensuring model reliability and governance.
Summary Generated by Built In

Job Description:

The Staff Engineer, Machine Learning Operations will provide technical leadership for our AI platform and define the architecture and standards for training, evaluation, and high-scale, low-latency inference of models and AI agents. This role will be responsible for developing and implementing the strategy for CI/CD, governance, and reliability across multiple AI models, partnering with security, compliance, and leadership to deliver resilient, cost-effective AI. Beyond these core responsibilities, Machine Learning Operations Engineers will also share responsibilities with other engineering functions.
  • Establish the technical vision for end-to-end ML-AIOps (from data to model/agent to product integration).
  • Design and evolve multi-region, multi-tenant inference/training platforms with strong isolation.
  • Design and implement a CI/CD strategy for models, agents, and data pipelines (policy gates, canary releases/rollbacks, approvals).
  • Institutionalize model/agent monitoring (quality, safety, drift) and business KPIs; sponsor continuous evaluations.
  • Lead major reliability programs (capacity planning, disaster recovery, chaos testing, incident management).
  • Establish and implement governance methodologies for datasets, prompts, models, and agents (lineage, approvals, etc.).
  • Collaborate on security architecture with security teams (zero-trust, key management, vaults, secrets rotation, audit).
  • Evaluate and integrate platforms/vendors; influence build-vs-buy; manage technical debt and roadmap.
  • Mentor other engineers and help prioritize their work; build a culture of documentation, runbooks, and post-incident learning.
  • Perform other duties that support the overall objective of the position.
Education Required:
  • Bachelor’s degree in Computer Science, Information Technology, Electronics/Electrical Engineering, or a related field.
  • Or, any combination of education and experience which would provide the required qualifications for the position.
Experience Required:
  • 5-8 years of hands-on experience in MLOps, DevOps, or related roles operating an AI/ML platform at scale, with 10-12+ years of overall IT experience.
  • Experience with infrastructure as code (IaC) using Terraform at organizational scale, and strong experience in Unix-based environments.
  • Expertise in containerization and orchestration (Docker/Kubernetes) and in cloud platforms, including networking, security, and autoscaling.
  • Strong AWS experience is expected.
  • Experience building CI/CD pipelines using tools such as Bitbucket Pipelines, AWS CodePipeline, or similar.
  • Experience with mature observability stacks (e.g., Datadog/Dynatrace). Experience with LLM observability frameworks is a plus.
  • Deep experience with operationalizing ML/AI models. Experience with LLMs or AI agents is a plus.
Knowledge, Skills & Abilities:
  • Knowledge of: Database technologies and data pipelines (data lakes, lakehouses, warehouses, NoSQL, ETL/ELT processes). Solid understanding of model monitoring, logging, and debugging tools. Strong command of platform SRE practices and cost governance. Familiarity with feature stores, lakehouse patterns, distributed computing systems (Spark), and model versioning systems (MLflow).
  • Skill in: Problem solving, with a detail-oriented mindset. Excellent communication.
  • Ability to: Collaborate effectively. Maintain a clear view of complete systems, and understand and work on different components as required.
The company has reviewed this job description to ensure that essential functions and basic duties have been included. It is intended to provide guidelines for job expectations and the employee's ability to perform the position described. It is not intended to be construed as an exhaustive list of all functions, responsibilities, skills and abilities. Additional functions and requirements may be assigned by supervisors as deemed appropriate. This document does not represent a contract of employment, and the company reserves the right to change this job description and/or assign tasks for the employee to perform, as the company may deem appropriate.

NextGen Healthcare is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Top Skills

AWS
AWS CodePipeline
Bitbucket Pipelines
Datadog
DevOps
Docker
Dynatrace
Kubernetes
MLflow
MLOps
Spark
Terraform
Unix

The Company
Atlanta, GA
3,179 Employees

What We Do

NextGen Healthcare is on a relentless quest to improve the lives of those who practice medicine and those they care for. We provide tailored solutions to fit the precise needs of ambulatory practices, as they strive to reach the quadruple aim while navigating the journey of value-based care. The result? Healthier patients and happier providers.

Similar Jobs

CrowdStrike

Senior Software Engineer

Cloud • Computer Vision • Information Technology • Sales • Security • Cybersecurity
Remote or Hybrid
KA, IND
10000 Employees

Allica Bank

Technical Incident Lead

Fintech • Software • Financial Services
In-Office or Remote
2 Locations
502 Employees

Elsevier

Account Manager

Artificial Intelligence • Healthtech • Information Technology • Other • Analytics
In-Office or Remote
3 Locations

Allica Bank

Full-stack Engineer

Fintech • Software • Financial Services
In-Office or Remote
2 Locations
502 Employees

Similar Companies Hiring

Sailor Health
Telehealth • Social Impact • Healthtech
New York City, NY
20 Employees
Standard Template Labs
Software • Information Technology • Artificial Intelligence
New York, NY
15 Employees
Granted
Mobile • Insurance • Healthtech • Financial Services • Artificial Intelligence
New York, New York
23 Employees
