The Role
Operate and monitor ML/AI models and agentic systems in production. Build AI observability, logging, tracing, and evaluation pipelines. Monitor LLM outputs, detect drift and model degradation, and maintain data/feature pipelines. Develop CI/CD pipelines, manage model versioning and experiment tracking, and automate alerting and incident remediation, collaborating throughout with data scientists and platform teams.
Summary Generated by Built In
Key Responsibilities
AI/ML Model Operations
- Deploy, manage, and monitor machine learning and AI models in production environments.
- Implement model performance monitoring including accuracy, latency, and inference metrics.
- Detect and mitigate concept drift, data drift, and model degradation.
AI Observability
- Design and implement AI observability frameworks to track model behavior and reliability.
- Monitor LLM outputs, hallucination rates, and response quality.
- Implement logging, tracing, and evaluation pipelines for AI systems.
Agentic Systems Monitoring
- Monitor agent-based AI workflows and autonomous systems.
- Track agent actions, tool usage, decision paths, and execution outcomes.
- Implement guardrails, safety monitoring, and failure detection for AI agents.
Data Pipeline Monitoring
- Monitor and maintain data ingestion, transformation, and feature pipelines.
- Ensure data quality, schema consistency, and pipeline reliability.
- Detect and resolve pipeline failures and anomalies.
Infrastructure & Automation
- Build and maintain CI/CD pipelines for ML models and AI systems.
- Manage model versioning, experiment tracking, and reproducibility.
- Automate monitoring alerts, incident response, and remediation.
Collaboration
- Work closely with data scientists, ML engineers, platform teams, and product teams.
- Support continuous improvement of AI system reliability and governance.
Compensation, Benefits and Duration
Minimum Compensation: USD 50,000
Maximum Compensation: USD 177,000
Compensation is based on the candidate's actual experience and qualifications. The range above is a reasonable, good-faith estimate for the role.
Medical, vision, and dental benefits, a 401(k) retirement plan, variable pay/incentives, paid time off, and paid holidays are available for full-time employees.
This position is not available to independent contractors.
No applications will be considered if received more than 120 days after the date of this post.
Skills Required
- Deploy, manage, and monitor machine learning and AI models in production environments.
- Implement model performance monitoring including accuracy, latency, and inference metrics.
- Detect and mitigate concept drift, data drift, and model degradation.
- Design and implement AI observability frameworks to track model behavior and reliability.
- Monitor LLM outputs, hallucination rates, and response quality.
- Monitor agent-based AI workflows; implement guardrails, safety monitoring, and failure detection for agents.
- Monitor and maintain data ingestion, transformation, and feature pipelines, ensuring data quality and schema consistency.
- Build and maintain CI/CD pipelines for ML models and AI systems.
- Manage model versioning, experiment tracking, and reproducibility.
- Automate monitoring alerts, incident response, and remediation.
- Collaborate with data scientists, ML engineers, platform teams, and product teams.
The Company