MLOps / Infrastructure Engineer

Reposted 3 Days Ago
New York City, NY, USA
In-Office
130K-230K Annually
Mid level
Artificial Intelligence • Analytics • Consulting • Cybersecurity
Research Services
The Role
Looking for an MLOps Engineer to design, deploy, and monitor real-time ML systems. Responsibilities include managing infrastructure, optimizing APIs, and collaborating with ML teams.

About 10a Labs: 10a Labs is the safety and threat-intelligence layer trusted by frontier AI labs, AI unicorns, Fortune 10 companies, and leading global technology platforms. Our adversarial red teaming, model evaluations, and intelligence collection enable engineering, safety, and security teams to stay ahead of evolving threats and deploy AI systems safely.

3–8 Years of Industry Experience | Remote | High-Impact

About the Role: We’re looking for an infrastructure-focused engineer who thrives at the intersection of machine learning, systems, and product delivery. This is a hands-on role responsible for deploying, monitoring, and scaling a real-time ML-powered content moderation system used to detect and triage abuse, threats, and edge-case language. You’ll work closely with ML engineers, researchers, and clients to build infrastructure that makes high-performance models accessible and reliable in the wild.

In This Role, You Will:

  • Design and maintain cloud infrastructure (GCP or AWS) to support real-time model serving, data ingestion, and evaluation workflows.
  • Deploy and optimize APIs for low-latency access to ML models and embedding search systems.
  • Manage and optimize the end-to-end training data flow, from sourcing and cleaning datasets to preparing them for model consumption, with a focus on accuracy, scalability, and efficiency.
  • Build observability tooling for production ML pipelines (monitor latency, error rates, request volumes, drift).
  • Automate model deployment, retraining, and evaluation pipelines (CI/CD for ML).
  • Work with ML engineers to package models for serving.
  • Help manage vector databases and semantic search infrastructure (e.g., Pinecone, FAISS, Vertex Matching Engine).
  • Ensure security, compliance, and uptime of infrastructure supporting safety-critical systems.

We’re Looking for Someone Who:

  • Has 3–8 years of experience deploying machine learning systems or high-availability backend systems.
  • Has shipped and maintained production infrastructure at scale, supporting ML workflows.
  • Has experience with GCP, AWS, or similar platforms (including managed ML services).
  • Is proficient in Terraform, Docker, Kubernetes, or similar infrastructure tools.
  • Understands performance tradeoffs in serving models and embedding search pipelines.
  • Can work cross-functionally with ML, security, and product teams to deploy safely and iterate fast.
  • Brings a builder’s mindset and a bias for ownership in ambiguous environments.

Nice to Have Experience With:

  • Vector databases or ANN systems, preferably within GCP (or AWS).
  • Serving LLMs or embedding-based models via API.
  • Model monitoring, logging, and metrics platforms (e.g., Prometheus, Grafana, Sentry).
  • Trust & safety infrastructure, abuse detection, or policy enforcement systems.

What Success Looks Like in the First 3 Months:

  • You’ve deployed and monitored a real-time ML inference system with well-defined observability.
  • You’ve implemented an API with latency under 200 ms for embedding or classifier-based inference.
  • You’ve partnered with ML engineers to streamline deployment and retraining workflows.
  • You’ve built logging and monitoring that gives insight into system performance and classifier behavior.

Compensation & Benefits:

  • Salary Range: $130K–$230K, depending on experience and location.
  • Bonus: Performance-based annual bonus.
  • Professional Development: Support for continuing education, conferences, or training.
  • Work Environment: Fully remote, U.S.-based.
  • Health Benefits: Comprehensive health, dental, and vision coverage.
  • Time Off: Generous PTO and paid holiday schedule.
  • Retirement: 401(k) plan.

Skills Required

  • 3-8 years of experience deploying machine learning systems
  • Experience with GCP, AWS, or similar platforms
  • Proficient in Terraform, Docker, Kubernetes, or similar tools
  • Experience with model monitoring, logging, and metrics platforms

10a Labs Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about 10a Labs and has not been reviewed or approved by 10a Labs.

  • Healthcare Strength: Benefits include comprehensive medical, dental, and vision coverage for full-time roles, listed across multiple postings. Coverage is presented as a core part of the package rather than a role-specific perk.
  • Leave & Time Off Breadth: Positions frequently advertise generous PTO and paid holidays, with some roles noting unlimited PTO and flexible hours. This indicates substantial time-off provisions alongside remote work arrangements.
  • Strong & Reliable Incentives: Compensation commonly includes performance-based annual bonuses and occasional spot bonuses. These incentives are presented as standard components for many roles.

The Company
28 Employees

What We Do

10a Labs is an applied research and technology company specializing in AI security. We deliver intelligence collection, investigative research, and analysis for AI unicorns, Fortune 10 companies, and U.S. tech leaders.
