Senior MLOps Engineer

Hiring Remotely in VN
Remote
Senior level
Artificial Intelligence • Machine Learning • Software • Infrastructure as a Service (IaaS)
The Role
Design and operate GPU/ML CI/CD and release pipelines (GitHub Actions, self-hosted H100/A100 runners). Implement packaging, multi-arch containers, continuous benchmarking/performance gates, contributor-friendly CI, and infrastructure-as-code with security and autoscaling.
Summary Generated by Built In

JOIN US – BUILD THE FUTURE OF AI WITH TENSORMESH.AI FROM VIETNAM!

Tensormesh.ai, a fast-rising US AI startup spun off from the open-source LMCache project and poised to reshape how the world understands and deploys high-performance AI, is officially expanding and building its Core Engineering team in Vietnam! We believe Vietnam deserves to be a core R&D hub for Southeast Asia, and you can be an important part of that journey.

We are looking for: MLOps Engineer — LMCache (Open-Source Infrastructure)

1. What You'll Own

- Pipeline architecture: GitHub Actions workflows + self-hosted GPU runner fleet (H100/A100); multi-stage pipeline from lint → unit → GPU integration → cross-framework compat (vLLM/SGLang) → performance regression

- Release engineering: semantic versioning, PyPI publishing, multi-arch container images, Helm charts, Sigstore/cosign signing, coordination with downstream integrators

- Performance gates: continuous benchmarking that blocks regressions in cache hit rate, TTFT (time to first token), throughput, and memory before merge

- Contributor experience: fast PR feedback, flake-free test suites, and dev containers that don't require expensive GPUs

- Security & IaC: SBOM/SLSA provenance, secret rotation, runner fleet via Terraform with cost-optimized autoscaling.
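To make the "performance gate" idea concrete, here is a minimal sketch, assuming benchmark results for the base branch and the PR arrive as plain dictionaries. The metric names, the better/worse direction of each, and the 5% relative tolerance are hypothetical choices for illustration, not LMCache's actual CI configuration.

```python
# Hypothetical performance-gate sketch; metric names and tolerance are illustrative assumptions.

# For each metric: True if a higher value is better, False if lower is better.
HIGHER_IS_BETTER = {
    "cache_hit_rate": True,
    "throughput_tok_s": True,
    "ttft_ms": False,         # time to first token, milliseconds
    "peak_memory_gb": False,
}


def find_regressions(baseline, candidate, tolerance=0.05):
    """Return names of metrics where `candidate` is worse than `baseline`
    by more than `tolerance` (a relative fraction, e.g. 0.05 = 5%)."""
    regressions = []
    for name, higher_better in HIGHER_IS_BETTER.items():
        base, cand = baseline[name], candidate[name]
        if higher_better:
            worse = cand < base * (1 - tolerance)
        else:
            worse = cand > base * (1 + tolerance)
        if worse:
            regressions.append(name)
    return regressions


def gate_exit_code(baseline, candidate, tolerance=0.05):
    """Return a process exit code: 1 if any metric regressed, else 0."""
    failed = find_regressions(baseline, candidate, tolerance)
    if failed:
        print("Performance gate FAILED:", ", ".join(failed))
        return 1
    print("Performance gate passed")
    return 0


# Example: throughput and TTFT regress beyond 5%; hit rate and memory stay within tolerance.
baseline = {"cache_hit_rate": 0.80, "throughput_tok_s": 1200.0, "ttft_ms": 150.0, "peak_memory_gb": 40.0}
candidate = {"cache_hit_rate": 0.79, "throughput_tok_s": 1100.0, "ttft_ms": 170.0, "peak_memory_gb": 40.5}
print(find_regressions(baseline, candidate))  # ['throughput_tok_s', 'ttft_ms']
```

In a CI step, a wrapper would call `sys.exit(gate_exit_code(baseline, candidate))` so that a regression fails the job; whether that actually blocks the merge depends on the job being configured as a required status check. A relative tolerance is used because GPU benchmarks are noisy, so an exact comparison against the baseline would make the gate itself flaky.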

2. Required

- 4+ years MLOps/DevOps/SRE; 2+ years CI/CD for GPU or ML workloads

- Deep GitHub Actions expertise (workflows, composite actions, self-hosted runners at scale)

- Python packaging & PyPI release flow (incl. wheels with native extensions)

- Docker multi-stage/multi-arch; NVIDIA Container Toolkit

- Terraform/Ansible for cloud GPU infrastructure

- Track record building CI that contributors trust — fast, non-flaky, clear failures

3. Strongly Preferred

- Maintainer/contributor experience on a popular OSS project

- Familiarity with vLLM, SGLang, NVIDIA Dynamo, KServe, or Triton

- Kubernetes in CI (Kind/k3s, multi-node integration tests)

- Continuous benchmarking tools + time-series perf tracking

- Supply chain security (Sigstore, SLSA, syft/grype)

- RDMA / high-perf networking / P2P system testing

Why Tensormesh.ai?

* Work directly with the engineering teams in the US and Vietnam; the products you build will be used by leading AI companies worldwide.

* Globally competitive compensation.

* Engineering-first culture with no barriers and no bureaucracy: just code, impact, and learning.

* Flexible remote/hybrid work.

Apply now! Send your CV + GitHub/LinkedIn to: [email protected] or [email protected]

Or tag someone you think "deserves to be a core engineer at a global AI startup!"



The Company
0 Employees
Year Founded: 2025

What We Do

Tensormesh is an AI infrastructure optimization company that provides distributed AI compute infrastructure, including GPU clusters and inference optimization platforms, to reduce GPU costs and latency.
