Software Engineer, ML Infrastructure

Reposted 2 Days Ago
Sunnyvale, CA, USA
In-Office
180K-250K Annually
Senior level
Artificial Intelligence • Machine Learning • Software • Cybersecurity
The Role
The role involves deploying and optimizing LLMs, designing the ML serving stack, and ensuring high-performance GPU services for production readiness.
Summary Generated by Built In
Role Overview

We are hiring a Founding ML Infrastructure Engineer to own the end-to-end deployment, optimization, and operation of our suite of models in production.
This is a core founding role focused on building and operating production-grade LLM systems. You will apply deep knowledge of model internals to deploy, optimize, and run modern LLMs at scale, owning performance end-to-end across latency, throughput, and reliability.

You will design and operate the full ML serving stack from model artifacts to GPU execution, and work closely with Product and ML teams to ensure our models can support high QPS, strict SLAs, and production correctness.
This role is ideal for someone who deeply understands how LLMs work internally, but chooses to specialize in making them fast, stable, and production-ready.

About Realm Labs

Realm Labs is an AI trust and security startup. We help enterprises detect, debug, and prevent AI misbehavior in production. We are backed by top VCs and serve some of the most iconic global enterprises.

Key Responsibilities

  • Own the end-to-end LLM inference stack, including:
    • Model loading and execution
    • GPU utilization and memory efficiency
    • Runtime performance tuning
    • Production deployment and scaling
  • Design and operate high-performance LLM serving systems using technologies such as:
    • vLLM, TensorRT / TensorRT-LLM, Triton Inference Server, SGLang
  • Optimize inference across:
    • Latency
    • Throughput (QPS)
    • GPU memory footprint
    • Cost efficiency
  • Work hands-on with PyTorch and TensorFlow models, including:
    • Model graph understanding
    • Attention mechanisms, KV cache behavior, batching strategies
    • Precision tradeoffs (FP16, BF16, INT8, etc.)
  • Build and maintain production-grade GPU services:
    • Multi-model serving
    • Autoscaling strategies
    • Fault isolation and graceful degradation
  • Collaborate with application and platform teams to:
    • Define serving APIs
    • Ensure correctness and safety of outputs
    • Debug production issues end-to-end
  • Build a reproducible model training and versioning system for customer deployments
  • Establish best practices for:
    • Model versioning
    • Rollouts and rollbacks
    • Performance benchmarking
    • Production validation

Expected Qualifications

  • 5+ years of professional experience in ML infrastructure, systems engineering, or production ML roles.
  • Strong software engineering fundamentals; ability to write robust, maintainable production code.
  • Deep hands-on experience with LLM inference infrastructure, including:
    • PyTorch (required)
    • TensorFlow (working knowledge)
  • Proven experience with GPU inference optimization, including:
    • TensorRT / TensorRT-LLM
    • vLLM
    • Triton Inference Server
    • SGLang or similar serving runtimes
  • Strong understanding of LLM internals, such as:
    • Transformer architectures
    • Attention and KV caching
    • Batching, streaming, and token-level generation
  • Experience running ML systems in production with high traffic and SLAs
  • Comfortable working in Linux-based, cloud production environments

Preferred Qualifications

  • Experience deploying LLMs on Kubernetes and GPU clusters.
  • Familiarity with CUDA, NCCL, or low-level GPU performance concepts.
  • Experience with:
    • Model sharding and parallelism strategies
    • Multi-GPU inference
    • Streaming inference systems
  • Knowledge of observability for ML systems (metrics, latency breakdowns, GPU monitoring).
  • Experience working at startups or owning systems with minimal abstraction layers.

Additional Information

  • This is a founding, high-ownership role with direct impact on core product capabilities.
  • You will be expected to build, run, and own systems end-to-end.
  • The role may include limited on-call responsibilities aligned with production ownership.

Compensation & Benefits

  • Market-aligned compensation and benefits
  • Founding engineer equity (equity is a significant component of this role and will be discussed)
  • Medical, dental, vision, and life insurance; 401(k); in-office lunch; etc.

Visa sponsorship: We do sponsor visas. We can't successfully sponsor every role and candidate, but if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

Compensation
The base pay range for this role is $180,000 – $250,000 per year.

Skills Required

  • 5+ years of professional experience in ML infrastructure or systems engineering
  • Deep hands-on experience with LLM inference infrastructure including PyTorch
  • Proven experience with GPU inference optimization
  • Experience running ML systems in production with high traffic
  • Strong software engineering fundamentals

The Company
7 Employees
Year Founded: 2023

What We Do

Realm Labs is an AI security company focused on helping enterprises secure and monitor AI applications and data. It develops an AI-based authorization platform designed to prevent data leaks and enable secure generative AI usage with features like agent security and guardrails.
