AI Researcher — Inference Optimization

Reposted 24 Days Ago
Hiring Remotely in World Golf Village, FL, USA
In-Office or Remote
Senior level
Artificial Intelligence • Information Technology • Software
The Role
As an AI Researcher, you will optimize inference performance for large-scale machine learning models, improving latency, throughput, and cost efficiency through model-level and systems-level optimizations.
Summary Generated by Built In
Role Overview

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

Key Responsibilities
  • Research and develop techniques to optimize inference performance for large neural networks.

  • Improve latency, throughput, memory efficiency, and cost per inference.

  • Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).

  • Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs. decode optimization).

  • Benchmark inference workloads across hardware accelerators.

  • Collaborate with engineering teams to deploy optimized inference pipelines.

  • Translate research insights into production-ready improvements.

Required Qualifications
  • Strong background in machine learning, deep learning, or AI systems.

  • Hands-on experience optimizing inference for large-scale models.

  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).

  • Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).

  • Ability to design experiments and communicate results clearly.

Preferred / Nice-to-Have Qualifications
  • Experience deploying production inference systems at scale.

  • Familiarity with distributed and multi-GPU inference.

  • Experience contributing to open-source ML or inference frameworks.

  • Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.

  • Experience working close to hardware (CUDA, ROCm, profiling tools).

What Success Looks Like
  • Measurable gains in latency, throughput, and cost efficiency.

  • Optimized inference systems running reliably in production.

  • Research ideas successfully translated into deployable systems.

  • Clear benchmarks and documentation that inform product decisions.

Relevant Research Areas (Bonus)
  • Long-context inference optimization

  • Speculative decoding

  • KV-cache compression and paging

  • Efficient decoding strategies

  • Hardware-aware inference design

The Company
HQ: San Francisco, California
20 Employees
Year Founded: 2023

What We Do

We enable serverless inference via our GPU orchestration and model load-balancing system. We unlock fine-tuning by enabling organizations to size their server fleet to their throughput needs rather than to the number of models in their catalogue. See it in action on our public cloud, which offers inference for 10k+ open-weight models.

Similar Jobs

Remote
26 Locations
393 Employees
179K-179K Annually

Immersive

Senior Manager, Cyber Resilience Team

Enterprise Web • HR Tech • Information Technology • Software • Cybersecurity
Remote or Hybrid
2 Locations
330 Employees
144K-207K Annually

Scrunch

Head of Legal

Artificial Intelligence • Information Technology • Marketing Tech • Software • SEO
Remote
USA
180K-220K Annually

TechTorch

Delivery Manager (CPQ / RevOps / Lead-to-Cash)

Artificial Intelligence • Information Technology
Remote
USA
97 Employees
160K-180K Annually

Similar Companies Hiring

Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, California
178 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees