Backend Engineer – Inference Optimization

Posted 24 Days Ago
Seattle, WA
In-Office
$150K-$250K Annually
Senior level
Artificial Intelligence • Software
Building the AI Computer Interface of the Future
The Role
As a Backend Engineer, you'll optimize inference pipelines for large-scale AI models, focusing on improving performance and efficiency through advanced techniques and collaboration with researchers and engineers.

About Us

We're a high-energy, impact-driven team with a long track record of academic excellence. Our team includes researchers whose work has shaped the field, earning best paper awards at top AI conferences and ranking among the most cited scientists in the history of science. We've built fundamental, transformative research that has redefined the community, and now we're here to change the world, one breakthrough at a time.

What We're Looking For & Why Join Us

We’re looking for a Backend Engineer – Inference Optimization who thrives on solving some of the hardest systems problems in AI. You’ll focus on pushing the limits of foundation model inference performance, working at the intersection of cutting-edge ML and high-performance systems engineering. This is your opportunity to set new benchmarks for latency, throughput, and efficiency at scale.

What is this role?

As a Backend Engineer, you’ll own the design and optimization of inference pipelines for large-scale models. You’ll work closely with researchers and infrastructure engineers to identify bottlenecks, implement advanced techniques like quantization and KV caching, and deploy high-performance serving systems in production. Your work will directly determine how fast and cost-effectively users can access next-generation AI.

What do we expect?

Must have:

  • Deep experience with inference pipeline optimization, model quantization, and KV caching

  • Proficiency in backend systems and high-performance programming (Python, C++, or Rust)

  • Familiarity with distributed serving, GPU acceleration, and large-scale systems

  • Ability to debug complex performance issues across model, runtime, and hardware layers

  • Comfort working in fast-moving environments with ambitious technical goals

Nice to have:

  • Hands-on experience with vLLM or similar inference frameworks

  • Background in GPU kernel optimization (CUDA, Triton, ROCm)

  • Experience scaling inference across multi-node or heterogeneous clusters

  • Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime)

  • Hands-on experience with model quantization

Compensation & Benefits

$150K – $250K + Equity

We offer health benefits, a 401(k) plan, and meaningful equity—because we believe top talent should be supported, secure, and fully invested in the future we’re building together.

Location: This role is in-office at our Seattle HQ.

Top Skills

C++
CUDA
ONNX Runtime
Python
ROCm
Rust
TensorRT
Triton
TVM
vLLM

The Company
HQ: Seattle, Washington
17 Employees
Year Founded: 2024

What We Do

Vercept is reimagining how people interact with computers. Our product, Vy, lets you command your machine with natural language — navigating apps, content, and context like never before.

We’re building AI that understands your screen, not just your words.
vercept.com
