Senior ML Inference Platform Engineer

2 Locations
In-Office
Senior level
Artificial Intelligence • Information Technology • Software
The Role
The role involves building and optimizing high-performance ML inference systems, designing evaluation metrics, and improving GPU performance through various techniques.
Summary Generated by Built In
About AION

AION is building a next-generation AI cloud platform, transforming high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performance, AION democratizes access to compute power for AI training, fine-tuning, inference, data labeling, and the full-stack AI/ML lifecycle.

Led by high-pedigree founders with previous exits, AION is well funded by major VCs and has strategic global partnerships. Headquartered in the US with a global presence, the company is building its initial core team across India, London, and Seattle.

Who You Are

You're an ML systems engineer who's passionate about building high-performance inference infrastructure. You don't need to be an expert in everything - this field is evolving too rapidly for that - but you have strong fundamentals and the curiosity to dive deep into optimization challenges. You thrive in early-stage environments where you'll learn cutting-edge techniques while building production systems. You think systematically about performance bottlenecks and are excited to push the boundaries of what's possible in AI infrastructure.


Requirements

Key Responsibilities
  • Build and optimize LLM inference systems working towards 2-4x performance improvements over standard frameworks like vLLM and TensorRT-LLM.
  • Implement modern inference optimizations including KV-cache management, dynamic batching, speculative decoding, compression and quantization strategies.
  • Develop GPU optimization solutions using CUDA, with opportunities to learn advanced techniques like Triton kernel development and CUDA graphs.
  • Design model evaluation and benchmarking systems to assess performance across reasoning, coding, and safety metrics.
  • Research and integrate trending open-source models (DeepSeek R1, Qwen 3, Llama 4, Mistral variants) with optimized configurations.
  • Build performance monitoring and profiling tools for GPU cluster analysis, bottleneck identification, and cost optimization.
  • Create cost-performance optimization strategies that balance throughput, latency, and infrastructure costs.
  • Explore agent orchestration capabilities for multi-step reasoning and tool integration workflows.
  • Collaborate with tech and product teams to identify optimization opportunities and translate them into production improvements.
Skills & Experience
  • High-agency individual looking to own and influence product architecture and company direction.
  • 3+ years of software engineering experience with focus on performance-critical systems and production deployments.
  • Strong Python expertise and working knowledge of C++ for performance optimization.
  • Working understanding of deep learning fundamentals including transformer architectures, attention mechanisms, and neural network training/inference.
  • Hands-on experience with model serving and deployment techniques.
  • Experience with at least one modern inference framework (vLLM, TensorRT-LLM, SGLang or similar) in a production setting.
  • Hands-on experience with PyTorch including model development, training loops, and basic distributed computing concepts.
  • Understanding of distributed systems concepts including load balancing, auto-scaling, and fault tolerance.
  • Basic GPU programming experience with CUDA or willingness to quickly learn GPU optimization techniques.
  • Strong debugging and performance profiling skills for identifying and resolving system bottlenecks.

Benefits
  • Join the ground floor of a mission-driven AI startup revolutionizing compute infrastructure.
  • Work with a high-caliber, globally distributed team backed by major VCs.
  • Competitive compensation and benefits.
  • Fast-paced, flexible work environment with room for ownership and impact.
  • Hybrid model: 3 days in-office, 2 days remote with flexibility to work remotely for part of the year.

If you have any questions about the role, please reach out to the hiring manager on LinkedIn or X.

Top Skills

C++
CUDA
Python
PyTorch
TensorRT-LLM
vLLM

The Company
HQ: San Francisco, California
21 Employees
Year Founded: 2023

What We Do

Everyday AI Platform: aion collapses the entire AI development lifecycle into a single, unified workspace. From data to deployment - everything at your fingertips. aion simplifies AI infrastructure the way Stripe simplified payments:

  • Plug-and-Play Multi-Provider Access
  • Customer Infrastructure Management: Deploy and optimize AI infrastructure via prompts, with integrated cost tracking and performance analytics.
  • Partner Sales & Resource Optimization: Track opportunities with confidential pricing, manage real-time inventory allocation, and monitor profitability from aion workloads.
