Senior ML Performance Engineer

Posted 17 Days Ago
8 Locations
Hybrid
Senior level
Artificial Intelligence • Machine Learning • Software
About Us

At Lemurian Labs, we're on a mission to bring the power of AI to everyone—without leaving a massive environmental footprint. We care deeply about the impact AI has on our society and planet, and we're building a solid foundation for its future, ensuring AI grows sustainably and responsibly. Innovation should help the world, not harm it.

We are building a high-performance, portable compiler that lets developers "build once, deploy anywhere." Yes, anywhere. We're talking about seamless cross-platform compatibility, so you can train your models in the cloud, deploy them to the edge, and everything in between—all while optimizing for resource efficiency and scalability.

If the idea of sustainably scaling AI motivates you and you're excited about making AI development both powerful and accessible, then we'd love to have you. Join us at Lemurian Labs, where you can have fun building the future—without leaving a mess behind.

The Role

We're looking for a Senior ML Performance Engineer to architect and lead our Performance Testing Platform from the ground up. You'll be the technical authority on how we measure, validate, and optimize the performance of large language models (Llama 3.2 70B, DeepSeek, and others) before and after compiler optimization on modern GPU architectures.

This is a high-impact role where you'll directly influence our product quality and our customers' success. You'll work at the intersection of ML systems, GPU architecture, and performance engineering—building the infrastructure that proves our compiler delivers real value.

What You'll Do

  • Design and build a comprehensive performance testing platform for evaluating LLM inference workloads across GPU clusters
  • Define and implement the benchmarking methodology, metrics, and test suites that measure latency, throughput, memory utilization, power consumption, and model accuracy
  • Establish baseline performance for unoptimized models (Llama 3.2 70B, DeepSeek, etc.) and validate post-optimization improvements
  • Develop automated testing pipelines for continuous performance validation across compiler releases and model updates
  • Investigate performance bottlenecks using profiling tools (ROCm profilers, GPU traces, system-level monitoring) and work with the compiler team to drive optimizations
  • Create dashboards and reporting that provide clear visibility into performance trends, regressions, and wins
  • Collaborate cross-functionally with compiler engineers, ML engineers, and DevOps to ensure performance testing is integrated into our development workflow
  • Document best practices for performance testing and optimization of ML workloads on GPU hardware

What You'll Bring

  • 7+ years of experience in performance engineering, benchmarking, or systems engineering roles
  • Deep understanding of ML inference workloads, particularly transformer-based models and LLMs
  • Hands-on experience with GPU programming and optimization (CUDA, ROCm, or similar)
  • Strong programming skills in Python and C/C++
  • Proven track record of building performance testing infrastructure or benchmarking platforms from scratch
  • Experience with ML frameworks (PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM, etc.)
  • Proficiency with profiling and debugging tools for GPU workloads
  • Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
  • Experience with CI/CD systems and test automation frameworks

Nice to Have

  • Experience with AMD GPUs (MI200/MI300 series) and the ROCm ecosystem
  • Knowledge of compiler optimization techniques and their impact on performance
  • Experience with distributed inference and multi-GPU workloads
  • Familiarity with ML model quantization, pruning, and other optimization techniques
  • Background in high-performance computing or systems-level optimization
  • Experience with infrastructure-as-code (Kubernetes, Docker, Terraform)
  • Contributions to open-source ML or systems projects

Personal Attributes

  • Obsessive about details — you notice the 2% regression that others miss
  • Self-driven — you take ownership and don't wait for permission to solve problems
  • Collaborative mindset — you work well across teams and help others succeed
  • Passionate about sustainability — you care about making AI more efficient and environmentally responsible
  • Clear communicator — you can explain complex technical concepts to both engineers and stakeholders

Salary depends on experience and geographical location. 

This salary range may be inclusive of several career levels and will be narrowed during the interview process based on a number of factors, such as the candidate’s experience, knowledge, skills, and abilities, as well as internal equity among our team. 

Additional benefits for this role may include equity; company bonus opportunities; medical, dental, and vision benefits; a retirement savings plan; and supplemental wellness benefits.

Lemurian Labs ensures equal employment opportunity without discrimination or harassment based on race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity or expression, age, disability, national origin, marital or domestic/civil partnership status, genetic information, citizenship status, veteran status, or any other characteristic protected by law.

EOE

Top Skills

AI
C++
CI/CD
CUDA
Docker
GPU Programming
Kubernetes
ML
ONNX Runtime
Python
PyTorch
ROCm
TensorFlow
TensorRT-LLM
Terraform
vLLM

The Company
Menlo Park, California
33 Employees
Year Founded: 2018

What We Do

At Lemurian Labs, our focus is on unleashing the capabilities of AI for the benefit of humanity. To fulfill this purpose, we are developing a full-stack solution consisting of software and hardware that is capable of orders of magnitude better performance and efficiency than legacy solutions, while being designed for scalability. There are massive shifts underway moving us from Software 1.0 to Software 2.0 to Software 3.0 and onwards, but to realize their true benefits we need fundamentally new hardware and systems that can keep up with the changing compute demands while simultaneously bringing down costs. We are developing software and hardware designed from first principles to deliver unprecedented realizable performance per watt and enable the next generation of AI workloads. Our diverse team of technologists has decades of experience at the frontiers of high-performance computing, digital arithmetic, cryptography, artificial intelligence, robotics, and networking. There is a lot of talk about what the technology of tomorrow will look like, and there are a number of companies developing it. At Lemurian, we believe tomorrow is so yesterday. We are developing the technology for the day after tomorrow. We are Lemurian Labs. Welcome to the future of artificial intelligence and computing.
