Runtime Engineer

8 Locations
Hybrid
Mid level
Artificial Intelligence • Machine Learning • Software
The Role
Design and develop a multi-target runtime, optimize kernels, analyze compiler outputs, and improve architecture based on ML engineers' needs.
At Lemurian Labs, we’re on a mission to bring the power of AI to everyone—without leaving a massive environmental footprint. We care deeply about the impact AI has on our society and planet, and we’re building a rock-solid foundation for its future, ensuring AI grows sustainably and responsibly. Because let’s face it, what good is innovation if it doesn’t help the world?

We are building a high-performance, portable compiler that lets developers “build once, deploy anywhere.” Yes, anywhere. We’re talking about seamless cross-platform compatibility, so you can train your models in the cloud, deploy them to the edge, and everything in between—all while optimizing for resource efficiency and scalability.

If the idea of sustainably scaling AI motivates you and you’re excited about making AI development both powerful and accessible, then we’d love to have you. Join us at Lemurian Labs, where you can have fun building the future—without leaving a mess behind.

Key Duties

  • Design, develop, maintain and improve our multi-target runtime
  • Use the latest techniques in parallelization and partitioning to automatically generate and exploit highly optimized kernels
  • Rapidly prototype new ideas and explore them through data-driven experimentation
  • Benchmark and analyze the outputs produced by our optimizing compiler on target hardware
  • Work closely with our product team to understand the evolving needs of ML engineers and drive improvements in architecture
  • Build tools to collect performance data and analyze bottlenecks (a minimal sketch follows this list)
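
As an illustration of what tooling like this might look like, here is a minimal C++ sketch of a micro-benchmark harness that times a placeholder workload and reports min and mean latency. It is a hedged example for context only: the benchmark helper, the vector-add kernel, and all names are hypothetical and do not describe Lemurian Labs' actual runtime or tools.

    // Hypothetical micro-benchmark harness (illustrative sketch, not Lemurian Labs code):
    // times a callable over repeated runs and reports min / mean wall-clock latency.
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    template <typename Fn>
    void benchmark(const char* name, Fn&& fn, int iterations = 100) {
        using clock = std::chrono::steady_clock;
        std::vector<double> samples;
        samples.reserve(iterations);

        for (int i = 0; i < iterations; ++i) {
            auto start = clock::now();
            fn();  // the kernel or compiler output under test
            auto stop = clock::now();
            samples.push_back(
                std::chrono::duration<double, std::micro>(stop - start).count());
        }

        double min = *std::min_element(samples.begin(), samples.end());
        double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
        std::printf("%s: min %.2f us, mean %.2f us over %d runs\n", name, min, mean, iterations);
    }

    int main() {
        // Placeholder workload standing in for a generated kernel.
        std::vector<float> a(1 << 20, 1.0f), b(1 << 20, 2.0f), c(1 << 20);
        benchmark("vector_add", [&] {
            for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
        });
        return 0;
    }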

Essential Skills and Experience

  • A deep understanding of asynchronous, concurrent programming (see the sketch after this list).
  • 4+ years of experience with C/C++ (C++14 or newer).
  • An understanding of hardware architecture (vector vs. scalar registers and instructions, memory hierarchies).
  • Knowledge of operating system kernel development or hypervisor development.
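
For context, the snippet below is a minimal sketch of the asynchronous, concurrent C++14 style this list refers to: a reduction partitioned across std::async tasks and combined through futures. The parallel_sum helper and its parameters are illustrative assumptions, not code or APIs from the role.

    // Illustrative sketch (assumed example, not Lemurian Labs code): partition a
    // reduction across std::async tasks and combine the partial results via futures.
    #include <cstddef>
    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <vector>

    double parallel_sum(const std::vector<double>& data, unsigned tasks) {
        std::vector<std::future<double>> partials;
        std::size_t chunk = data.size() / tasks;

        for (unsigned t = 0; t < tasks; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end = (t + 1 == tasks) ? data.size() : begin + chunk;
            // Each task reduces its own partition on a separate thread.
            partials.push_back(std::async(std::launch::async, [&data, begin, end] {
                return std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
            }));
        }

        double total = 0.0;
        for (auto& f : partials) total += f.get();  // block until each partition finishes
        return total;
    }

    int main() {
        std::vector<double> data(1 << 22, 0.5);
        std::printf("sum = %.1f\n", parallel_sum(data, 4));
        return 0;
    }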

Preferred Skills and Experience

  • Experience developing or maintaining libraries built on CUDA or ROCm.
  • Experience with GPU programming.
  • Experience with high performance computing (HPC).
  • Master's or PhD degree in computer science, or equivalent practical experience.
  • Knowledge of DL frameworks such as PyTorch, JAX or Triton.
  • Experience with programming large compute clusters.

Salary depends on experience and geographical location. 

This salary range may be inclusive of several career levels and will be narrowed during the interview process based on a number of factors, such as candidate’s experience, knowledge, skills and abilities, as well as internal equity among our team. 

Additional benefits for this role may include: equity, company bonus opportunities; medical, dental, and vision benefits; retirement savings plan; and supplemental wellness benefits.

Lemurian Labs ensures equal employment opportunity without discrimination or harassment based on race, color, religion, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity or expression, age, disability, national origin, marital or domestic/civil partnership status, genetic information, citizenship status, veteran status, or any other characteristic protected by law.

EOE

Top Skills

C/C++
CUDA
GPU
High Performance Computing
JAX
PyTorch
ROCm
Triton

The Company
Menlo Park, California
33 Employees
Year Founded: 2018

What We Do

At Lemurian Labs, our focus is on unleashing the capabilities of AI for the benefit of humanity. To fulfill this purpose, we are developing a full-stack solution consisting of software and hardware that is capable of orders of magnitude better performance and efficiency than legacy solutions, while being designed for scalability.

There are massive shifts underway moving us from Software 1.0 to Software 2.0 to Software 3.0 and onwards, but to realize their true benefits we need fundamentally new hardware and systems that can keep up with changing compute demands while simultaneously bringing down costs. We are developing software and hardware designed from first principles to deliver unprecedented realizable performance/watt and enable the next generation of AI workloads.

Our diverse team of technologists has decades of experience at the frontiers of high performance computing, digital arithmetic, cryptography, artificial intelligence, robotics, and networking. There is a lot of talk about what the technology of tomorrow will look like, and a number of companies are developing it. At Lemurian, we believe tomorrow is so yesterday. We are developing the technology for the day after tomorrow. We are Lemurian Labs. Welcome to the future of artificial intelligence and computing.
