Role Description:
We are looking for an experienced, highly skilled, and motivated Senior Compiler Engineer to join our compiler team. In this role, you will be responsible for designing, developing, and maintaining critical components of our AI-driven compilation stack. You will work on PyTorch, Triton, LLVM, and MLIR to build robust, scalable, and high-performance solutions for diverse hardware backends. This is a hands-on technical position where you will solve complex problems, optimize for performance, and contribute to the next generation of our technology.
What You’ll Do:
- Design, Implement, and Optimize Compiler Components: Architect and develop critical compiler modules to efficiently translate and optimize AI models for deployment across a variety of hardware platforms, including CPUs, GPUs, and emerging custom accelerators
- Enhance and unify PyTorch Inductor, Triton, LLVM, and MLIR toolchains to support cutting-edge architectures, facilitate seamless interoperability, and enable rapid experimentation with new compiler features
- Create and maintain custom IRs, code generation passes, and optimization strategies tailored for AI workloads, focusing on both general and domain-specific improvements
- Profile and tune computational kernels—such as linear algebra operations, matrix multiplications, and elementwise computations—to achieve optimal performance, scalability, and resource efficiency on diverse hardware
- Open-Source Engagement: Actively contribute to the LLVM, MLIR, Triton, and PyTorch open-source projects, sharing improvements, collaborating with the developer community, and driving the evolution of the AI compiler ecosystem
What We’re Looking For:
- Bachelor’s or Master’s in Computer Science, Computer Engineering, or a related field from a recognized university
- 7+ years of experience in compiler engineering or closely related fields
- Deep knowledge of LLVM and MLIR internals, including IR transformations, code generation, and backend optimization techniques
- Experience with PyTorch (Inductor/Dynamo) and Triton, including their compiler subsystems
- Proven experience and expertise in C/C++ programming
- Demonstrated expertise in performance optimization, including vectorization, parallelization, and hardware-specific tuning
- Advanced debugging, analytical, and system-level thinking skills
- Excellent communication skills with a strong track record of cross-functional collaboration
Ways to Stand Out from the Crowd:
- Strong understanding of AI models — both training and inference pipelines
- Experience developing compiler support for custom hardware accelerators, including ASICs, FPGAs, or novel AI chips
- Active contributions to open-source compiler frameworks, demonstrating leadership and community involvement
- Familiarity with distributed training strategies, graph compilers, and advanced memory models
What We Do
Majestic Labs is reimagining AI infrastructure for the world’s most demanding workloads. Today, organizations are forced to overprovision expensive compute just to access the memory their models need. We took a fundamentally different approach, pairing a massive amount of compute with 1000x the memory to deliver game-changing improvements in performance, power, and deployment efficiency. Our customers can replace racks of traditional AI infrastructure with a single Majestic server.






