Your work will directly determine whether models fit within hardware constraints and achieve real-time performance.
- Design and implement compiler architecture across:
  - IR design and transformation pipelines
  - Optimization strategies for scheduling, tiling, and memory reuse
- Lead development of complex compiler passes such as:
  - Global memory allocation and liveness-driven reuse
  - Cross-operator fusion and graph partitioning
  - Hardware-aware scheduling strategies
- Develop cost models and optimization heuristics
- Explore advanced techniques:
  - Constraint-based optimization (e.g., ILP/MILP/CP)
  - Scheduling optimization
- Drive debugging of system-level issues (correctness, performance, HW mismatches)
- Collaborate with hardware teams on co-design of abstractions and execution models
Requirements
- 4+ years of experience in compilers, systems, or performance engineering
- Master's or PhD in Computer Science, Electrical Engineering, Math, or a related field
- Deep experience with at least one of:
  - ML compiler frameworks (MLIR, TVM, XLA, etc.)
  - Low-level optimization (scheduling, memory, tiling)
- Proven ability to design non-trivial compiler systems or passes
- Strong intuition for performance across compute, memory, and data movement
- Comfort working with hardware constraints
- Experience with constraint solvers (MILP, ILP, CP)
- Background in accelerator architectures or embedded systems
- Experience optimizing ML workloads for latency/power
- Familiarity with DSP or real-time signal processing
Benefits
- 401(k)
- Medical insurance
- Vision insurance
- Dental insurance
- Commuter benefits
- Disability insurance
- Paid maternity leave
- Paid paternity leave
- Child care support
femtoAI is an equal opportunity employer committed to a diverse workforce and to creating an inclusive working environment that empowers everyone to do their best work. We do not discriminate on the basis of race, ethnicity, religion, gender, gender identity, sexual orientation, age, marital status, veteran status, or disability status.
What We Do
Headquartered in Silicon Valley, femtoAI (formerly known as Femtosense) was founded in 2018 by researchers from the Brains in Silicon Lab at Stanford University. Our technology takes inspiration from principles of neuromorphic computing, such as sparsity, to empower intelligence in everyday devices. We pioneered a high-performance AI accelerator integrated with an end-to-end embedded AI platform, enabling low-latency operation with less energy at a fraction of the cost. From wearables and household appliances to robotics and autonomous vehicles, femtoAI brings the power of AI to everyday devices.








