Modal provides the infrastructure foundation for AI teams. With instant GPU access, sub-second container startups, and native storage, Modal makes it simple to train models, run batch jobs, and serve low-latency inference. Companies like Suno, Lovable, and Substack rely on Modal to move from prototype to production without the burden of managing infrastructure.
We're a fast-growing team based out of NYC, SF, and Stockholm. We've hit high 8-figure ARR and recently raised a Series B at a $1.1B valuation. We have thousands of customers who rely on us for production AI workloads, including Lovable, Scale AI, Substack, and Suno.
Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
The Role: Modal is looking for a machine learning engineer to help tackle some of the most challenging AI/ML problems our customers face. As a Forward Deployed ML Engineer, you will:
Help our customers architect and build complex AI applications on Modal
Work with companies developing some of the most cutting-edge AI use cases in the world, such as Lovable, Suno, and Mistral AI
Optimize performance for open-source models and frameworks
Write examples and build demos that showcase Modal
Contribute to the core Modal stack
Help our community build cool stuff on top of Modal
We are looking for someone with these skills:
At least a few years of professional ML engineering experience (or solutions engineering or similar)
Clear communicator who can make complex ideas easy to understand
Solid business sense and ability to build trust and strong working relationships
Willing to work in person
What We Do
Deploy generative AI models, large-scale batch jobs, job queues, and more on Modal's platform. We help data science and machine learning teams accelerate development, reduce costs, and effortlessly scale workloads across thousands of CPUs and GPUs.
Our pay-per-use model ensures you're billed only for actual compute time, down to the CPU cycle. No more wasted resources or idle costs—just efficient, scalable computing power when you need it.