Exa is building a search engine from scratch to serve every AI application. We build massive-scale infrastructure to crawl the web, train state-of-the-art embedding models to index it, and develop high-performance vector databases in Rust to search over it. We also own a $5M H200 GPU cluster and routinely run batch jobs on tens of thousands of machines. This isn't your average startup :)
On the ML team, we train foundational models for search. Our goal is to build systems that can instantly filter the world's knowledge to exactly what you want, no matter how complex your query. Basically, we want to put the web into an extremely powerful database.
We're looking for an ML evals engineer to design and build our eval stack at Exa. The role involves investigating how to evaluate search engines in an LLM world and then building the most comprehensive, creative, and effective eval suite. You will be deciding the future of search through the evals we choose to optimize for.
Desired Experience
You have some ML experience
You have strong engineering experience
You like creating evaluation datasets and diving deeply into the data
You care deeply about the problem of search and want to create an eval suite that helps us get as perfect a search engine as we can
Example Projects
Write a manifesto of what perfect search means
Identify the biggest problems in our search and make an eval for those problems
Think of creative ways to gather evaluation data
This is an in-person opportunity in San Francisco. We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H-1B, O-1, E-3).
What We Do
Exa was built with a simple goal — to organize all knowledge. After several years of heads-down research, we developed novel representation learning techniques and crawling infrastructure so that LLMs can intelligently find relevant information.








