- Steer LLMs to become strong evaluators aligned with human preferences using advanced post-training techniques.
- Lead and empower a world-class team of researchers and engineers, setting a high bar of excellence that propels Atla forward.
- Define and execute an ambitious research agenda that advances Atla's position as a leader in language model evaluation.
- Develop comprehensive evaluation frameworks, including tooling, datasets and metrics for rigorous assessment of alignment and safety risks.
- Contribute significant findings to leading AI safety conferences and journals.
- Track record of 5+ years in pioneering AI research, with significant contributions to the field of LLMs, evidenced by publications in top-tier conferences and journals.
- Proven experience in defining and executing research agendas, demonstrating the ability to guide and align a team toward achieving ambitious research goals.
- Demonstrated success leading teams of researchers.
- Deep expertise in training and evaluating language models across multiple GPUs, preferably in PyTorch.
- Experience at an elite AI research lab (OpenAI, DeepMind, Meta, Anthropic, Cohere, etc.).
- Experience at a fast-growing startup.
- Strong software and ML engineering expertise, with a focus on building robust, scalable systems.
- Create real value: Every action should deliver tangible, meaningful value for the people who use what we build.
- Drive to completion: Do the second 90%.
- Do fewer things, better: Prioritize focus over breadth.
- Collaborate for excellence: The whole is greater than the sum of its parts.
- Seek truth: Let the best ideas win, no matter where they come from, and let go of ego.
- Argue passionately, then commit fully: Debate fiercely, but once a decision is made, own it like it’s yours.
- Advance AI safety: Every action should contribute towards the safe development of AI.
- Go big or go home: “The people who are crazy enough to think they can change the world are the ones who do.”
- £200K - £300K
- Significant stake in equity as one of our core technical leaders
- Pension plan with employer contributions
- Medical, dental, and vision benefits
What We Do
Atla is the eval and improvement platform for AI agents. We help teams find and fix agent failures—fast. As agents grow more complex, debugging and improving them has become a significant challenge. Atla brings clarity by tracing every step, surfacing error patterns across runs, and delivering specific suggestions to improve agent performance. With real-time monitoring, automated error detection, and tools for prompt experimentation, Atla gives teams the visibility and control needed to confidently ship agentic systems that work.

We’re a team of researchers, engineers, entrepreneurs and operational leaders. Our expertise in evals was honed through training our own purpose-built LLM Judges, Selene and Selene Mini, which are available open-source and have been downloaded 40,000+ times. Atla is backed by Y Combinator, Creandum, and the founders of Reddit, Cruise, Rappi, Instacart and more.

Blog: https://atlaai.substack.com/