- Scope, prototype, and run behavioral evaluations in response to emerging policy and oversight needs, including rapid-turnaround work for government and civil society partners.
- Deliver on Transluce's contracts with government evaluators, including building evaluations of harmful manipulation with the EU AI Office.
- Design and run privileged-access evaluations and external oversight exercises with frontier labs.
- Work with civil society organizations and domain experts to adapt our behavioral evaluation pipelines to their contexts (e.g., mental health, persuasion, evaluation awareness).
- Hands-on experience designing and running AI evaluations, particularly behavioral or interactive evaluations (multi-turn, agentic, or red-teaming contexts).
- Strong engineering instincts and good judgment about when "good enough to ship" is actually good enough.
- Experience in customer-facing, consulting, or forward-deployed roles translating ambiguous stakeholder needs into concrete deliverables.
- Experience running evaluations at scale or in a production context.
- Ability to understand and balance the needs of AI researchers and domain experts, as well as those of researchers and senior decision makers.
- Strong communication skills, low ego, and openness to giving and receiving feedback.
Skills Required
- Experience designing and running AI evaluations, especially behavioral or interactive evaluations
- Strong engineering instincts and judgment about when a deliverable is ready to ship
- Experience in customer-facing, consulting, or forward-deployed roles
- Experience running evaluations at scale or in a production context
- Ability to balance needs of researchers and domain experts
- Strong communication skills and openness to feedback
What We Do
Transluce is an independent research lab that builds open, scalable technology for understanding AI systems and steering them in the public interest. Transluce means to shine light through something to reveal its structure.

Today's complex AI systems are difficult to understand: not even experts can reliably predict their behavior once deployed. Given AI's extraordinary consequences for society, we need scalable and open analyses of the capabilities and risks of AI systems.

We are building open-source, AI-driven tools to understand and analyze AI systems. We will apply these tools to open-weight models, so the world can vet our analyses and improve their reliability. Once our technology has been vetted, we will work with frontier AI labs and governments to ensure that internal assessments reach the same standards as our publicly vetted procedures.

Email: [email protected]







