The Role
Build and own production applied-AI systems focused on information retrieval and LLM pipelines. Design experiment frameworks and evaluation pipelines, optimize for metrics (precision, recall, F1), run benchmarks, perform failure analysis, and iterate to improve real-world system performance at scale.
Summary Generated by Built In
About CaseGuild
CaseGuild builds the industry’s most advanced evidence reasoning platform for complex litigation, helping legal teams investigate massive document sets and surface critical facts with speed and precision.
We’re a fast-growing, early-stage Seattle startup building systems at the intersection of information retrieval, machine learning, and large language models, operating at a scale of billions of tokens, where accuracy and speed matter. You’ll join a small, senior team that ships production systems daily, measures everything, and iterates based on real-world performance.
The Role
This role is for a startup engineer who is data-first, evaluation-driven, and has built production systems.
You’ve spent years building ML or AI systems where success isn’t measured by demos, but by metrics, benchmarks, and real-world performance. You understand that modern LLM pipelines still require datasets, experiments, baselines, and failure analysis, and you enjoy owning that end-to-end.
You Might Thrive in This Role If You:
- Have 5+ years of experience building ML or applied AI systems where accuracy and evaluation mattered
- Have designed and owned experiment frameworks and evaluation pipelines in production
- Are fluent in metrics (precision, recall, F1) and know when each matters
- Have strong foundations in classic ML, NLP, and information retrieval, now applied to LLM-based systems
- Have experience working with multiple LLM providers and models, and don’t treat them as black boxes
- Enjoy end-to-end ownership and pragmatic tradeoffs in a startup environment
- Have a high sense of ownership and agency, with a bias toward getting 1% better every day
- Care deeply about correctness, rigor, and repeatability
Compensation
The base pay range for this role is $80,000 – $140,000 per year.
Skills Required
- 5+ years building ML or applied AI systems where accuracy and evaluation mattered
- Designed and owned experiment frameworks and evaluation pipelines in production
- Fluent in evaluation metrics (precision, recall, F1) and their application
- Strong foundations in classic machine learning, NLP, and information retrieval
- Experience working with multiple LLM providers and models, without treating them as black boxes
- Experience building production systems with end-to-end ownership and pragmatic tradeoffs in a startup environment
- Focus on correctness, rigor, and repeatability in ML experiments and deployments
The Company
What We Do
CaseGuild is an AI-powered evidence reasoning platform designed to assist legal teams in complex litigation. It enables legal professionals to quickly analyze massive document collections to uncover key evidence, identify patterns, and build stronger cases. Founded by industry veterans from Microsoft, Meta, and major law firms, the company focuses on enhancing the accuracy and efficiency of case assessments through specialized AI workflows.








