Elicit is building the reasoning layer for science and decision-making. We use language models to search over 125 million papers, extract data, and surface insights so that researchers, policy-makers, and industry leaders can go from questions to evidence-backed decisions in minutes.
Today, hundreds of thousands of researchers have used Elicit to speed up literature reviews, automate systematic reviews, and explore new domains. As we expand our impact beyond academic research, we are laying the groundwork for ML systems that are systematic, transparent, and unbounded when reasoning at scale.
To do this, Elicit is pioneering supervision of process, not outcomes. Instead of favoring large black-box models, we break complex questions down into human-legible steps and supervise the reasoning process itself. This approach delivers more transparent, defensible answers today and charts a safer path toward advanced AI tomorrow.
Our vision is ambitious: we’re building the default starting point for understanding and reasoning through any hard question. We invite you to help us build that future.
(See how people use Elicit today on Twitter; explore our vision in the roadmap.)
About the role

As a Machine Learning Engineer at Elicit, you'll build products and workflows that help researchers and scientific teams make higher-quality decisions with language models.
This is not a role for someone who wants to develop models in isolation from user impact. A large part of the work is software engineering: building product experiences, APIs, data integrations, evaluation systems, and harnesses that make language models reliably useful and trustworthy in high-stakes domains.
You’ll work on problems like:
Turning messy, ambiguous research tasks into clear product experiences
Building interfaces and artifacts that help users understand, trust, and act on model outputs, thinking beyond the chat interface while leveraging full model capabilities
Combining language models with external tools, structured and unstructured data, and retrieval systems
Improving quality through building careful evaluations, truth-conducive model environments and tools, and targeted ML modeling where the impact is high
Building agentic harnesses for target assessment, evidence synthesis, and experiment planning that allow models to provide guarantees about their processes
Building data integrations across literature, scientific databases, customer data, and internal tools
Designing APIs that customers can use in their own systems
Building evaluation systems that help us understand whether a change actually improves user outcomes
Shipping trust and transparency features, like source-quality signals, intermediate reasoning, and better ways to inspect and fix outputs
Examples of projects you could work on:
Build a target-assessment workflow that combines literature, genetics, chemistry, clinical, regulatory, and company data into a shareable artifact.
Build experiment-planning and iteration tools that help researchers decide what to do next and learn from new results.
Build evidence-monitoring workflows that keep teams up to date through alerts, briefs, and living reports.
Build enterprise APIs and structured-output pipelines that plug Elicit into customers’ internal systems.
Build interfaces that make it easier to inspect, trust, and correct model outputs.
Build workflow-specific evals and quality systems that tell us whether a product change actually helped users.
Improve extraction, reasoning, or search quality with better prompts, better system design, or finetuning when appropriate.
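To give a flavor of the eval-focused bullets above, here is a minimal sketch of what a workflow-specific eval harness could look like. Every name in it (`EvalCase`, `run_eval`, `toy_workflow`) is hypothetical and illustrative only, not part of Elicit's actual codebase:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One gold-standard example: a question and the expected answer snippet."""
    question: str
    expected: str

def run_eval(workflow: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the workflow's answer contains
    the expected snippet (a deliberately crude pass/fail check)."""
    passed = sum(
        1 for c in cases if c.expected.lower() in workflow(c.question).lower()
    )
    return passed / len(cases)

# Stand-in workflow for illustration; a real one would call a language model.
def toy_workflow(question: str) -> str:
    return "Aspirin reduces risk" if "aspirin" in question.lower() else "Unknown"

cases = [
    EvalCase("Does aspirin reduce cardiovascular risk?", "reduces risk"),
    EvalCase("Does vitamin C cure colds?", "no strong evidence"),
]
print(run_eval(toy_workflow, cases))  # 0.5
```

Real harnesses are far richer (graded rubrics, human review, per-workflow metrics), but the shape is the same: a fixed case set, a swappable workflow under test, and a score that tells us whether a product change actually helped.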
What we're looking for:
A strong software engineering background: you can build end-to-end systems, not just scripts or notebooks
Fluency with language models: you reason well about prompting, retrieval, evals, failure modes, and where (and how) finetuning is or isn't worth it
Strong product sense: you like turning fuzzy user problems into concrete things people can use
Excitement to solve difficult, creative problems rather than narrowly optimizing on well-defined benchmarks
Ability to move across backend, data, and model layers as needed
Clear communication with product, design, domain experts, and other engineers
Effective, thoughtful use of coding assistants: you've adapted your workflow to become much more effective with them
To get a sense for how some of us look at applications, see this thread. (The short version: Wherever we can, we prefer to directly evaluate work.)
You'll thrive here if you:
Like shipping user-facing things quickly
Enjoy working on ambiguous problems with a lot of autonomy
Care about product quality and user trust, not just technical novelty
Want to build new kinds of software made possible by language models
Are excited to use AI tools as part of your daily engineering workflow, while still applying strong judgment
This is probably not the right role if you mainly want to:
Do low-level model systems work like CUDA optimization or model serving infrastructure as your primary focus
Work only on research experiments without owning production systems
Optimize benchmark numbers without much connection to user workflows or product outcomes
We do care about model quality, evals, and sometimes finetuning. But those matter because they help us build products users can rely on, not as ends in themselves.
Am I a good fit?

Consider these questions:
How does a transformer work?
What is a tokenizer?
What is a decorator in Python?
What are generic types?
Strong applicants will find it easy to answer these questions.
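For illustration, answers to the last two questions might be sketched like this. This is our hedged example, not a required answer, and all names in it are made up:

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def timed(fn: Callable[..., T]) -> Callable[..., T]:
    """A decorator: a function that wraps another function to add
    behavior (here, timing) without changing its interface."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs) -> T:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    return wrapper

# Generic types: `first` works for a list of any element type, and a
# type checker infers the return type from the argument's element type.
def first(items: list[T]) -> T:
    return items[0]

@timed
def square(x: int) -> int:
    return x * x

print(square(7))           # 49
print(first(["a", "b"]))   # a
```

If you can explain what `@timed` does to `square` and why `first(["a", "b"])` type-checks as `str`, you're in good shape for this part of the bar.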
Location and travel

We have a lovely office in Oakland, CA, but we also have remote employees across the US. It's important to us to spend time with our teammates, so we ask that all Elicians come together for a quarterly team retreat, normally in or around the SF Bay Area.
Benefits

In addition to working on important problems as part of a happy, productive, and positive team, we also offer great benefits (with some variation based on work location):
Flexible work environment - work from our office in Oakland or remotely as long as you can travel to work in-person for retreats and coworking events
Fully covered health, dental, vision, and life insurance for you, with generous coverage for the rest of your family
Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
401(k) with a 6% employer match
A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
$1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools to incorporate into your workflow, take courses, purchase educational resources, or attend AI-focused conferences and events
A team administrative assistant that you can delegate personal and work tasks to
Commuter benefits, a relocation bonus, and more!
You can find more reasons to work with us in this thread.
For all roles at Elicit, we use a data-backed compensation framework to make sure our salaries are market-competitive, equitable, and simple. For this role, we're targeting starting ranges of:
Career (L3): $185-230K + equity
Senior (L4): $230-260K + equity
Expert/Staff (L5): $255-340K + significant equity
We're optimizing for a hire who can contribute at an L4/senior level or above. We'd love to meet staff/principal-level contributors as well.
We also offer above-market equity for all roles at Elicit, as well as employee-friendly equity terms.
Join us!
What We Do
Elicit, the AI research assistant, helps you automate time-consuming research tasks like summarizing papers, extracting data, and synthesizing your findings. We're a public benefit company with a mission to scale up good reasoning. We want machine learning to help as much with thinking and reflection as it does with tasks that have clear short-term outcomes.