Scientific Evals

San Francisco, CA, USA
In-Office
160K-300K Annually
Mid level
Artificial Intelligence • Software • Database
The Role
Design realistic biological benchmarks and curate high-quality datasets to evaluate and train AI systems; analyze model failures, iterate on data and evaluation criteria, collaborate with ML researchers, and manage workflows and documentation across domain experts.
About

Edison Scientific builds and commercializes AI agents for science. Scientific discovery moves too slowly, and autonomous AI agents are how we intend to fix that. We're assembling a team of top researchers and engineers across AI and biology to build an AI scientist.

Role

We are seeking an ambitious, scientifically grounded person to join our team focused on developing rigorous benchmarks and training datasets that advance AI capabilities in biology. This role sits at the intersection of biology, data curation, and machine learning, and is ideal for someone with deep scientific training who is excited to shape how frontier AI systems learn to do science.
This role is on-site at our San Francisco office in the Dogpatch neighborhood. Our office is a converted warehouse with high ceilings, open space, and a team that genuinely believes in what they're building.

This position is part of the Evals team. 

Responsibilities
  • Design benchmarks that capture the complexity of real biological research, drawing on your domain expertise to identify what makes scientific reasoning hard. This includes designing open-ended scientific benchmarks and building on prior work such as LAB-Bench and BixBench.
  • Curate and vet biological datasets to ensure scientific rigor.
  • Analyze model outputs, identify failure modes, and contribute to iterative improvements in both datasets and evaluation criteria.
  • Collaborate with AI/ML researchers to translate scientific intuition into training signal, helping AI systems learn not just facts but how scientists think.
  • Coordinate operations and manage workflows, including working with domain experts, tracking task progress, and maintaining documentation.
Qualifications
  • Graduate-level training in biology, biochemistry, computational biology, or a related field, with hands-on research experience.
  • Working knowledge of machine learning concepts, particularly deep learning and large language models.
  • Comfortable with Python and building workflows for data processing, analysis, and experimentation.
  • Strong scientific taste, with the ability to identify what distinguishes expert-level reasoning from surface-level pattern matching.
  • Detail-oriented and willing to take on high-value but occasionally tedious work.
  • Energized by ambiguous, open-ended problems that require creativity, collaboration, and first-principles thinking to solve.
  • Organized and communicative, able to manage multiple workstreams and coordinate across teams.
Bonus points for
  • Prior experience creating evaluation datasets or annotation guidelines, or working on human-in-the-loop data pipelines.
  • Experience with bioinformatics pipelines, biological databases, or sequence analysis tools.
  • Hands-on experience fine-tuning or evaluating large language models, or familiarity with RLHF and preference-based training.
  • Publications or research experience in areas relevant to AI for science.
Salary

$160,000 - $300,000  •  Offers equity

Why join us?
  • Competitive salary and equity
  • Full healthcare coverage — we pay 100% of premiums for you and your dependents
  • Support for growing families, including a yearly new parent stipend and fertility coverage through Carrot
  • 401(k) company matching
  • $300 health and wellness benefit
  • Lunch is on us every day you're in the office, and dinner is on us when you're working late
  • Regular team offsites and company events
  • A fast-moving, mission-driven culture where smart people do their best work and actually enjoy doing it

Skills Required

  • Graduate-level training in biology, biochemistry, computational biology, or a related field with hands-on research experience
  • Working knowledge of machine learning concepts, particularly deep learning and large language models
  • Proficiency with Python and ability to build workflows for data processing, analysis, and experimentation
  • Strong scientific judgement to distinguish expert-level reasoning from surface-level pattern matching
  • Detail-oriented, able to do high-value but occasionally tedious work
  • Ability to work on ambiguous, open-ended problems requiring creativity and first-principles thinking
  • Organized and communicative, able to manage multiple workstreams and coordinate across teams
  • Prior experience creating evaluation datasets, annotation guidelines, or human-in-the-loop data pipelines
  • Experience with bioinformatics pipelines, biological databases, or sequence analysis tools
  • Hands-on experience fine-tuning or evaluating large language models; familiarity with RLHF and preference-based training
  • Publications or research experience in areas relevant to AI for science

The Company
HQ: San Francisco, California
47 Employees
Year Founded: 2025

What We Do

Spun out from FutureHouse in 2025, Edison Scientific accelerates discovery and innovation across the sciences. Our platform empowers researchers to move from question to breakthrough faster than ever, automating literature synthesis, data analysis, and molecular design. At its core is Kosmos, our AI scientist, capable of running hundreds of research tasks in parallel. It transforms raw datasets into comprehensive, validated reports—compressing months of work into a single run. With Edison Scientific, scientists remain in control, using AI to amplify their expertise and accelerate discovery at unprecedented speed.
