RL Deep Learning Engineer (Remote)

Posted 21 Hours Ago
7 Locations
In-Office or Remote
125K-180K Annually
Mid level
Artificial Intelligence • Legal Tech • Software • Generative AI
The Role
Build and scale RL environments and evaluation harnesses for long-horizon legal reasoning. Own pipelines converting court filings into contamination-free benchmarks and RL tasks, integrate with partner model APIs, and collaborate with attorneys to create scorable task formats.
Summary Generated by Built In

About Midpage

Midpage is the search engine for legal data used by AI labs. We cover all US court data (20M records). Over 300 law firms use our platform directly, 200k+ visitors read cases on our site every month, and five multibillion-dollar companies including Perplexity trust us as their legal data supplier. We're a team of 7 in Bowery, lower Manhattan. Our ARR has grown from $400k to $2M in the last 4 months.

The role

We're seeking an engineering generalist to build the first RL environments and benchmarks purpose-built for long-horizon legal reasoning: tasks where AI agents must search, read, analyze, and draft across real case filings, the same work that still takes teams of lawyers days to weeks. Frontier labs will use these environments to make future models more legally capable, and we need an engineer to own the infrastructure that makes it all work.

You'll design and scale the systems that turn millions of real court filings into verifiable evaluation environments and RL training tasks. You'll work directly with our attorneys, our data pipeline, and our partners at frontier AI labs.

What you'll do

- Build and maintain the evaluation harness and RL environment infrastructure—task runners, sandboxed environments, and scoring logic that can scale to thousands of parallel agents

- Own the data pipeline that turns freshly collected court filings into benchmark and RL tasks before they reach any model's training set

- Integrate with partner harnesses and model APIs to run contamination-free evaluations

- Collaborate with attorneys to translate legal workflows like cite checks, motion drafting, and precedent research into structured, scorable task formats using the Harbor spec

What we're looking for

- Strong generalist software engineering fundamentals. You've built, scaled, and maintained diverse systems in production

- You’ve built entire systems yourself, don’t require detailed specs or product managers, and take full ownership over your projects

- Deep experience with Python; TypeScript is a bonus. Most importantly, you can work on hard engineering problems

- You should be kind, self-managing, and a clear communicator

- You make effective use of Cursor/Claude Code/Codex and are capable of writing good code without them

Bonuses but not requirements

- Familiarity with LLM evaluation. You get what makes a good rubric and why benchmarks leak

- Comfort working with messy, real-world document data (legal filings, PDFs, long-form text)

Skills Required

  • Strong generalist software engineering fundamentals; built, scaled, and maintained production systems
  • Ownership of end-to-end systems without detailed specs or product managers
  • Deep experience with Python
  • Experience building RL environment infrastructure, task runners, sandboxed environments, and scoring logic
  • Experience owning data pipelines that turn raw documents into benchmark/RL tasks
  • Experience integrating with partner harnesses and model APIs for contamination-free evaluation
  • Ability to collaborate with attorneys and translate legal workflows into scorable task formats (Harbor spec)
  • Kind, self-managing, and clear communicator
  • Effective use of Cursor/Claude Code/Codex and ability to write good code without them
  • TypeScript experience
  • Familiarity with LLM evaluation and benchmark leakage concerns
  • Comfort working with messy, real-world document data (legal filings, PDFs, long-form text)

The Company
7 Employees
Year Founded: 2022

What We Do

Midpage is an AI-powered legal research and drafting platform designed for litigators and law students. It streamlines the process of searching, reading, and analyzing US case law, statutes, and regulations using generative AI tools. By automating the transformation of research into briefs and memos and offering tools like grid-based search, Midpage helps legal professionals manage information overload and increase drafting efficiency.
