ML Research Engineer

2 Locations
Hybrid
120K-250K Annually
Senior level
Artificial Intelligence • Security • Software • Cybersecurity
The Role
Train and post-train LLMs using SFT/RLHF/DPO, build reward models and large-scale data pipelines, run distributed multi-GPU training, design evaluation benchmarks, and optimize models for production inference and serving.

TLDR: We are looking for several ML Engineers to train, post-train, and evaluate the LLMs at the core of our platform. This is hands-on modern model training work: large-scale data pipelines, SFT/RLHF/DPO-style alignment, reward models, distributed multi-GPU training, and evaluation.

About us

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies – simple natural-language rules that define what an AI model should and shouldn’t do. We automatically test, enforce, and continuously improve these policies at scale.

  • We’ve raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others

  • We process over 100M API calls every month

  • We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We’re a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built – you’re the one we need.

 

You will:

  • Turn petabytes of unstructured text into a structured, explorable view (topics, clusters, segments, trends, anomalies): iterate from “unknown unknowns” to stable definitions we can track.

  • Build scalable representation pipelines: sampling strategies, preprocessing/normalization, embeddings at scale, indexing, and retrieval to make the corpus searchable and analyzable.

  • Use LLMs pragmatically: labeling/classification, weak supervision, data enrichment, summarization, and automated diagnostics of inbound volumes (with cost/quality controls).

  • Deliver insights that change decisions: translate findings into product and operational actions (what data we have, what’s missing, where quality breaks, what to prioritize next).

  • Ship self-serve analytics: datasets, data models, and lightweight tools/dashboards so the team can explore and answer questions without ad-hoc requests.

  • Partner closely with engineering/research: align pipelines with production constraints (latency/cost/privacy), and integrate outputs into workflows.

You'll fit right in if you:

  • Have strong Python and SQL skills with an engineering mindset: you can build reliable pipelines, not just notebooks.

  • Have solid applied NLP/ML experience on real-world text: embeddings, clustering, topic modeling, semantic search, classification; you understand failure modes and how to debug them.

  • Are comfortable at scale: distributed processing, large-scale storage and querying, and performance/cost tradeoffs.

  • Know how to evaluate fuzzy problems: offline/online metrics, human-in-the-loop labelling, inter-annotator agreement, drift monitoring, and reproducibility.

  • Have prior experience with safety/moderation datasets, policy/rule systems, or high-volume logging/observability.

A big plus:

  • A public builder footprint: open-source models, datasets, or training frameworks on HuggingFace/GitHub, benchmarks, papers (workshop or main conference), or technical posts with real usage

  • Experience training models at a frontier or near-frontier lab, or leading open-source model releases with documented adoption

  • Experience with RL methods for LLMs beyond standard RLHF: online RL, GRPO-style methods, or novel alignment approaches

  • Experience with moderation, safety, or classification models at scale

  • Multilingual model training experience

Why White Circle

  • Paid time off in line with your local regulations, no matter where you work from

  • Work from Paris (hybrid) with a relocation package available, or work from London (note: we are currently unable to provide relocation support and medical insurance for London-based roles)

  • Comprehensive medical insurance for our France-based team

  • All the hardware, tools, and services you need

  • Covered subscriptions for AI agents and IDEs

  • Team off-sites twice a year: we've recently been to the Alps and to Saint-Tropez

How we hire

  1. Introductory call with HR (25 min)

  2. Take-home test task

  3. Technical interview with Head of Applied Research (60 min)

  4. Final conversation with our CEO (45 min)

Please submit your application in English.

Skills Required

  • Hands-on experience training and post-training LLMs using SFT, RLHF, DPO, or related methods
  • Experience building and operating large-scale data pipelines: collection, generation, filtering, deduplication, and quality control
  • Experience training models on distributed multi-GPU clusters
  • Proficiency with PyTorch or JAX
  • Experience building or working with reward models and preference data
  • Deep understanding of evaluation and benchmark design for model behavior
  • Experience optimizing inference: quantization, speculative decoding, vLLM, TensorRT, Triton or similar
  • Strong Python skills and comfort with SQL-like data tooling for large-scale data work
  • Strong ownership mindset: able to take ambiguous problems to production and iterate from feedback
  • Public builder footprint: OSS models, datasets, training frameworks, benchmarks, papers, or technical posts
  • Experience training models at frontier or near-frontier labs, or leading open-source model releases
  • Experience with RL methods beyond standard RLHF (online RL, GRPO-style methods, novel alignment)
  • Experience with moderation, safety, or classification models at scale
  • Multilingual model training experience

The Company
23 Employees
Year Founded: 2025

What We Do

White Circle is an enterprise AI control platform specializing in automated vulnerability detection and protection for AI systems. The company provides a unified system for testing, monitoring, and safeguarding AI applications in real time, focusing on blocking unsafe inputs, preventing jailbreaks, and optimizing model performance. Its mission is to secure AI systems and ensure they remain safe and controllable for businesses worldwide.
