TLDR: We’re looking for a Data Engineer who can design and build our data pipelines from scratch – handling petabytes of logs, events, and model traces – and create a clean, reliable environment for production, testing, and research workloads.
About us
White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies – simple natural-language rules that define what an AI model should and shouldn’t do. We automatically test, enforce, and continuously improve these policies at scale.
We’ve raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others
We process over one hundred million API calls every month
We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model
We’re a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built – you’re the one we need.
What you’ll do
Own and evolve our internal data tooling.
Scrape external sources, turn raw chaos into structured datasets, and keep them fresh and reliable.
Build and maintain pipelines to transform and version data, ensuring quality through testing and monitoring.
Create new data products that directly address business needs.
Diagnose and fix pipeline issues before they become someone else’s problem.
Ship analytics and dashboards that people actually use.
Jump into data and infra tasks where needed and make things more robust.
What we’re looking for
You’re strong in Python and SQL and comfortable working with messy, real-world data.
You have solid experience with web scraping and know how to make pipelines reliable, monitored, and resilient.
You’ve worked with PostgreSQL (or similar) and understand how to structure and query data efficiently.
You’ve worked with a cloud data warehouse (ClickHouse, BigQuery, Redshift, Snowflake, etc.).
You’re a builder: you fix things, improve systems, and don’t wait for perfect specs.
You communicate clearly and can work in English.
Experience with Metabase or similar BI tools.
Experience versioning datasets and building lightweight data pipelines.
Experience with dbt for data modeling / analytics engineering.
Familiarity with ETL tools (Fivetran, Airbyte, dltHub, etc.).
Experience with AWS (Athena, Glue, S3, etc.).
Experience with AI-assisted or agentic coding.
What we offer
Paid time off in line with your local regulations, no matter where you work from
Work from Paris (hybrid) + relocation package
Best medical insurance in France
All the hardware, tools, and services you need
Covered subscriptions for AI agents and IDEs
Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez
Hiring process
Intro call with one of our colleagues
Complete the take-home exercise
Show your best during the technical interview
Final call with our CEO and CTO
Please submit your application in English – it’s our company language, so you’ll be speaking a lot of it if you join.
Skills Required
- Strong in Python
- Strong in SQL
- Experience working with messy, real-world data
- Solid experience with web scraping and building reliable, monitored pipelines
- Experience with PostgreSQL or similar relational databases
- Experience with cloud data warehouses (ClickHouse, BigQuery, Redshift, Snowflake, etc.)
- Experience designing and building data pipelines for large-scale datasets
- Ability to build data products, analytics, and dashboards used by stakeholders
- Clear communication in English
- Experience with Metabase or similar BI tools
- Experience versioning datasets and building lightweight data pipelines
- Experience with dbt for data modeling / analytics engineering
- Familiarity with ETL tools (Fivetran, Airbyte, dltHub)
- Experience with AWS (Athena, Glue, S3)
- Experience with AI-assisted or agentic coding
What We Do
White Circle is an enterprise AI control platform specializing in automated vulnerability detection and protection for AI systems. The company provides a unified system for testing, monitoring, and safeguarding AI applications in real time, focusing on blocking unsafe inputs, preventing jailbreaks, and optimizing model performance. Its mission is to secure AI systems and ensure they remain safe and controllable for businesses worldwide.







