Senior Data Infrastructure Engineer

Reposted 10 Days Ago
San Francisco, CA, USA
In-Office
Senior level
Artificial Intelligence • Software
The Role
Build and scale real-time data pipelines processing 100k+ traces/sec, run LLM-based scoring and clustering in near-real time, optimize LLM serving and ClickHouse OLAP performance, and own the infrastructure roadmap from ingestion through analytics.
Summary Generated by Built In

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). While traditional observability focuses on logging exceptions and latency, our ABM surfaces behavioral anomalies, such as instruction drift and context-retrieval loss, in production environments at scale.

Hundreds of teams building autonomous agents rely on Judgment to understand how their systems behave post-deployment. Instead of reactive incident triage, they cluster patterns across conversations and workflows, correlate regressions to specific interaction types, and pinpoint where reliability breaks down.

We’ve raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.

The Role:

We are looking for a Senior Data Infrastructure Engineer to build and scale the real-time data pipelines that power agent behavior analysis at production scale. This role is crucial for processing hundreds of thousands of traces per second, running LLM-based scoring and clustering in near-real time, and delivering the low-latency query performance that enables teams to understand agent behavior as it happens. We need someone who has built petabyte-scale data systems, knows how to squeeze performance out of OLAP databases, and can own the data infrastructure from ingestion through analytics.

What You'll Do:
  • Design and automate large-scale, high-performance streaming and batch data processing systems to power Judgment's behavioral analysis products.

  • Partner closely with infrastructure and backend teams to improve scalability, data governance, and efficiency.

  • Evangelize high-quality software engineering practices for data infrastructure at scale.

  • Advocate for a high bar on data and engineering quality: reliable, efficient, well-documented, testable, and maintainable.

  • Design data models for optimal storage and access, with thoughtful data flows to power critical product requirements.

  • Optimize OLAP database performance through schema design, partitioning strategies, storage tiering, and access pattern analysis.

What We're Looking For:
  • 6+ years of relevant industry experience building and operating high-throughput, petabyte-scale data pipelines in production.

  • Experience collaborating with infrastructure, backend, and product partners to align on data flow and system design.

  • Experience designing and deploying high-performance systems with reliable monitoring and observability practices.

  • Deep expertise with streaming and batch processing systems (Kafka, Spark, Flink, or Ray) operating at petabyte scale.

  • Hands-on OLAP database engineering experience, including columnar databases (ClickHouse or similar) and distributed query engines (Presto or similar).

  • Excellent communication skills, both written and verbal.

Nice to have:

  • Experience building pipelines that call LLM APIs at scale: request batching, rate limit management, cost optimization.

  • Familiarity with ML workflow orchestration (Airflow, Dagster, Prefect).

  • Experience with embedding generation pipelines or vector search infrastructure.

  • Background in observability, log processing, or event stream platforms (Datadog, Honeycomb, Sentry).

  • Experience with data quality monitoring and anomaly detection within pipelines.

Why Judgment?
  • Agents can’t work without this. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.

  • We’re wired to win. We're a team of fewer than 20, but we ship like 50+ on the daily. You'll be working with olympiad medalists, debate champions, and competitive athletes who bring that same intensity to company building.

  • Fast track to founding. Our engineers interface directly with customers, ship code into their environments, and use their feedback to dictate what’s next on the roadmap. Everyone on the team is either an ex-founder or a founder-to-be.

  • We make sure our people do their best work. If you deserve a spot on the team, money will never get in the way of it. Full benefits, Equinox, and a private chef to take care of you. We sprint hard, but we play hard too: ask us about our Smash/Mario Kart tournaments.

    We work in person in San Francisco.

Skills Required

  • Experience building and tuning high-throughput petabyte-scale data pipelines
  • Deep knowledge of data infrastructure (Apache Spark, Ray, dbt, Airflow/Dagster)
  • Experience with OLAP database engineering (ClickHouse)
  • Comfortable with cloud infrastructure and batch + streaming pipelines
  • Ability to design streaming pipelines that score and cluster 100k+ traces/s using LLM APIs
  • Senior-level ownership of infrastructure roadmap, architecture design, and shipping fixes
  • Ability to analyze LLM serving bottlenecks (e.g., with flame graphs) and improve RPS via batching and concurrency
  • On-site work in San Francisco
  • Experience with LLM inference and serving optimizations (speculative decoding, continuous/dynamic batching, KV cache management)
  • Familiarity with quantization techniques (INT8, INT4), multi-GPU serving, and tensor parallelism
  • Background from observability companies, trading, RecSys/ML big tech, or AI labs
The Company
HQ: San Francisco, California
20 Employees
Year Founded: 2025

What We Do

Judgment Labs builds agent behavior monitoring (ABM) infrastructure. Judgment provides a toolkit to track and judge agent behavior in online and offline setups, enabling you to convert high-signal interaction data from production/test environments into more reliable agents.
