Rox

Founding Applied Research Engineer

San Francisco, CA, USA

In-Office

Mid level

Information Technology • Software

The Role

Design and run research programs focused on agent systems, signal classification, cost-efficient inference, and behavioral benchmarking in applied AI. Build evaluation frameworks, conduct experiments, convert findings into production systems, and help define the applied AI research agenda.

Summary Generated by Built In

Why This Role Exists

Foundation models are commoditizing. Defensibility comes from specialized models, proprietary training signals, and evaluation ownership. Every applied AI company we benchmark against, including Decagon, Harvey, Sierra, and Cursor, has already moved. The window to claim frontier applied AI for revenue will close within the next few months.

Rox is in market. We run agents against enterprise data at scale, every day. We see exactly where research meets production and where the data is dirty, state is changing, and being wrong costs (a lot of) money.

The Applied Research team exists to close that gap permanently.

What This Team Works On

Four problems we care about right now:

Cost-efficient inference for Clever Columns. Distill a Rox-trained model from frontier teachers so per-account enrichment runs at 1/20th the cost without quality loss. Ships first. Doesn't require trajectory attribution.

Signal classification across the public knowledge graph. A small, fast classifier that distinguishes genuine buying signals from noise across the news, jobs, and filings corpus we already ingest at scale. Powers Recommended Next Moves and Auto Prospecting. Cleanest data subset.

Personalization grounding and hallucination detection. A reward model that catches fabricated prospect context in Sequences in real time. This is the most underrated production failure mode in outbound AI. Trained on cross-customer consensus edits.

Sequencing policy under sparse, delayed rewards. Offline-to-online RL on multi-touch trajectories with intermediate signals as proxies for terminal outcomes. Long-horizon flagship. Hard. [Depends on trajectory instrumentation in progress with Platform Eng.]

These are not benchmark problems. They have real SLAs and real customers depending on them.

What You'll Do

Design and run research programs tied directly to the four problems above.
Build evaluation frameworks that measure trajectory quality, not just final output. Most eval infrastructure measures only end results; we also care about the path.
Work on agent memory, retrieval, and context systems alongside elite, competitive engineers.
Translate findings into infrastructure with measurable production impact.
Help define where Rox Research goes next.

What We're Looking For

You have spent real time thinking about how agents fail in practice, not just on benchmarks. You have built evaluation systems and know exactly where standard approaches break down. You can write code well enough to implement your own ideas, run your own experiments, and ship things that make it into production.

You move fast. The environment changes monthly and the team ships continuously.

Particularly relevant: agent evaluation and behavioral benchmarking; retrieval-augmented generation and knowledge graph systems; RL applied to real-world agent behavior; production ML systems (latency, reliability, observability); post-training and model adaptation for production use cases.

A PhD is not required. Strong research instincts and the ability to ship are.

What Success Looks Like

First few weeks: you understand Rox's architecture, where the production problems are, and where the research gaps are. You have opinions and you share them.

First few months: you are running experiments that directly inform how we build. Something you worked on is in production.

Over time: you are defining the research agenda for the most interesting applied AI problem in the enterprise. The systems you build are things no one else has built before, because no one else has the structural data position to build them.

Why Join Now

We are at an unusual moment: large enough to have real scale, real customers, and genuinely interesting research problems, yet small enough that you are one of a handful of people shaping what the Applied Research function looks like and what it prioritizes.

The team is extraordinary: IMO, IOI, and ICPC medalists, researchers from DeepMind and OpenAI. The feedback loop is a live enterprise system, not a leaderboard. If that's not more interesting to you than publishing for the sake of publishing, this probably isn't the right fit.

San Francisco, onsite. We relocate exceptional people.