Prime Intellect

Applied Research - Evals & Data

Reposted 10 Days Ago

San Francisco, CA, USA

Hybrid

Mid level

Artificial Intelligence • Software

The Role

This role involves designing AI agents, building robust infrastructure, and translating customer insights into technical requirements while working with reinforcement learning and applied data.

Own Your Intelligence

Prime Intellect is building the open superintelligence stack: the infrastructure frontier AI labs build internally, made available to every ambitious AI team.

Our platform, Lab, unifies compute, environments, evaluations, secure sandboxes, high-performance training, and deployment into one full-stack system for post-training at frontier scale - from SFT and RL to tool use, agent workflows, and continuously improving production models. We are building open frontier AI: open-source models trained end to end for long-horizon tasks like autonomous research, and the full-stack platform our own research team uses to build them. The next generation of AI companies, enterprises, and research teams do not just need more GPUs. They need the ability to turn their own workflows, tools, data, and feedback loops into superintelligence they own.

Prime Intellect has raised $150M in total funding from Founders Fund, Radical Ventures, NVIDIA, and exceptional AI, infrastructure, and enterprise operators — including Andrej Karpathy, Dwarkesh Patel, and leaders and founders from Ramp, Perplexity, Harvey, Mercor, Zapier, Datadog, Cognition, OpenAI, Thinking Machines, Together AI, SemiAnalysis, LangChain, Browserbase, Cloudflare, Sierra, Databricks, Airbnb, OpenRouter, Standard Intelligence, Fleet, Core Auto, and more. We are looking for people who want to build at the intersection of frontier research, real infrastructure, and go-to-market for a category that does not fully exist yet.

Role Impact

This is a customer-facing role at the intersection of cutting-edge RL/post-training methods, applied data, and agent systems. You’ll have a direct impact on how advanced models are aligned, evaluated, deployed, and used in the real world by:

Advancing Agent Capabilities: Designing and iterating on next-generation AI agents that tackle real workloads—workflow automation, reasoning-intensive tasks, and decision-making at scale. Working with applied data from real deployments to continuously refine policies, improve reasoning, and enhance reliability and safety.
Building Robust Infrastructure: Developing the distributed systems, evaluation pipelines, and coordination frameworks that enable these agents to operate reliably, efficiently, and at massive scale. Building data capture, processing, and versioning workflows for feedback, model traces, and reward signals.
Bridging Customers & Research: Translating customer needs and insights from applied data into clear technical requirements that guide product and research priorities. Collaborating closely with RL and eval teams to ensure real-world signals inform model alignment and reward shaping.
Prototyping in the Field: Rapidly designing and deploying agents, evals, and harnesses alongside customers to validate solutions. Using applied evaluation data to iterate on model performance and discover new capabilities.

Customer-Facing Engineering

Work side-by-side with customers to deeply understand workflows, data sources, and bottlenecks.
Prototype agents, data pipelines, and eval harnesses tailored to real use cases, then hand off hardened systems to core teams.
Translate customer insights and evaluation results into roadmap and research direction.

Post-training & Reinforcement Learning

Design and implement novel RL and post-training methods (RLHF, RLVR, GRPO, etc.) to align large models with domain-specific tasks.
Build evaluation harnesses and verifiers to measure reasoning, robustness, and agentic behavior in real-world workflows.
Integrate applied data collection and analytics into the post-training process to surface regressions, emergent skills, and alignment opportunities.
Prototype multi-agent and memory-augmented systems to expand capabilities for customer-facing solutions.

Agent Development & Infrastructure

Rapidly prototype and iterate on AI agents for automation, workflow orchestration, and decision-making.
Extend and integrate with agent frameworks to support evolving feature requests and performance requirements.
Architect and maintain distributed training and inference pipelines, ensuring scalability and cost efficiency.
Develop observability and monitoring (Prometheus, Grafana, tracing) to ensure reliability and performance in production deployments.

Requirements

Strong background in machine learning engineering, with experience in post-training, RL, or large-scale model alignment.
Experience with applied data workflows and evaluation frameworks for large models or agents (e.g., SWE-bench, HELM, EvalFlow, internal eval pipelines).
Deep expertise in distributed training/inference frameworks (e.g., vLLM, SGLang, Ray, Accelerate).
Experience deploying containerized systems at scale (Docker, Kubernetes, Terraform).
Track record of research contributions (publications, open-source contributions, benchmarks) in ML/RL.
Passion for advancing the state-of-the-art in reasoning, measurement, and building practical, agentic AI systems.

What We Offer

Cash Compensation Range of $150k-$300k + equity incentives
Flexible Work (remote or San Francisco)
Visa Sponsorship & relocation support
Professional Development budget
Team Off-sites & conference attendance

Growth Opportunity

You’ll join a mission-driven team working at the frontier of open superintelligence infrastructure. In this role, you’ll have the opportunity to:

Shape the evolution of agent-driven, data-informed solutions—from research breakthroughs to production systems used by real customers.
Collaborate with leading researchers, engineers, and partners pushing the boundaries of RL, evaluation, and post-training.
Grow with a fast-moving organization where your contributions directly influence both the technical direction and the broader AI ecosystem.

If you’re excited to move fast, build boldly, and help define how agentic AI is developed and deployed, we’d love to hear from you.

Ready to build the open superintelligence infrastructure of tomorrow?
Apply now to help us make powerful, open AGI accessible to everyone.

The Company

HQ: San Francisco, CA

16 Employees

What We Do

Prime Intellect democratizes AI development at scale. Our platform makes it easy to find global compute resources and train state-of-the-art models through distributed training across clusters. Collectively own the resulting open AI innovations, from language models to scientific breakthroughs.