CHAMPtitles

Founding Engineer, Agentic Systems

Posted 25 Days Ago

San Francisco, CA, USA

In-Office

Senior level

Blockchain • Information Technology

The Role

Build and own core agent runtime and developer/product surfaces: orchestration loops, context and memory management, tool integrations, secure sandboxed execution, data syncing, evals, and AI-native UX. Ship production-grade, observable, recoverable agents and work with design partners to define enterprise requirements, guardrails, and evaluation pipelines.

Summary Generated by Built In

Location: San Francisco
Stage: Early, high-ownership, design-partner driven
Comp: Competitive salary + meaningful equity

About Champ AI

Champ AI is building a multimodal work-agent orchestration platform that helps ops/support/compliance teams automate end-to-end workflows—not just "chat with docs." We're building agentic systems that can reliably take actions across tools, handle real-world edge cases, and continuously improve with evaluations and feedback loops.

The Role

We're looking for a Founding Engineer (Agentic Systems) to own core pieces of our agent runtime and developer/product surface area. You'll build the systems that let agents operate safely, deterministically, and measurably in production: memory + context management, tool integration, sandbox execution, data syncing, evals, and AI-native UX.

This is a hands-on role where you'll ship to production quickly, work directly with design partners, and help define what "good" looks like for enterprise-grade agents.

What You'll Build

You'll likely own several of these areas end-to-end:

Agent runtime + orchestration

Agent loop design (planning → tool-use → verification → recovery) with strong guardrails.
Context assembly pipelines: retrieval + compression + summarization + "state" that survives long workflows.
Memory management: short-term working memory, long-term memory, user/org/project memory, and safe write policies.
Multi-agent patterns: delegation, handoffs, coordinator/worker setups, and concurrency.

Tooling + integrations

Tool definition frameworks: typed schemas, validation, retries, idempotency, rate limits, and observability.
Connectors + data syncing: SaaS APIs, webhooks, polling strategies, incremental sync, conflict resolution.
Browser automation / computer-use flows (auth, session handling, DOM variability, screenshots, network traces).

Sandbox + execution

Secure execution environments for "agent writes code / runs scripts / transforms data."
Permissions, isolation, secret management, and audit trails.
Deterministic replays where possible; safe "dry run" modes; blast-radius controls.

Evals + reliability

Evaluation harnesses for tool-use correctness, workflow completion, policy compliance, and regression detection.
Golden tasks + synthetic tasks + real production traces; offline + online metrics.
Experimentation frameworks (prompt/model/tool changes), versioning, and rollbacks.
Human-in-the-loop review flows: sampling, labeling, adjudication, continuous improvement loops.

AI-native product + UX

Interfaces that make agents understandable and controllable: traces, state, "why it did that," and editable plans.
UX patterns for approvals, step-through execution, partial automation, and exception handling.
Customer-facing debuggability: audit logs, run history, data provenance.

What We're Looking For

You have real agent-building scars. We're specifically looking for engineers who have either:

Shipped AI agents into production (internal or external), or
Made meaningful open-source contributions to agent frameworks, eval tooling, RAG/memory tooling, browser automation, or similar.

You likely have experience with:

LLM tool-use, structured outputs, function calling, and multi-step workflows.
Context engineering: retrieval strategies, chunking, reranking, summarization, memory write/read policies.
Systems thinking: state machines, retries, idempotency, failure modes, and "what happens at 3am."
Integrations: OAuth, scopes, token refresh, pagination, incremental sync, webhooks, rate limiting.
Sandboxed execution or secure-by-default infra patterns (containers, ephemeral environments, secrets).
Observability: traces, metrics, logs; building "explainable runs" for humans.
Evaluation approaches for non-deterministic systems; confidence scoring; regression testing.

Bonus points

You've built AI-native UI surfaces (not just APIs): agent run views, trace explorers, approval UIs, etc.
You've worked with enterprise requirements: SOC 2 posture, auditability, access controls, tenant isolation.
You can move between research-y prototyping and production-grade engineering without getting stuck in either.

How We Work

High ownership, fast iteration, direct customer feedback loops.
Strong bias toward shipping + measuring + improving.
You'll have meaningful influence on architecture, product direction, and hiring.

Interview Process (example)

30-min intro + deep dive on prior agent work (we'll ask about failure modes, evals, and production learnings)
Technical session: design an agent system for a real workflow (with tools, memory, guardrails, and evals)
Practical take-home or pair session (small scope, production-minded)
Founder chat + Q&A

Skills Required

Shipped AI agents into production (internal or external)
Made meaningful open-source contributions to agent frameworks, eval tooling, RAG/memory tooling, or browser automation
LLM tool-use, structured outputs, function calling, and multi-step workflow design
Context engineering: retrieval strategies, chunking, reranking, summarization, and memory write/read policies
Systems thinking: state machines, retries, idempotency, failure-mode handling, and operational reliability
Integrations experience: OAuth, token refresh, scopes, pagination, incremental sync, webhooks, and rate limiting
Sandboxed execution and secure-by-default infrastructure (containers, ephemeral environments, secret management)
Observability and explainability: traces, metrics, logs, audit trails, and human-readable run traces
Evaluation approaches for non-deterministic systems: confidence scoring, regression testing, and eval harnesses
Built AI-native UI surfaces (agent run views, trace explorers, approval UIs)
Experience with enterprise requirements: SOC 2 posture, auditability, access controls, tenant isolation
Ability to move between research prototyping and production-grade engineering without getting stuck