Location: San Francisco
Stage: Early, high-ownership, design-partner driven
Comp: Competitive salary + meaningful equity
About Champ AI
Champ AI is building a multimodal work-agent orchestration platform that helps ops/support/compliance teams automate end-to-end workflows, not just "chat with docs." We're building agentic systems that can reliably take actions across tools, handle real-world edge cases, and continuously improve with evaluations and feedback loops.
The Role
We're looking for a Founding Engineer (Agentic Systems) to own core pieces of our agent runtime and developer/product surface area. You'll build the systems that let agents operate safely, deterministically, and measurably in production: memory + context management, tool integration, sandbox execution, data syncing, evals, and AI-native UX.
This is a hands-on role where you'll ship to production quickly, work directly with design partners, and help define what "good" looks like for enterprise-grade agents.
What You'll Build
You'll likely own several of these areas end-to-end:
Agent runtime + orchestration
Agent loop design (planning → tool-use → verification → recovery) with strong guardrails.
Context assembly pipelines: retrieval + compression + summarization + "state" that survives long workflows.
Memory management: short-term working memory, long-term memory, user/org/project memory, and safe write policies.
Multi-agent patterns: delegation, handoffs, coordinator/worker setups, and concurrency.
Tool definition frameworks: typed schemas, validation, retries, idempotency, rate limits, and observability.
Connectors + data syncing: SaaS APIs, webhooks, polling strategies, incremental sync, conflict resolution.
Browser automation / computer-use flows (auth, session handling, DOM variability, screenshots, network traces).
Secure execution environments for "agent writes code / runs scripts / transforms data."
Permissions, isolation, secret management, and audit trails.
Deterministic replays where possible; safe "dry run" modes; blast-radius controls.
Evaluation harnesses for tool-use correctness, workflow completion, policy compliance, and regression detection.
Golden tasks + synthetic tasks + real production traces; offline + online metrics.
Experimentation frameworks (prompt/model/tool changes), versioning, and rollbacks.
Human-in-the-loop review flows: sampling, labeling, adjudication, continuous improvement loops.
Interfaces that make agents understandable and controllable: traces, state, "why it did that," and editable plans.
UX patterns for approvals, step-through execution, partial automation, and exception handling.
Customer-facing debugability: audit logs, run history, data provenance.
You have real agent-building scars. We're specifically looking for engineers who have either:
Shipped AI agents into production (internal or external), or
Built meaningful open-source contributions in agent frameworks, eval tooling, RAG/memory tooling, browser automation, or similar.
You likely have experience with:
LLM tool-use, structured outputs, function calling, and multi-step workflows.
Context engineering: retrieval strategies, chunking, reranking, summarization, memory write/read policies.
Systems thinking: state machines, retries, idempotency, failure modes, and "what happens at 3am."
Integrations: OAuth, scopes, token refresh, pagination, incremental sync, webhooks, rate limiting.
Sandboxed execution or secure-by-default infra patterns (containers, ephemeral environments, secrets).
Observability: traces, metrics, logs; building "explainable runs" for humans.
Evaluation approaches for non-deterministic systems; confidence scoring; regression testing.
You've built AI-native UI surfaces (not just APIs): agent run views, trace explorers, approval UIs, etc.
You've worked with enterprise requirements: SOC2 posture, auditability, access controls, tenant isolation.
You can move between research-y prototyping and production-grade engineering without getting stuck in either.
High ownership, fast iteration, direct customer feedback loops.
Strong bias toward shipping + measuring + improving.
You'll have meaningful influence on architecture, product direction, and hiring.
30-min intro + deep dive on prior agent work (we'll ask about failure modes, evals, and production learnings)
Technical session: design an agent system for a real workflow (with tools, memory, guardrails, and evals)
Practical take-home or pair session (small scope, production-minded)
Founder chat + Q&A
Skills Required
- Shipped AI agents into production (internal or external)
- Built meaningful open-source contributions in agent frameworks, eval tooling, RAG/memory tooling, or browser automation
- LLM tool-use, structured outputs, function calling, and multi-step workflow design
- Context engineering: retrieval strategies, chunking, reranking, summarization, and memory write/read policies
- Systems thinking: state machines, retries, idempotency, failure-mode handling, and operational reliability
- Integrations experience: OAuth, token refresh, scopes, pagination, incremental sync, webhooks, and rate limiting
- Sandboxed execution and secure-by-default infrastructure (containers, ephemeral environments, secret management)
- Observability and explainability: traces, metrics, logs, audit trails, and human-readable run traces
- Evaluation approaches for non-deterministic systems: confidence scoring, regression testing, and eval harnesses
- Built AI-native UI surfaces (agent run views, trace explorers, approval UIs)
- Experience with enterprise requirements: SOC2 posture, auditability, access controls, tenant isolation
- Ability to move between research prototyping and production-grade engineering without getting stuck
What We Do
Digitizing the process of vehicle titling between state government, insurance carriers, financial institutions, auto dealers, and consumers





