Guild.ai

AI Engineer, Production Agents

Reposted 22 Days Ago

San Francisco, CA, USA

Hybrid

140K-320K Annually

Mid level

Artificial Intelligence • Software • Automation

The Role

As an AI Engineer, you will design, implement, and ship production agents on an AI platform, integrating with real developer environments and ensuring reliability, safety, and observability. You'll drive the engineering practice by creating best practices and collaborating with product and ML teams.

Summary Generated by Built In

We’re looking for a founding engineer focused on building production agents—someone who will push our platform to its limits by creating some of the first real-world agents that developers rely on.

The Opportunity: Build the First Production Agents on a New AI Platform

This isn’t a typical “apply-the-API” role. As the engineer responsible for our first production agents, you’ll be at the intersection of product, infrastructure, and AI. You’ll take our platform primitives and turn them into concrete agents that ship to real customers—driving both what the platform is today and what it becomes next.

We’re tackling one of the hardest—and most important—problems in software engineering: helping developers understand, evolve, and operate complex systems using autonomous and event-driven AI. Your agents will be among the first proof points that this new way of building is not only possible, but better.

If you’re excited by 0→1 systems, love owning features end-to-end, and want to work hands-on with LLMs and agents in real production environments, this role is for you.

What You Will Do

Build the First Production Agents: Design, implement, and ship some of the earliest agents built on the Guild.ai platform—agents that developers will use to understand, debug, and evolve complex software systems.
Push the Platform by Using It: Act as both power user and core contributor: use our platform to build agents, then feed your experience back into the platform’s APIs, abstractions, and UX.
Own Agent Workflows End-to-End: Take agents from idea → prototype → production: task scoping, architecture, prompts, tools, integrations, logging, and iteration based on real-world behavior.
Integrate Deeply with Real Developer Environments: Connect agents to source control, CI/CD, observability, and other components of modern engineering stacks so they can operate on real code and real systems.
Make Agents Reliable, Safe, and Observable: Implement guardrails, monitoring, and debugging tooling so we can understand what agents are doing, why they’re doing it, and how to improve them.
Collaborate Closely with Product & Evaluation: Work with PMs and evaluation/ML teammates to define agent behaviors, success metrics, and iteration loops. Use evaluation harnesses and telemetry to guide improvements.
Shape the Agent Engineering Practice at Guild.ai: Help define patterns, libraries, and best practices for building agents on our platform—to be used by future engineers, customers, and partners.

What You Will Bring

Strong software engineering background and experience owning complex features or systems end-to-end.
Hands-on experience building with LLMs (e.g., prompting, tool calling, function calling, RAG, workflows) in a production or high-stakes environment.
Proficiency in Python and comfort with TypeScript or modern web/backend stacks.
Ability to design and reason about distributed or event-driven systems, APIs, and integrations.
A practical mindset around reliability: logging, observability, debugging, and iterative hardening of systems in production.
Comfort operating in a high-ambiguity, high-ownership startup environment.
Clear communication and a strong product sense—you care that what you build solves real problems for engineers.

Bonus Points

Experience building agentic systems (tool-using agents, workflow engines, multi-step or multi-agent setups).
Familiarity with developer tools, infrastructure, observability, or platform products.
Experience integrating with Git-based workflows, CI/CD, cloud services, or internal tooling used by engineering teams.
Prior work with evaluation or monitoring of LLM-based systems in production.
Experience at an early-stage startup or in a role where you were the primary builder for a new product area.

Benefits & Perks

Significant equity in an early-stage, venture-backed startup
Comprehensive Health Benefits (Medical, Dental, Vision)
Flexible PTO to ensure you have the time you need to recharge

Skills Required

Strong software engineering background
Hands-on experience with LLMs in production
Proficiency in Python
Comfort with TypeScript or modern web/backend stacks
Ability to design distributed or event-driven systems
Comfort operating in a high-ambiguity startup environment

The Company

HQ: San Francisco, California

25 Employees

Year Founded: 2025

What We Do

Guild turns agents into shared production infrastructure, with a managed software center for trusted agent capabilities and an agent hub for discovering and sharing agents.

For Enterprises: AI, Trusted in Production

Autonomous software requires the same guardrails as any production system. Guild enforces centralized identity, least-privilege access, and immutable audit logging so enterprise governance extends to AI agents. Agents can act on code, tickets, and operational workflows without bypassing identity controls or becoming a black box.

For Developers: AI, Built Like Real Software

Guild gives developers the primitives they expect: typed interfaces, versioned releases, safe execution boundaries, and full execution traces, so agents behave like systems, not scripts. The Agent Hub is a public GitHub-like platform for broad discovery and reuse of agents, allowing developers to build agents like real software and ship them as products.

One Platform, Any Model

Universal by design. Guild is neutral toward models, vendors, and frameworks, doesn’t lock governance into a single stack, and works with Anthropic, OpenAI, Google, and open-source models. Companies can run agents via chat, APIs, webhooks, and schedules, and can publish trusted capabilities to version, reuse, and improve, so teams don't start from zero. Access can be controlled centrally, and usage tracked by workspace, user, agent, and trigger.