AI Engineer - Core

Posted 23 Days Ago
San Francisco, CA, USA
Hybrid
Mid level
Artificial Intelligence • Information Technology • Machine Learning • Software
The Role
Design, build, and maintain production-grade AI systems and agent-based workflows; own end-to-end pipelines from experimentation to deployment and monitoring; build evaluation pipelines; collaborate with product, data, and GTM; iterate quickly under ambiguity to shape the AI stack.
Hilbert is building a reasoning engine that must navigate non-deterministic user behavior across data silos — turning months-long decision cycles into minutes. Fully agentic by design, our demand intelligence platform doesn't just call APIs; it solves the hard problem of orchestrating multi-step inference over messy, high-stakes enterprise data where deterministic answers don't exist.

From Fortune 500 enterprises to beloved brands like FreshDirect, Blank Street, and Levain Bakery, operators run their growth on Hilbert. We're also co-building alongside leading AI companies.

We're looking for an AI Engineer who can build production-grade AI systems end-to-end — from prototype to pipeline to product — with the ownership and urgency of a startup culture.

This is not a "wire up a prompt chain and move on" role. You'll own core pieces of the AI stack that power Hilbert's demand intelligence platform — designing agent architectures, building evaluation systems, and making hard tradeoffs between accuracy, latency, and cost in production. You'll ship fast in conditions where the spec is evolving, and communicate what you're building (and why) with clarity to the rest of the team. If you think in systems, have opinions about how agentic workflows should actually work, and want to build AI products that drive real enterprise outcomes, we want to meet you.

THE ROLE

You'll work directly with the founding team and across product, data, and GTM to design, build, and improve the AI systems at the heart of Hilbert. The environment is high-autonomy and high-ambiguity — the nature of building AI-native products means requirements shift, approaches evolve, and the person closest to the problem often makes the call.

What you'll do:
  • Design, build, and maintain AI-driven features and pipelines that serve enterprise customers at scale

  • Architect and implement agent-based workflows using LangChain, LangGraph, or equivalent orchestration frameworks

  • Own systems end-to-end — from experimentation through production deployment and monitoring

  • Build and improve evaluation pipelines to measure, validate, and iterate on AI system performance

  • Collaborate closely with the founding team and cross-functional partners — communicating tradeoffs, progress, and technical decisions with clarity

  • Make pragmatic engineering decisions under ambiguity — ship, learn, iterate

  • Shape the technical direction of the AI stack as the company scales

Our Current Hurdles

These are the kinds of problems you'll walk into on day one:

  • Intelligent retrieval across heterogeneous approaches — our agents need the right information at exactly the right moment. The challenge isn't picking one retrieval method; it's combining RAG, graph-based retrieval, and other approaches into a unified strategy that fetches the most relevant content precisely when the agent needs it — no more, no less.

  • Agentic workflows that solve real-world problems: the challenge is building workflows robust enough to handle the unexpected. When an agent hits an edge case, missing data, or a situation it wasn't explicitly designed for, it needs to reason through it, leveraging available context, escalating to a human when it can't, and never silently failing.

  • Evaluation beyond vibes — we need systematic, reproducible evals that actually predict real-world performance. If you've built custom evaluators for RAG or agent workflows, we want to talk.

  • Execution and real-world integration — an agent that only surfaces insights isn't enough. We're building systems where agents take action — integrating with external platforms, executing workflows, and doing real work with the information they have, combined with human-in-the-loop checkpoints that keep enterprise trust intact.

WHO THRIVES IN THIS ROLE

We care about how you think and how you ship - not how many years are on your resume.

The profile:
  • You're a strong software engineer. Your code is clean, testable, and production-ready.

  • You have real experience with LangChain, LangGraph, or equivalent agent/orchestration frameworks. You've built with them, hit their limits, and worked around them - not just followed tutorials.

  • You communicate with clarity and conviction. You can explain a technical decision to a non-technical founder and debate architecture tradeoffs with a senior engineer. Communication is not a nice-to-have here - it's core to the role.

  • You take ownership. You don't wait for tickets. You see what needs to be built, raise your hand, and ship it.

  • You thrive in ambiguity. AI products evolve fast. Requirements change. You're energized by figuring it out.

  • You move at startup speed. You understand what it means to be available, responsive, and biased toward action in a fast-moving, early-stage environment.

Strong pluses:
  • Experience building evaluation pipelines: designing metrics, running systematic evaluations, and using results to drive iteration on AI systems

  • Backend software engineering experience — building APIs, services, data infrastructure, or production systems

  • Exposure to retrieval-augmented generation (RAG), vector databases, or LLM-powered search and recommendation systems

  • Experience at early-stage startups or high-growth environments where you wore multiple hats

You might be:

A backend engineer who went deep on LLMs and never looked back. An ML engineer who realized they love building products, not just models. A startup CTO who wants to go deep on AI at a company where the stack is the product. Someone who's been hacking on agents and pipelines nights and weekends and wants to do it full-time with real enterprise stakes. What matters: you ship, you own it, and you communicate like a teammate — not a silo.

Location

San Francisco, with occasional travel for team meets, offsites or customer engagements.

Compensation

Competitive salary + equity package, commensurate with experience. Performance-based bonuses tied to project milestones and customer impact.

The Hiring Journey

Short form → Intro call → Technical working session → Team conversations → Offer

Top Skills

Agent Orchestration Frameworks
APIs
Data Infrastructure
Evaluation Pipelines
LangChain
LangGraph
LLMs
Python
RAG
Services
Vector Databases

The Company
HQ: San Francisco, CA
20 Employees

What We Do

Hilbert is a scalable, data science-first growth engine that gives B2C teams predictive clarity into user behavior, revenue drivers, and the actions that drive sustainable growth. Fully agentic by design, Hilbert shrinks months-long decision cycles to minutes. One Engine. Four Intelligence Layers. Built for Momentum. Hilbert turns fragmented data into clear answers and decisive actions using a proprietary architecture of AI/ML algorithms accessible through natural language. No code, no dashboards, no guesswork.
