Hilbert's AI

ML Engineer / Data Scientist - Core

Reposted 17 Days Ago

San Francisco, CA, USA

Hybrid

Mid level

Artificial Intelligence • Information Technology • Machine Learning • Software

The Role

Own end-to-end data science and ML work for B2C customers: build recommendation, forecasting, segmentation, and activation models; design A/B tests and causal analyses; create configurable multi-tenant model architectures; extract signal from sparse/noisy data; partner with engineering to productionize models and deliver analyses that change business decisions.

Summary Generated by Built In

Hilbert is building the ML systems that power demand intelligence for the world's largest consumer companies — recommendation engines, demand forecasting, customer lifecycle models, and activation systems that must work across wildly different retailers, data environments, and business contexts. This isn't single-tenant model building; it's designing configurable, production-grade ML systems that generalize across Fortune 500 enterprises and beloved consumer brands alike.

We're looking for an ML Engineer who understands B2C business problems deeply, builds models and pipelines that work with real-world data, and delivers systems that drive real growth outcomes — all with the ownership and urgency of a startup.

This is not a "receive a ticket, train a model, hand off a notebook" role. You'll own problems end-to-end — from framing through modeling through production deployment through impact — for enterprise customers where the stakes are real and the feedback loop is tight. If you understand why churn analysis matters differently for a grocery retailer versus a fashion marketplace, can build a recommendation system that works with sparse data and runs reliably in production, and can walk a customer through your causal analysis with clarity and conviction, we want to meet you.

Why Hilbert AI

Hilbert is building the demand intelligence platform used by world-class B2C leaders — including the world's largest retailer — to unlock compounding growth outcomes. We sit at the intersection of AI, data, and commercial activation for retail and e-commerce.

We're scaling fast with top-tier investors behind us. ML systems are the engine behind what we deliver to customers — which means every model you build, every pipeline you ship, every system you contribute to has direct, measurable impact on enterprise revenue. We're a small, talent-dense, low-ego team. We value ownership, speed, intellectual honesty, and shipping real impact.

The Role

You'll work directly with the founding team and alongside engineering, product, and GTM to build and improve the ML systems at the heart of Hilbert. You'll be hands-on every day — building models, designing pipelines, running experiments, interrogating data, and shipping to production. B2C is our world. The problems we solve — demand prediction, customer lifecycle, personalization, activation — require someone who understands these domains and can translate business context into modeling and engineering decisions. The environment is high-autonomy and high-ambiguity. Data is often messy, incomplete, or limited. You thrive in exactly those conditions.

Our Current Hurdles

These are the kinds of problems you'll be working on from day one.

Multi-tenant ML systems that actually generalize — we serve enterprises with fundamentally different data shapes, catalog sizes, customer behaviors, and business constraints. The challenge is contributing to model architectures and pipelines that are configurable and adaptive across customers — not rebuilding bespoke systems for every account. You'll work on the abstractions that make this possible.
Extracting real signal from messy, limited data — enterprise data is never clean and rarely complete. Cold-start problems, sparse interaction histories, inconsistent taxonomies, missing features — this is the norm, not the exception. You'll need to make pragmatic modeling choices that produce real value when the data fights back.
Connecting model outputs to business actions — a recommendation score or a demand forecast is worthless if it doesn't change what an operator actually does. The challenge is closing the loop between ML outputs and real commercial decisions — activation, merchandising, retention — in a way that's measurable and defensible.
Causal rigor in a world that wants quick answers — enterprise customers want to know why something is happening, not just what. The challenge is applying causal inference in a way that's rigorous but practical — knowing when an A/B test is sufficient, when you need difference-in-differences or synthetic controls, and when the honest answer is "we can't know yet."

What you'll do:

Build and deploy ML models and pipelines that power core product capabilities: recommendation systems, search relevance, customer segmentation, demand forecasting, and activation optimization
Contribute to configurable, multi-tenant model architectures that adapt across different customer contexts, data availability, and business requirements — not bespoke rebuilds for every use case
Own your models through to production — working with engineering on serving, monitoring, and reliability, not just handing off prototypes
Create meaningful models with the data that's actually available — not the data you wish you had. You extract signal from limited, noisy, or sparse datasets and reach for the right level of complexity
Design and run rigorous A/B tests — including understanding when A/B testing is insufficient and causal inference methods are required
Apply causal reasoning rigorously — you know the difference between correlation and causation, you surface true drivers, and you flag when others confuse the two
Deliver analyses that drive decisions — you connect model outputs to business outcomes and communicate them with clarity to founders, teammates, and customers
Think in systems. You don't build isolated models — you understand how recommendation, segmentation, scoring, and activation interact with each other and design your work to fit within the broader system
Move fast — prototype, validate, ship, iterate. You're comfortable with imperfect information and evolving requirements

Who You Are

We care about how you think about problems, how you connect models to business impact, and how you operate when things are ambiguous.

The profile:

You're an ML engineer who ships to production. You write clean, testable Python. You care about how your models are served, monitored, and maintained — not just how they perform offline. Your work doesn't end at a notebook; it ends when the system is running and delivering value.
You're a product-minded ML engineer. You understand that a model with great offline metrics is useless if it doesn't move the customer's business. You frame every modeling decision in terms of the outcome it enables — and you push back when a technically elegant approach doesn't serve the actual problem.
You have strong B2C business knowledge. You understand the problems consumer businesses actually face — customer acquisition vs. retention economics, lifecycle dynamics, basket composition, churn drivers, promotional cannibalization, channel attribution, demand elasticity. This knowledge informs how you frame problems and design models.
You're a systems thinker. You understand how models, data flows, customer behavior, and business outcomes connect. You don't optimize one metric in a vacuum — you consider second-order effects and how your work fits the bigger picture.
You've built recommendation, search, and/or customer-based ML systems — collaborative filtering, content-based methods, ranking systems, segmentation, propensity modeling. You understand when each applies and why.
You know how to build for configurability. You've worked on or contributed to model architectures and pipelines that flex across multiple customers, segments, or contexts — not rigid, single-purpose implementations.
You create value from limited data. You make pragmatic modeling choices when data is sparse, noisy, or cold-start. You know when a simpler approach beats a complex one and aren't seduced by unnecessary sophistication.
You're rigorous about causality. You understand causal inference methods — difference-in-differences, instrumental variables, propensity scoring, synthetic controls — and know when to apply them. You design A/B tests properly and understand their limitations.
You communicate with clarity and conviction. You can present an analysis to a non-technical audience and make it land. You can write a one-pager that changes a decision. You explain your reasoning, not just your results. Communication is not a nice-to-have here — it's core to the role.
You take ownership. You don't wait for someone to define the problem perfectly. You dig in, frame it, propose an approach, and ship it. If something breaks or underperforms, you treat it as your problem.
You thrive in ambiguity. Problem definitions shift. Data availability surprises you. Requirements evolve. You're energized by figuring it out, not paralyzed by incomplete specs.
You move at startup speed. You understand what it means to be available, responsive, and biased toward action in a fast-moving, early-stage environment.

Strong pluses:

Experience with ML infrastructure — feature stores, model serving, orchestration, monitoring, or retraining pipelines
Experience with experimentation platforms and A/B testing infrastructure
Exposure to retail, e-commerce, CPG, or marketplace data environments
Experience at early-stage startups or high-growth companies where you wore multiple hats
Experience taking models from prototype to production deployment and owning them in production
Background in economics, econometrics, or quantitative social science that informs your causal thinking

You might be:

An ML engineer at a B2C company who wants higher stakes, more ownership, and less bureaucracy.
A data scientist who moved toward engineering because you got tired of handing off notebooks and never seeing them in production.
Someone who came up through economics or quantitative research and moved into applied ML because you wanted to see models drive real decisions and real systems.
An engineer at a larger company who's frustrated by slow cycles and wants to see your work hit production and move the needle this week, not next quarter.
An early-career ML engineer with disproportionate output who's ready to punch above your title.

What matters: you understand the business, you build models that work with real data in real production environments, you think in systems, and you communicate impact, not just methodology.

Location

San Francisco, US

Compensation

Competitive salary + equity. Compensation details and structure shared in next steps.

The Hiring Journey

Short form → Intro call → Practical working session → Team conversations → Offer

Fast, human, no bureaucracy.

Skills Required

Strong B2C business knowledge (acquisition vs retention, lifecycle, churn, promotions, demand elasticity)
Experience building recommendation, search relevance, or customer-based ML models (collaborative filtering, ranking, segmentation, propensity modeling)
Ability to design and run rigorous A/B tests and apply causal inference methods (difference-in-differences, IV, propensity scoring, synthetic controls)
Experience creating configurable, multi-tenant model architectures and pipelines (not bespoke per customer)
Proven ability to extract signal and create pragmatic models from limited, noisy, or sparse data
Strong communication skills to present analyses to non-technical audiences and drive decisions
Demonstrated ownership mindset and ability to operate in high-autonomy, ambiguous startup environments
Experience collaborating with engineering to productionize models
Strong Python proficiency (production-quality code)
Experience with experimentation platforms and A/B testing infrastructure
Familiarity with modern data and ML infrastructure (feature stores, orchestration, model serving, monitoring)
Exposure to retail, e-commerce, CPG, or marketplace data environments
Experience at early-stage startups or high-growth companies
Background in economics, econometrics, or quantitative social science
