Data Engineer

Posted 2 Days Ago
Redwood City, CA, USA
In-Office
Senior level
Artificial Intelligence • Productivity • Software • Generative AI

The owner of the data layer the entire product is built on — from raw supplier email to structured system of record.

Location: Redwood City, CA (In-person, 5 days/week)

Experience: 8+ years building production data systems, including hands-on early-stage startup experience (required); document extraction / ML / NLP pipelines a strong plus

Company: Waystation AI

About Waystation AI

Waystation is building the operating system for procurement in consumer packaged goods (CPG).

Today, ingredient and packaging sourcing still runs through inboxes, PDFs, and spreadsheets. It's slow, opaque, and costly. Waystation replaces that chaos with an AI-powered procurement platform that creates structure, visibility, and leverage — without forcing suppliers into portals.

The result: real ROI. One customer saved over $200,000 in the first three months, paying for their annual contract in the first 30 days.

Waystation is led by repeat founder Ryan Caldbeck (previously founded CircleUp) and backed by Founder Collective, Homebrew, Slow Ventures, 87 Capital, Floodgate, and SuccessVP. We have paying customers, real usage, and a product that works.

The Role

Structured data isn't a feature of our product — it is the product. We take the messiest input imaginable (hundreds of thousands of disconnected supplier emails and PDFs — specs, COAs, pricing, certs) and turn it into a clean, queryable system of record shared across procurement, QA, and R&D.

You own that layer end to end. The extraction pipeline, the data model, the infrastructure the rest of engineering builds on — it's yours, not a slice of it. The quality of what every user sees, what every model trains on, and what every customer ROI claim rests on flows through what you build. No one will hold your hand. You'll have unusual access and unusual scope, and you'll be expected to use both. You'll move fast and ship scrappy — a rough system working today beats a perfect one next quarter. We don't have the resources to gold-plate, and neither do you.

What You'll Do
  • Own the extraction pipeline. Turn messy supplier emails and documents — specs, COAs, pricing, certs, multi-language, bad scans — into structured, validated data.

  • Push accuracy and prove it. Drive extraction accuracy past today's 85%+ and build the eval harness that measures it, per document type, so the number is real and not a vibe.

  • Own the data model. Unify suppliers, documents, RFPs, pricing, and certifications into one source of truth — and build for institutional memory, so every email compounds into leverage.

  • Build infrastructure others depend on. Ship reliable, observable pipelines and own data quality, lineage, and the monitoring that catches problems before customers do.

  • Treat extraction as an ML problem. Eval sets, regression testing, accuracy tracking over time — turn customer-reported errors into systematic improvements, not one-off patches.

  • Build leverage. Reach for models and agents first. Automate the long tail instead of grinding it.
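
For candidates wondering what "an eval harness, per document type" might look like in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Waystation's actual harness: the field-level exact-match scoring, the function names `score_by_doc_type` and `regressions`, and the sample document types and field names are all hypothetical.

```python
from collections import defaultdict


def score_by_doc_type(examples):
    """Compute field-level precision and recall per document type.

    `examples` is an iterable of (doc_type, expected, extracted) tuples,
    where `expected` and `extracted` are dicts of field name -> value.
    Exact-match scoring is an assumption made for this sketch.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for doc_type, expected, extracted in examples:
        c = counts[doc_type]
        for field, value in extracted.items():
            if field in expected and expected[field] == value:
                c["tp"] += 1
            else:
                c["fp"] += 1  # wrong value, or a field that should not exist
        for field, value in expected.items():
            if field not in extracted or extracted[field] != value:
                c["fn"] += 1  # labeled field was missed or extracted wrongly
    report = {}
    for doc_type, c in counts.items():
        predicted = c["tp"] + c["fp"]
        actual = c["tp"] + c["fn"]
        report[doc_type] = {
            "precision": c["tp"] / predicted if predicted else 0.0,
            "recall": c["tp"] / actual if actual else 0.0,
        }
    return report


def regressions(baseline, current, tolerance=0.01):
    """List (doc_type, metric) pairs that dropped by more than `tolerance`."""
    dropped = []
    for doc_type, old in baseline.items():
        new = current.get(doc_type, {"precision": 0.0, "recall": 0.0})
        for metric in ("precision", "recall"):
            if new[metric] < old[metric] - tolerance:
                dropped.append((doc_type, metric))
    return dropped


# Hypothetical labeled examples: (doc_type, expected fields, extracted fields).
examples = [
    ("coa", {"lot": "A12", "moisture_pct": "4.1"}, {"lot": "A12", "moisture_pct": "4.7"}),
    ("coa", {"lot": "B07"}, {"lot": "B07"}),
    ("price_sheet", {"unit_price": "2.35", "currency": "USD"}, {"unit_price": "2.35"}),
]
report = score_by_doc_type(examples)
```

Storing each run's report and passing the previous one to `regressions` is one way a customer-reported error, once added to the eval set, could turn into a regression check that keeps running.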

What We're Looking For

We'll back the right engineer over the right résumé. We care about a defined edge, depth, and ownership — not polish.

You're a strong fit if you:

  • Have built in the chaos — required. You've done real work at an early-stage startup (seed or Series A), where there was no playbook, no infrastructure handed to you, and never enough hours. You know the difference between building from zero and maintaining someone else's system. A purely big-company background isn't a fit for this seat.

  • Move fast and stay scrappy. You ship, learn, and iterate in the open rather than polishing in private. Constraints — fewer people, less tooling, no time — energize you instead of stalling you. You find the version that works now and earn the polish later.

  • Have one superpower. There's a thing you're genuinely better at than almost anyone (data systems, extraction, ML pipelines), and you can name it and point to results that prove it. We want a sharp edge and the slope to outgrow the job, not someone evenly good at everything.

  • Have real depth. 8+ years building production data systems. Deep with Python, SQL, and modern data tooling. You can architect a system as easily as you can ship a fix — and you do both at startup speed.

  • Own whole problems. You take messy things start to finish and close them without being asked. When the data is wrong, you fix the system, not the symptom.

  • Build leverage. You reach for tools, automation, and agents to scale yourself instead of grinding manually. We live in Claude Code — you should want to, too.

  • Are all in. This is a rocket ship you want to plant a flag on and ride through the messy middle — not a stepping stone. We're betting on you; we need you betting on us.

  • Have grit. You've ground at something hard for a long time, through the part where it stopped being fun and the feedback loop ran far longer than your next review. You don't flinch when the work gets ugly.

Bonus: document extraction, NLP, or ML pipelines; regulated document-heavy domains; CPG, supply chain, or procurement; multi-language data (Chinese, Spanish).

What Success Looks Like

You'll ramp fast and work toward a scorecard built on four measures:

  • Extraction accuracy. A measurable climb past existing accuracy (precision & recall) across document types — proven by the evals you built, not asserted.

  • Pipeline reliability. Data quality and uptime the product can depend on. Bad or missing data gets flagged automatically, before a customer ever sees it.

  • Coverage of the long tail. More supplier formats and document types handled cleanly. The set of things that break the pipeline keeps shrinking.

  • Leverage for the team. The data layer becomes something the rest of engineering builds on without thinking about it.
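
As a rough illustration of "bad or missing data gets flagged automatically," the sketch below validates extracted records and quarantines any that fail before they reach the product. Everything in it is an assumption made for the example: the `Flag` type, the `REQUIRED_FIELDS` list, and the record field names are hypothetical and do not describe Waystation's actual schema or checks.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    record_id: str
    field: str
    reason: str


# Hypothetical required fields; a real schema would differ.
REQUIRED_FIELDS = ("supplier", "document_type", "unit_price")


def validate(record):
    """Return a list of Flag objects describing problems with one record."""
    flags = []
    rid = str(record.get("id", "unknown"))
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            flags.append(Flag(rid, field, "missing"))
    price = record.get("unit_price")
    if price not in (None, ""):
        try:
            if float(price) <= 0:
                flags.append(Flag(rid, "unit_price", "non-positive"))
        except (TypeError, ValueError):
            flags.append(Flag(rid, "unit_price", "not a number"))
    return flags


def partition(records):
    """Split records into clean ones and (record, flags) pairs to quarantine."""
    clean, quarantined = [], []
    for record in records:
        flags = validate(record)
        if flags:
            quarantined.append((record, flags))
        else:
            clean.append(record)
    return clean, quarantined


records = [
    {"id": 1, "supplier": "Acme", "document_type": "price_sheet", "unit_price": "2.35"},
    {"id": 2, "supplier": "", "document_type": "coa", "unit_price": "n/a"},
]
clean, quarantined = partition(records)
```

Quarantining whole records rather than silently dropping fields keeps the failure visible, which is one way to make sure problems surface in monitoring before a customer sees them.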

Values
  • We are reliable, credible, and authentic

  • We are solution-oriented

  • We are proud of our work, our customers, and ourselves

What We Offer
  • Competitive base salary + meaningful equity — real ownership, with upside tied to the outcomes you drive

  • Ownership of the data layer the entire product is built on, working directly with a repeat founder & CEO — a front-row seat to how an AI-native company gets built

  • A real product with real ROI — value you can measure

  • Full health, dental, and vision coverage

  • Unlimited vacation — we care about outcomes, not hours

  • An in-person team that values craft and ambition

How to Apply

Don't send a cover letter. Send two things:

  • A hard system you owned. One pipeline or data problem, taken start to finish — what was true before, what you built, what was true after.

  • Something you automated or built with AI. An eval harness, an agent, a workflow that scaled you — anywhere you replaced manual work with a system.

Short is fine. We're reading for ownership and judgment, not polish.

Skills Required

  • 8+ years building production data systems
  • Hands-on early-stage startup experience (seed or Series A)
  • Deep experience with Python
  • Deep experience with SQL
  • Experience with modern data tooling and shipping reliable data pipelines
  • Ownership of end-to-end data systems, models, and infrastructure
  • Experience with document extraction, ML, or NLP pipelines (a strong plus, not required)
  • Experience with regulated document-heavy domains, CPG, supply chain, or multi-language data (bonus)