Data Engineer

WayStation

Reposted 10 Days Ago

Redwood City, CA, USA

In-Office

Senior level

Artificial Intelligence • Productivity • Software • Generative AI

The Role

Own and build the end-to-end data layer: extraction pipelines from messy supplier emails and PDFs, the unified data model, reliable observable pipelines, and ML-driven extraction evaluation. Drive extraction accuracy, data quality, lineage, monitoring, and automation to scale coverage and enable engineering and product to depend on the data layer.

Summary Generated by Built In

Data Engineer

The owner of the data layer the entire product is built on, from raw supplier email to structured system of record.

Location: Redwood City, CA (in-person, 5 days/week)

Experience: 8+ years building and scaling production data systems (pipelines, schema and data modeling, migrations and backfills, and the databases underneath) that turn messy, unstructured input into reliable structured data. You built these from scratch and chose the tooling rather than inheriting a configured stack. Hands-on early-stage startup experience required.

Company: Waystation AI

About Waystation AI

Waystation is building the operating system for procurement in consumer packaged goods (CPG).

Today, ingredient and packaging sourcing still runs through inboxes, PDFs, and spreadsheets. It's slow, opaque, and costly. Waystation replaces that chaos with an AI-powered procurement platform that creates structure, visibility, and leverage — without forcing suppliers into portals.

The result: real ROI. One customer saved over $200,000 in the first three months, paying for their annual contract in the first 30 days.

Waystation is led by repeat founder Ryan Caldbeck (previously founded CircleUp) and backed by Founder Collective, Homebrew, Slow Ventures, 87 Capital, Floodgate, and SuccessVP. We have paying customers, real usage, and a product that works.

The Role

Structured data isn't a feature of our product — it is the product. We take the messiest input imaginable (thousands of disconnected supplier emails and PDFs — specs, COAs, pricing, certs) and turn it into a clean, queryable system of record shared across procurement, QA, and R&D. The hard part is the whole stack underneath: a data model and pipelines that stay reliable as the schema evolves, migrations and backfills that don't lose or corrupt data, and resolving inconsistent, multi-language input into one record every downstream workflow can trust.

You own that layer end to end. The pipelines, the data model, the database, the infrastructure the rest of engineering builds on — it's yours, not a slice of it. The quality of what every user sees, what every workflow reads, and what every customer ROI claim rests on flows through what you build. No one will hold your hand. You'll move fast and ship scrappy — a rough system working today beats a perfect one next quarter.

A note on the title: we call this a Founding Engineer role because the scope is bigger than any single discipline. Your spike is data engineering — pipelines, schema, and the data model — but you own the whole layer, not a slice.

What You'll Do

Own the pipelines. Turn messy supplier emails and documents into structured, validated data and own the ingestion, orchestration, reliability, and reprocessing behind them.
Own the data model and migrations. Design and evolve the model that unifies suppliers, documents, RFPs, pricing, and certifications into one source of truth — with schema changes, migrations, and backfills that don't break the app or lose data, so every email compounds into institutional memory.
Keep the database fast. Schema design, query performance, and efficiency — reliable as catalog and record counts climb.
Scale for throughput. Capacity planning, queue and rate-limit management, and large historical ingests, so growth doesn't degrade the product.
Resolve the unstructured. Normalize and entity-resolve inconsistent, conflicting, multi-source inputs into one trustworthy record the application and workflows build on.
Own data quality and observability. Ship reliable, observable pipelines with the validation, lineage, and monitoring that catch bad or missing data automatically — before a customer sees it — and turn reported errors into systematic fixes, not one-off patches.

What We're Looking For

We'll back the right engineer over the right résumé. We care about a defined edge, depth, and ownership — not polish.

You're a strong fit if you:

Communicate with precision. Strong written and verbal communication, clear and concise. In a company this size you'll work closely across functions and with customers, so you can explain a pipeline decision to an engineer, a customer, or the CEO in plain language and adjust the level to your audience.
Have built data systems from zero at a startup (required). You've designed and evolved schemas, built ingestion-to-serving pipelines from scratch, managed and scaled databases, and owned migrations and backfills, and you made the architecture and tooling calls, including when not to reach for heavy tooling (a warehouse, dbt, a dedicated orchestrator) before it earns its keep. We're looking for 8+ years of experience, including work at a seed or Series A company with no playbook and no infrastructure handed to you; a purely big-company background isn't a fit. Big plus if you've evolved a schema and pipeline as a company scaled, even imperfectly.
Are a generalist with a spike. There's a thing you're genuinely better at than almost anyone — data systems: pipelines, schema, and the data model — and you can name it and point to results that prove it. But you're not precious about staying in that lane. You'll write the backend code, stand up the infra, and do the unglamorous work, because owning the whole problem means all of it.
Own whole problems. You take messy things start to finish and close them without being asked. When the data is wrong, you fix the system, not the symptom.
Build leverage. You reach for tools, automation, and agents to scale yourself instead of grinding manually. We live in Claude Code — you should want to, too.
Are all in. This is a rocket ship you want to plant a flag on and ride through the messy middle — not a stepping stone. We're betting on you; we need you betting on us.
Have grit. You've ground at something hard for a long time, through the part where it stopped being fun and the feedback loop ran far longer than your next review. You don't flinch when the work gets ugly.

Bonus: regulated, document-heavy domains; CPG, supply chain, or procurement; multi-language data (Chinese, Spanish); scaling a data platform as a company grew; data-intensive startups; search/retrieval.

What Success Looks Like

You'll ramp fast and work toward a scorecard built on these measures:

Data quality & reliability. Uptime the product can depend on, and bad or missing data flagged automatically before a customer ever sees it — measured, not asserted.
A foundation that scales. Schema and pipelines absorb more volume, formats, and customers without re-architecture; migrations and backfills ship safely.
Coverage of the long tail. More supplier formats and document types handled cleanly. The set of things that break the pipeline keeps shrinking.
Leverage for the team. The data layer becomes something the rest of engineering builds on without thinking about it.

Values

We are reliable, credible, and authentic
We are solution-oriented
We are proud of our work, our customers, and ourselves

What We Offer

Competitive base salary + meaningful equity — real ownership, with upside tied to the outcomes you drive
Ownership of the whole data layer, working directly with a repeat founder & CEO: a front-row seat to how an AI-native company gets built
A real product with real ROI — value you can measure
Full health, dental, and vision coverage
Unlimited vacation — we care about outcomes, not hours
An in-person team that values craft and ambition

How to Apply

Don't send a cover letter. Send two things:

A hard system you owned. One pipeline or data problem, taken start to finish — what was true before, what you built, what was true after.
A gnarly data problem you solved. A schema you evolved, a migration or backfill you got right, a scaling or performance fix, or a pipeline you re-architected — what was breaking, and how you fixed it.

Short is fine. We're reading for ownership and judgment, not polish.

Skills Required

8+ years building production data systems
Hands-on early-stage startup experience (seed or Series A)
Deep experience with Python
Deep experience with SQL
Experience with modern data tooling and shipping reliable data pipelines
Ownership of end-to-end data systems, models, and infrastructure
Experience with document extraction, ML, or NLP pipelines
Experience with regulated document-heavy domains, CPG, supply chain, or multi-language data

The Company

What We Do

WayStation provides a no-code, secure integration hub that connects AI assistants, such as ChatGPT and Claude, with the productivity tools professionals use daily, including Notion, Monday, and Airtable. The platform empowers large language models (LLMs) to perform real-world actions, such as managing tasks and updating databases, effectively bridging the gap between AI agents and a user's daily business applications.