Data Engineer (Founding Team)

Posted Yesterday
7 Locations
In-Office or Remote
Senior level
Artificial Intelligence • Software • Industrial • Manufacturing
The Role
Build and operate scalable data ingestion, transformation, and connector frameworks; design and maintain a knowledge-graph-based data fabric; normalize and vectorize enterprise data for LLM/AI workflows; implement governance, lineage, access controls, and secure APIs to serve ML/agent pipelines.
Summary Generated by Built In

Data/ETL Engineer (Founding Team)

Location: San Francisco Bay Area

Type: Full-Time

Compensation: Competitive salary + early-stage equity

Backed by 8VC, we're building a world-class team to tackle one of the industry’s most critical infrastructure problems.

About the Role

We’re building a multi-tenant, AI-native platform where enterprise data becomes actionable through semantic enrichment, intelligent agents, and governed interoperability. At the heart of this architecture lies our Data Fabric — an intelligent, governed layer that turns fragmented and siloed data into a connected ontology ready for model training, vector search, and insight-to-action workflows.

We're looking for engineers who enjoy hard data problems at scale: messy unstructured data, schema drift, multi-source joins, security models, and AI-ready semantic enrichment. You’ll build the backend systems, data pipelines, connector frameworks, and graph-based knowledge models that fuel agentic applications.

If you've worked on streaming pipelines for unstructured data, built connectors into ugly legacy systems, or modeled knowledge graphs that scale, this role will feel like home.

Responsibilities
  • Build highly reliable, scalable data ingestion and transformation pipelines across structured, semi-structured, and unstructured data sources

  • Develop and maintain a connector framework for ingesting from enterprise systems (ERPs, PLMs, CRMs, legacy data stores, email, Excel, docs, etc.)

  • Design and maintain the data fabric layer, including a knowledge graph (Neo4j or PuppyGraph) enriched with ontologies, metadata, and relationships

  • Normalize and vectorize data for downstream AI/LLM workflows — enabling retrieval-augmented generation (RAG), summarization, and alerting

  • Create and manage data contracts, access layers, lineage, and governance mechanisms

  • Build and expose secure APIs for downstream services, agents, and users to query enriched semantic data

  • Collaborate with ML/LLM teams to feed high-quality enterprise data into model training and tuning pipelines

What We’re Looking For

Core Experience:

  • 5+ years building large-scale data infrastructure in production environments

  • Deep experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran) and data pipeline orchestration (Airflow, Dagster, Prefect)

  • Comfortable processing unstructured and semi-structured data formats: PDFs, Excel, emails, logs, CSVs, web APIs

  • Experience working with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)

  • Strong background in knowledge graphs or semantic modeling (e.g. Neo4j, RDF, Gremlin, PuppyGraph)

  • Familiarity with GraphQL, RESTful APIs, and designing developer-friendly data access layers

  • Experience implementing data governance: RBAC, ABAC, data contracts, lineage, data quality checks

Mindset & Culture Fit:

  • You’re a systems thinker: you want to model the real world, not just process it

  • Comfortable navigating ambiguous data models and building from scratch

  • Passionate about enabling AI systems with real-world, messy enterprise data

  • Pragmatic about scalability, observability, and schema evolution

  • Value autonomy, high trust, and meaningful ownership over infrastructure

Bonus Skills

  • Prior work with vector DBs (e.g. Weaviate, Qdrant, Pinecone) and embedding pipelines

  • Experience building or contributing to enterprise connector ecosystems

  • Knowledge of ontology versioning, graph diffing, or semantic schema alignment

  • Familiarity with data fabric patterns (e.g. Palantir Ontology, Linked Data, W3C standards)

  • Familiarity with fine-tuning LLMs or enabling RAG pipelines using enterprise knowledge

  • Experience enforcing data access policy with tools like OPA, Keycloak, or Snowflake row-level security

Why This Role Matters

Agents are only as smart as the data they operate on. This role builds the foundation — the semantic, governed, connected substrate — that makes autonomous decision-making and agent action possible. From factory ERP records to geopolitical news alerts, the data fabric unifies it all.

If you're excited to tame complexity, unify chaos, and power intelligent systems with trusted data — we’d love to hear from you.

Skills Required

  • 5+ years building large-scale data infrastructure in production environments
  • Experience with ingestion frameworks (Kafka, Airbyte, Meltano, Fivetran)
  • Experience with data pipeline orchestration (Airflow, Dagster, Prefect)
  • Comfortable processing unstructured and semi-structured data formats (PDFs, Excel, email, logs, CSV, web APIs)
  • Experience with columnar stores, object storage, and lakehouse formats (Iceberg, Delta, Parquet)
  • Strong background in knowledge graphs or semantic modeling (Neo4j, RDF, Gremlin, PuppyGraph)
  • Familiarity with GraphQL and RESTful APIs and designing data access layers
  • Experience implementing data governance: RBAC, ABAC, data contracts, lineage, and data quality checks
  • Prior work with vector DBs or embedding pipelines (Weaviate, Qdrant, Pinecone)
  • Experience building or contributing to enterprise connector ecosystems
  • Knowledge of ontology versioning, graph diffing, or semantic schema alignment
  • Familiarity with fine-tuning LLMs or enabling RAG pipelines
  • Experience enforcing data access policy with OPA, Keycloak, or Snowflake row-level security

The Company
8 Employees

What We Do

Fabrion is an AI-native platform and operating system purpose-built for the new industrial era. The company provides an intelligence layer for modern manufacturing, including an AI-powered supplier intelligence platform, that helps industrial manufacturers move faster, operate smarter, and build with confidence across complex value chains and supply chains.
