Forward Deployed Data Engineer

Posted 9 Days Ago
New York, NY, USA
In-Office
180K-250K Annually
Senior level
Artificial Intelligence • Computer Vision • Machine Learning • Robotics
The Role
Own end-to-end delivery of customer datasets: gather requirements, build and harden ingestion/transformation/QA/export pipelines, define dataset contracts and quality metrics, query and slice large corpora for model fit, and drive tooling improvements. Serve as technical customer contact and translate ambiguous needs into reproducible, versioned datasets suitable for robotics/embodied-AI training.
Summary Generated by Built In
About Mecka AI

Mecka AI is building the data infrastructure layer for robotics and embodied AI.

We partner with leading AI labs and robotics companies to deliver high-quality, real-world datasets used to train, evaluate, and deploy robotic systems - where model performance is dictated by data quality.

The Role

We are hiring a Forward Deployed Data Engineer to operate on the frontier with customers: take messy, real-world capture data - much of it raw video - and turn it into beautiful, reliable, model-ready datasets, while owning the technical relationship end-to-end.

This is a senior, high-trust role with significant autonomy. You'll combine data engineering, hands-on analysis, and product judgment to deliver datasets customers can train and ship on - and to make our delivery systems more reliable every time you do.

What You'll Work On

Customer Delivery & Technical Ownership
  • Own the end-to-end delivery of customer datasets: requirements, validation, iteration, and final handoff.

  • Be the technical point of contact: communicate clearly, set expectations, and close loops.

  • Turn one-off customer needs into durable internal improvements - tooling, pipelines, and standards that make every future delivery faster and safer.

Data Systems & Pipelines
  • Build, debug, and harden data pipelines across ingestion, transformation, QA, and export.

  • Work fluently across storage and database paradigms (SQL + NoSQL + object storage) and pick the right tool for the job.

  • Establish reliable dataset "contracts": schemas, versioning, provenance, and reproducible builds - so every dataset has a clear source of truth.

Dataset Quality & Signal
  • Define and measure what makes a dataset good for a given task: coverage, diversity, balance, label fidelity, and fitness for the customer's model.

  • Build quality scorecards and coverage/diversity reports that make dataset health legible to customers and internal teams.

  • Query and slice large corpora to maximize customer fit - surface exactly the data that matches a target distribution, not just bulk volume.

  • When the signal a customer needs is missing or weak in the raw video, diagnose it and partner with the perception/ML pipeline teams to extract or improve it upstream.

Who You Are

Required Background
  • 5+ years in data engineering and/or backend engineering (or equivalent impact).

  • Strong experience with large data systems, pipelines, and analytical workflows.

  • Strong SQL proficiency and comfort across multiple database/storage paradigms.

  • Excellent engineering judgment and debugging ability in production systems.

  • Genuine data taste - you can look at a dataset and reason about whether it's complete, balanced, and trustworthy, not just whether the job ran.

Strong Signals
  • You've owned high-stakes customer deliveries with autonomy and trust.

  • You can translate ambiguous requirements into crisp dataset specs and execution plans.

  • You have strong product instincts and care about polish; you ask, "Would I trust this dataset?"

  • You're comfortable working with unstructured, real-world data - especially video.

Nice to Have
  • Working literacy in video understanding, embeddings, and encoders - enough to reason about what a dataset teaches a model and where signal is missing.

  • Experience building data-quality, coverage, or diversity tooling.

  • Background adjacent to ML, computer vision, or robotics data.

Why This Role
  • Own the customer-facing delivery loop for world-class robotics datasets.

  • High autonomy, high trust, and direct impact on customer success and revenue.

  • Work across the full stack of the problem: data, pipelines, analysis, and delivery quality.

  • Sit at the exact point where raw, messy, real-world data becomes the thing that makes embodied-AI models work.

Skills Required

  • 5+ years in data engineering and/or backend engineering
  • Experience with large data systems, pipelines, and analytical workflows
  • Strong SQL proficiency
  • Comfort across multiple database/storage paradigms (SQL, NoSQL, object storage)
  • Excellent engineering judgment and debugging ability in production systems
  • Ability to reason about dataset quality (coverage, balance, label fidelity, fitness)
  • Owned high-stakes customer deliveries with autonomy and trust
  • Translate ambiguous requirements into dataset specs and execution plans
  • Product instincts and care about polish for dataset deliverables
  • Comfort working with unstructured, real-world data, especially video
  • Working literacy in video understanding, embeddings, and encoders
  • Experience building data-quality, coverage, or diversity tooling
  • Background adjacent to ML, computer vision, or robotics data

The Company
58 Employees
Year Founded: 2024

What We Do

Mecka AI is a data and infrastructure company that provides high-quality human movement data to accelerate the development of autonomous systems for humanoid robotics. It serves as the data and deployment layer for physical AI, capturing, structuring, and evaluating real-world activity to create labeled datasets that enable robots to learn and deploy reliably in commercial settings.
