Senior Data Engineer

3 Locations
In-Office
Senior level
Professional Services • Retail • Design • Manufacturing
The Role
The Senior Data Engineer will design, implement, and manage data pipelines for real-time analytics and ML training datasets, ensuring efficient data flow and quality in a hybrid environment.

Marble is a technology company founded to revolutionize the food processing industry. Marble is seeking a full-time Senior Data Engineer who is ready for a challenge and eager to design, implement, and support automation solutions that are transforming the industry. As a part of the Marble team, you will leverage cutting-edge technologies to develop the next generation of automated solutions for food processing, enhancing resilience in the food supply chain.

Job Summary:

As a Senior Data Engineer at Marble, you will own the design and performance of the data pipelines that power everything from real-time classification dashboards to ML training datasets and operational analytics for production facilities. You will work closely with the Software, Infrastructure, and Machine Learning teams to ensure data flows through our pipelines efficiently, securely, reliably, and at scale.

You will design for both high-throughput real-time ingestion and large-scale batch processing across on-prem edge nodes and AWS.

Responsibilities:
  • Architect and build scalable ETL/ELT pipelines for both batch and streaming workloads

  • Design real-time ingestion and transformation workflows integrating NATS JetStream and distributed microservices

  • Develop robust data models and ETL layers for ClickHouse, enabling high-performance analytics and ML feature extraction

  • Manage and optimize data storage across AWS S3, ClickHouse, and operational datasets generated on-prem

  • Build automation workflows for labeling data, CV pipeline pre-annotation, dataset generation, and versioning

  • Ensure data quality, validation, integrity, and lineage, including automated tests and monitoring across pipelines

  • Collaborate with ML and backend teams to deliver pipelines for training datasets and annotation tools

  • Implement scalable compute workloads for large dataset transformations

  • Define and enforce data governance best practices, including schema evolution, retention policies, and compliance requirements

  • Monitor and improve data pipeline performance across multi-region environments

Minimum Qualifications:
  • B.S. or M.S. in Computer Science, Data Engineering, or related field

  • 4+ years of experience building production-grade data pipelines or distributed systems

  • Strong proficiency in Python and SQL

  • Production experience with at least one major distributed compute framework, such as Apache Spark, Ray, or Apache Airflow (2+ years preferred)

  • Experience building streaming pipelines or real-time systems (Kafka, NATS, Redis Streams, or similar)

  • Deep familiarity with AWS cloud services (S3, Lambda, IAM, EC2, Glue, etc.)

  • Experience with PostgreSQL, MongoDB, ClickHouse, or other columnar/NoSQL systems

  • Strong understanding of data modeling, partitioning, schema evolution, and performance tuning

  • Understanding of data quality, lineage, orchestration, and governance

  • Ability to design systems in hybrid environments (on-prem + cloud)

  • Excellent communication, documentation, and teamwork skills

Preferred Qualifications:
  • Experience with NATS JetStream, Kafka, or high-throughput messaging systems

  • Familiarity with GPU-based CV pipelines, ML datasets, or annotation workflows

  • Experience with ClickHouse Materialized Views, Replicated Tables, or S3-backed storage

  • Experience working in a regulated, safety-critical, or high-uptime environment

  • Experience with Nomad, Consul, Vault, or other tools in the HashiCorp ecosystem

Job Type: Full-time

Location: Lincoln, NE; Omaha, NE; or Cambridge, MA (US)

Team members can expect occasional travel for in-person meetings and site visits.

Marble is an equal-opportunity employer. We understand the power of a diverse team, celebrate differences, and promote inclusion.


The Company
132 Employees
Year Founded: 2015

What We Do

Marble.com is the premier natural stone countertop fabricator and installer in the world, offering a massive selection of over 2,000 stone colors, utilizing state-of-the-art technology, and providing customer service.
