Data Engineer, League Analytics & Infrastructure

Posted 3 Days Ago
New York, NY, USA
In-Office
115K-140K Annually
Mid level
Sports
The Role
Build and operate GCP-native data pipelines and lakehouse infrastructure using Airflow, dbt, and BigQuery. Design layered data models, manage ingestion (batch and streaming), implement observability and governance, and maintain CI/CD workflows to deliver reliable analytics for MLB stakeholders.
Summary Generated by Built In

Major League Baseball is scaling the data platform that powers America's pastime, and we need a builder to help us do it. The League Analytics & Infrastructure (LAI) team operates the cloud foundation behind every analytics decision at MLB, from Statcast player-tracking pipelines processing millions of pitch-level events to the baseball operations data that informs decisions for 30 Clubs and the Commissioner's Office.

We are continuing a multi-year evolution of our GCP-native lakehouse — building out our dbt transformation layer, hardening our Airflow orchestration, and pushing more of our infrastructure into code. You will be a core contributor to that build. The systems you create will be the backbone of analytics products used across the league, and your work will be visible to engineers, analysts, and decision-makers at every level of the organization.

This is a hands-on data engineering role focused on execution. Reporting to the Manager of BI Data Engineering, you'll join a small, high-performing team within LAI that values careful craftsmanship, rolls up its sleeves, and treats data as a product rather than a byproduct. You'll work upstream of our analytics engineers and analysts — designing the pipelines, models, and infrastructure they rely on every day. We are looking for someone with production experience who can hit the ground running, but also someone who's hungry to grow into the next level. "Delivering" is the aim of the game: shipping reliable pipelines, optimizing data models for the analysts and engineers downstream, and making the platform a little better with every pull request. Beyond that, we want someone who reads the codebase critically, asks why something was built a certain way, and brings ideas — about tooling, architecture, or process — that push the team forward. The standards are high, the autonomy is real, and the work is visible across the league.

Responsibilities 

  • Build production-grade pipelines using Airflow and dbt to orchestrate batch and streaming transformations across GCP, so that downstream analysts and engineers can trust the data they query without checking the wiring
  • Architect clean, layered data models (staging → intermediate → mart) that serve as the single source of truth for league analytics, applying dbt best practices for materialization, testing, and documentation
  • Operate the ingestion layer using Pub/Sub, GCS, Dataflow, and Knowledge Catalog (Dataplex) to land both batch and streaming sources cleanly into the lakehouse
  • Implement observability and monitoring standards so that data quality issues surface before stakeholders notice them, not after
  • Manage code through GitHub-based CI/CD, contributing to the deployment workflows that keep our platform reliable and our changes safe
  • Adhere to data governance practices that keep proprietary baseball data secure and compliant

Qualifications & Skills

  • 2–4 years of production data engineering experience
  • Expert-level SQL — comfortable writing complex freehand queries (sub-queries, nested logic, window functions) and reading someone else's to spot issues
  • Strong Python for data processing, scripting, and automation
  • Hands-on dbt experience — you've built models across staging, intermediate, and mart layers, written tests, and shipped to production
  • Production Airflow experience — DAG authoring, dependency management, debugging failed runs
  • Deep familiarity with Google Cloud Platform (BigQuery, GCS, Pub/Sub) or equivalent depth in AWS/Azure with willingness to convert
  • Git-based development workflows — branches, PRs, code review as a daily practice
  • You communicate clearly with both engineers and non-engineers, take feedback well, and give it kindly
  • Execution mindset. You can own a project from requirements to deployment with minimal oversight.

Nice-to-Have

  • A degree in Computer Science, Engineering, or a related field — or non-traditional background with equivalent practical experience
  • Experience with Terraform or other Infrastructure-as-Code tools
  • Experience with AI-assisted development or enterprise AI tooling (Gemini Enterprise, Vertex AI). We're early but ambitious — we see AI as a lever for engineering efficiency
  • A passion for baseball, or prior experience in sports, media, or entertainment
  • Ability to build creative solutions for unusual problems

Salary Range: $115,000 - $140,000 (Base Salary) + Bonus

As a candidate for this position, your salary and related aspects of compensation will be contingent upon your work experience, education, skills, and any other factors MLB considers relevant to the hiring decision. In addition to your salary, MLB believes in providing a competitive compensation and benefits package for its employees.

Top MLB Perks & Benefits:

  • Competitive Benefits Package
  • Company Contributed 401K Plan
  • Paid Time Off and Holidays
  • Paid Parental Leave
  • Access to Free Tickets to Baseball Games & MLB.TV
  • Discounts at MLB Store | MLBShop.com
  • Employee Assistance Programs (EAP)
  • Onsite/Online Training & Development Programs
  • Tuition Reimbursement
  • Disability Benefits (short term and long term)
  • Life and Accidental Death Insurance
  • Pet Insurance

Why MLB?

Major League Baseball (MLB) is the most historic of the major professional sports leagues in the United States and Canada. Employees love working at MLB because of the culture of growth, teamwork, and professionalism. Employees who are most successful at MLB take initiative, know how to identify problems and provide solutions, and always put the Team first. For those ready to step up to the plate and join the major leagues, MLB takes the same approach as teams do with their players: empowering our “workforce athletes” to be at their best by engineering experiences that put employees in the best position to succeed. Major League Baseball is looking for candidates who are passionate about growing America’s pastime to best serve its fans for decades to come.

California Residents: Please see our California Recruitment Privacy Policy for more details.

Colorado Residents: Colorado based applicants may redact or remove age-identifying information such as age, date of birth, or dates of school attendance or graduation. You will not be penalized for redacting or removing this information.

Applicants requiring a reasonable accommodation for any part of the application and hiring process, please email us at [email protected]. Requests received for non-disability related issues, such as following up on an application, will not receive a response.

Are you ready to Step Up to the Plate? Apply below!

Skills Required

  • 2-4 years of production data engineering experience
  • Expert-level SQL (complex queries, window functions, subqueries)
  • Strong Python for data processing, scripting, and automation
  • Hands-on dbt experience across staging, intermediate, and mart layers with tests and production deployments
  • Production Airflow experience (DAG authoring, dependency management, debugging)
  • Familiarity with Google Cloud Platform (BigQuery, GCS, Pub/Sub) or equivalent AWS/Azure experience
  • Experience building and operating batch and streaming ingestion pipelines (Pub/Sub, Dataflow, GCS, Dataplex)
  • Experience with Git-based development workflows and GitHub-based CI/CD
  • Designing clean, layered data models and applying dbt best practices for materialization, testing, and documentation
  • Implementing observability, monitoring, and data quality practices
  • Adhering to data governance practices to keep proprietary data secure and compliant
  • Ability to own projects from requirements through deployment with minimal oversight; strong communication skills
  • Degree in Computer Science, Engineering, or related field (or equivalent practical experience)
  • Experience with Terraform or other Infrastructure-as-Code tools
  • Experience with AI-assisted development or enterprise AI tooling (Vertex AI, Gemini Enterprise)
  • Passion for baseball or prior experience in sports, media, or entertainment
  • Ability to build creative solutions for unusual problems