Lead - Data & ML Platform Engineering

Posted Yesterday
2 Locations
In-Office
Expert/Leader
Artificial Intelligence • HR Tech • Professional Services • Software
The Role
Lead the architecture, build, and operation of a Databricks-based Lakehouse and ML platform across four pillars: Data Platform, ML Platform & MLOps, Platform Operations & FinOps, and Data Governance & Quality. Deliver sub-second inference, industrialize the ML lifecycle with MLflow and Mosaic AI, implement governance-as-code, run FinOps for DBU cost allocation, and ensure platform reliability for retail-scale traffic and thousands of developers.
Summary Generated by Built In

This role is for one of Weekday's clients.

Min Experience: 10+ years

Location: Bengaluru, Mumbai

Job Type: Full-time

Focus Areas: (i) Data Platform Engineering, (ii) ML Platform & MLOps, (iii) Platform Operations & FinOps, (iv) Data Governance & Quality

Experience: 14–20 years total | 8–12 years in Data/ML Platform Engineering

Core Platform: Databricks Intelligence Platform (Unity Catalog, Delta Lake, MLflow, Mosaic AI)

The Context

We are currently building the “v2.0” intelligence layer on top of the Lakehouse, aiming to standardize MLOps, expand Agentic AI capabilities, and guarantee that the platform delivers sub-second latency across the entire retail network, which includes tens of thousands of stores and high-traffic digital channels.

The Data & ML Platforms group (Group A in Enterprise IT) is driving this transformation. It is led by a VP (L2) and organized into four AVP-led pillars, supported by 10 AI-ready Platform Engineers and a transitioning team of Data Engineers. Each AVP owns a specific platform layer and operates as a builder-leader, expected not only to manage but also to architect, review code, and contribute hands-on alongside their team.

The Four Pillars

We are seeking to hire four AVPs, each heading one of the platform pillars. While each AVP has full ownership of their respective pillar, all four collaborate closely as a unified leadership team under the VP. Candidates may be evaluated for placement in any pillar depending on their strengths and fit.


Requirements

(i) Data Platform Engineering

Mission: Take full ownership of the core Lakehouse infrastructure, encompassing storage, compute, and developer platform layers that support all other operations.

  • Design and maintain the Delta Lake storage layer, Photon compute engine, and Unity Catalog abstraction, serving over 1,000 developers across various retail sectors.
  • Implement advanced optimization techniques including query plan tuning, cluster auto-scaling policies, Z-ordering strategies, and partitioning schemes for datasets with trillions of rows.
  • Manage the internal developer platform: build SDKs, CLI tools, and templates, and enable self-service onboarding to shorten new teams' time-to-first-query.
  • Lead the technical cleanup of issues left by the Phase-1 migration, including schema standardization, pipeline consolidation, and deduplication of source-of-record (SOR) systems across hundreds of sources.
  • Oversee the Data Engineer transition cohort within this pillar, establishing engineering standards, enforcing code review processes, and defining career progression paths.
(ii) ML Platform & MLOps

Mission: Industrialize machine learning by building infrastructure that efficiently moves models from experimentation notebooks to production at retail scale.

  • Develop and maintain the end-to-end ML lifecycle leveraging MLflow, including experiment tracking, model registry, automated retraining, A/B testing, and canary deployments.
  • Design the real-time inference architecture to deliver model serving with sub-100ms latency across recommendation, pricing, and demand forecasting applications.
  • Build the Agentic AI infrastructure: RAG pipelines, vector stores, foundation-model fine-tuning workflows (using Mosaic AI), and agent orchestration frameworks.
  • Establish governance for the Feature Store by standardizing feature definitions, enforcing freshness SLAs, tracking lineage, and promoting feature reuse across retail divisions.
  • Ensure reliability of the ML platform through GPU/TPU cluster management, training job scheduling, per-model cost attribution, and incident response for production model degradations.
(iii) Platform Operations & FinOps

Mission: Maintain platform stability, performance, and cost-efficiency—especially during critical periods.

  • Ensure 99.99% platform uptime, providing leadership during critical events such as festive sales, store openings, and retail peak periods.
  • Establish and run the FinOps practice focusing on DBU cost allocation by team and workload, implementing chargeback models, automating resource right-sizing, and delivering executive cost dashboards.
  • Design and manage monitoring and observability systems covering pipeline health, query performance, cluster utilization, and data freshness SLAs across all six value streams.
  • Lead capacity planning by forecasting compute and storage demand in line with retail seasonality (festive cycles, new store launches, category introductions) and provisioning resources ahead of that demand.
  • Oversee incident management, develop runbooks, and conduct post-mortems for the Databricks platform, ensuring mean time to recovery (MTTR) targets are met and continually improved.
(iv) Data Governance & Quality

Mission: Serve as the technical steward for India’s largest consumer dataset, ensuring its trustworthiness, compliance, and discoverability.

  • Develop “Governance-as-Code” frameworks on Unity Catalog, incorporating automated access controls, data classification, PII masking, and audit trails to comply with the requirements of the Digital Personal Data Protection (DPDP) Act.
  • Design and implement a data quality framework that includes automated profiling, anomaly detection, schema enforcement, and freshness monitoring across thousands of datasets.
  • Manage the data catalog and discovery platform, providing metadata management, lineage visualization, business glossary, and search tools to support over 1,000 users.
  • Build consent management infrastructure to monitor, enforce, and audit user consent signals throughout the comprehensive “Phygital” retail ecosystem (online and offline).
  • Drive enterprise-wide data standards by defining naming conventions, rules for SOR deduplication, master data alignment, and data contract enforcement between producing and consuming teams.
Minimum Qualifications (All Pillars)
  • 14 to 20 years of professional experience in software engineering, data engineering, or ML infrastructure, including a minimum of 3 years leading a platform team of 5 or more engineers.
  • 8 to 12 years of hands-on experience in building and scaling data or ML platforms such as Lakehouse architectures, Feature Stores, Streaming Engines, or MLOps pipelines.
  • Deep technical expertise in the Databricks ecosystem or similar distributed data platforms (e.g., Spark, Presto/Trino, Flink, or Kafka at scale); Databricks experience is strongly preferred.
  • Proven “builder-leader” approach: actively involved in code review, production debugging, and architectural decision-making without fully delegating technical responsibilities.
  • Experience operating within large and complex technology organizations featuring inherited teams, cross-functional dependencies, and enterprise-grade compliance requirements.
  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related discipline, or equivalent expertise acquired through industry experience and open-source contributions.
Preferred Qualifications
  • Previous experience managing India-scale data platforms handling billions of events per day, petabyte-scale data warehouses, or real-time serving at over 10,000 queries per second.
  • Hands-on experience with MLflow, Mosaic AI, or similar ML infrastructure platforms at production level—not limited to experimentation phases.
  • Familiarity with retail or e-commerce data domains such as product catalogs, inventory management, order processing, customer behavior signals, or supply chain datasets.
  • Demonstrated success in building internal tooling or developer platforms that have gained widespread organic adoption within large engineering organizations.
  • Experience with FinOps practices including DBU/compute cost attribution, chargeback modeling, and enterprise-scale cloud cost optimization.
  • Knowledge of Indian data privacy regulations (DPDP Act) or global frameworks (GDPR, CCPA) in the context of data platform governance.
Organisation Context

This position reports directly to the VP & Head of Data & ML Platforms, who in turn reports to the Head of Enterprise IT, and ultimately to the CEO. You will collaborate as a peer with three other AVPs within the Data & ML Platforms group and work closely with more than 10 AI-ready Platform Engineers at Architect and Principal levels, alongside the transitioning Data & Platforms Engineers cohort.

The broader Enterprise IT division comprises five additional L2 groups: CISO/Cybersecurity, HR/Finance/Legal Platforms, SAP-Core, Systems & AI Architects, and CIO + Cloud & Infrastructure.

Must-have skills

Data & ML Platform, Databricks, Platform Architecture

Good-to-have skills

MLOps, System Architecture, Retail

Skills Required

  • 14-20 years professional experience in software engineering, data engineering, or ML infrastructure
  • Minimum 3 years leading a platform team of 5+ engineers
  • 8-12 years hands-on experience building and scaling data or ML platforms (Lakehouse, Feature Stores, Streaming, MLOps)
  • Strong technical expertise in the Databricks ecosystem or similar distributed platforms (Spark, Presto/Trino, Flink, Kafka)
  • Proven builder-leader approach (active in code review, production debugging, architecture)
  • Experience operating within large, complex technology organizations with enterprise compliance
  • Bachelor's or Master's in Computer Science, Data Science, or a related discipline, or equivalent industry experience
  • Hands-on experience with MLflow, Mosaic AI, or similar ML infrastructure in production
  • Experience with India-scale data platforms (multi-billion events/day, petabyte data, high QPS)
  • Experience with FinOps practices including DBU/compute cost attribution and chargeback modeling
  • Familiarity with data privacy and governance frameworks (DPDP Act, GDPR, CCPA)

The Company
0 Employees
Year Founded: 2021

What We Do

Weekday is an AI-powered recruitment platform that helps startups hire top-tier engineering and product talent. By leveraging a massive database of white-collar professionals and advanced outreach tools, the company streamlines hiring through automated sourcing, AI-driven resume screening, and white-glove contingency services. Its mission is to modernize recruitment by enabling companies to discover and engage passive candidates efficiently, ensuring high-quality hires for critical roles.
