We are seeking an experienced Senior Databricks Data Quality Engineer to lead the design, implementation, and automation of enterprise-scale data quality frameworks within a Databricks environment. The successful candidate will play a key role in establishing data quality controls, profiling frameworks, remediation processes, and AI-assisted quality monitoring across a large-scale data platform consisting of 170+ datasets and over 1,300 Critical Data Elements (CDEs).
This role requires strong expertise in Databricks, PySpark, Delta Lake, MLflow, and modern data quality management practices.
Key Responsibilities
Data Platform & Databricks Configuration
- Configure and manage Databricks workspaces, compute clusters, PySpark notebooks, Delta Lake architecture, and Unity Catalog integrations.
- Design scalable data quality processing frameworks across 170+ datasets and 1,346 prioritized Critical Data Elements (CDEs).
- Develop AI-assisted profiling notebooks using PySpark to establish baseline data quality scores.
- Assess data quality across six key dimensions:
- Completeness
- Uniqueness
- Validity
- Consistency
- Accuracy
- Timeliness
- Analyze null rates, duplicate records, invalid values, format violations, outliers, and schema drift.
- Design and build a scalable Data Quality Rule Factory using parameterized PySpark functions.
- Enable automated deployment of over 6,700 data quality rules without manual rule-by-rule development.
- Create reusable rule templates across datasets and data quality dimensions.
- Integrate data quality controls within Bronze, Silver, and Gold Delta Lake layers.
- Implement quality gates that prevent data progression unless predefined thresholds are met.
- Develop reusable Databricks Jobs for automated validation and monitoring.
- Build automated data cleansing pipelines for:
- Standardization
- Deduplication
- Schema harmonization
- Deploy MLflow-managed machine learning models for:
- Anomaly detection
- Fuzzy duplicate detection
- Exact duplicate identification
- Ensure explainability of model outputs and support human-in-the-loop validation processes.
- Design failed-record handling frameworks and quarantine Delta tables.
- Capture failure reasons, affected CDEs, rule references, and timestamps.
- Develop automated reprocessing mechanisms for corrected records.
- Build Delta Lake aggregation tables for data quality metrics.
- Deliver data quality KPIs to Power BI dashboards including:
- Dimension-level scores
- Rule pass/fail rates
- SLA adherence metrics
- Configure automated alerting using Databricks SQL Alerts and Azure Monitor.
- Develop predictive models to identify datasets at risk of quality degradation.
- Support AI-assisted Root Cause Analysis (RCA) using profiling outputs and machine learning techniques.
- Export and prepare remediation datasets for prioritization and governance reporting.
Requirements
- Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field.
- 5+ years of experience in Data Engineering or Data Quality Engineering.
- 3+ years of hands-on experience with Databricks and PySpark.
- Strong expertise in Delta Lake architecture and data pipeline development.
- Experience with Unity Catalog implementation and governance.
- Hands-on experience with MLflow and machine learning deployment.
- Strong SQL skills and data modeling expertise.
- Experience building enterprise-scale data quality frameworks.
- Experience integrating Databricks with Power BI and Azure services.
- Strong understanding of data governance, metadata management, and data quality dimensions.
- Microsoft Azure certifications.
- Databricks Certified Data Engineer Associate or Professional.
- Experience with enterprise data governance programs.
- Experience implementing AI-assisted data quality and remediation solutions.
- Knowledge of Master Data Management (MDM) principles.
What We Do
Robusta Technology Group (RTG): empowering the tech landscape with innovative digital solutions, expertise, and collaboration. Join us to unlock your business's growth potential.

