Senior Data Quality Engineer (4-Month Contract) Onsite in UAE - Octopus by RTG

robusta

Abu Dhabi, ARE

In-Office

Senior level

Other

The Role

Lead the design and automation of enterprise data quality frameworks in Databricks. Build profiling notebooks, rule factories, cleansing pipelines, and MLflow models for anomaly and duplicate detection; enforce quality gates across the Bronze, Silver, and Gold layers; manage exceptions; and deliver KPIs and Power BI reporting with Azure integrations.

About the Role

We are seeking an experienced Senior Databricks Data Quality Engineer to lead the design, implementation, and automation of enterprise-scale data quality frameworks within a Databricks environment. The successful candidate will play a key role in establishing data quality controls, profiling frameworks, remediation processes, and AI-assisted quality monitoring across a large-scale data platform consisting of 170+ datasets and over 1,300 Critical Data Elements (CDEs).

This role requires strong expertise in Databricks, PySpark, Delta Lake, MLflow, and modern data quality management practices.

Key Responsibilities

Data Platform & Databricks Configuration

Configure and manage Databricks workspaces, compute clusters, PySpark notebooks, Delta Lake architecture, and Unity Catalog integrations.
Design scalable data quality processing frameworks across 170+ datasets and 1,346 prioritized Critical Data Elements (CDEs).

Data Profiling & Quality Assessment

Develop AI-assisted profiling notebooks using PySpark to establish baseline data quality scores.
Assess data quality across six key dimensions:

Completeness
Uniqueness
Validity
Consistency
Accuracy
Timeliness

Analyze null rates, duplicate records, invalid values, format violations, outliers, and schema drift.

Data Quality Rule Framework

Design and build a scalable Data Quality Rule Factory using parameterized PySpark functions.
Enable automated deployment of over 6,700 data quality rules without manual rule-by-rule development.
Create reusable rule templates across datasets and data quality dimensions.

Pipeline Quality Enforcement

Integrate data quality controls within Bronze, Silver, and Gold Delta Lake layers.
Implement quality gates that prevent data progression unless predefined thresholds are met.
Develop reusable Databricks Jobs for automated validation and monitoring.

Data Cleansing & AI-Driven Remediation

Build automated data cleansing pipelines for:

Standardization
Deduplication
Schema harmonization

Deploy MLflow-managed machine learning models for:

Anomaly detection
Fuzzy duplicate detection
Exact duplicate identification

Ensure explainability of model outputs and support human-in-the-loop validation processes.

Exception Management

Design failed-record handling frameworks and quarantine Delta tables.
Capture failure reasons, affected CDEs, rule references, and timestamps.
Develop automated reprocessing mechanisms for corrected records.

Data Quality Monitoring & Reporting

Build Delta Lake aggregation tables for data quality metrics.
Deliver data quality KPIs to Power BI dashboards, including:

Dimension-level scores
Rule pass/fail rates
SLA adherence metrics

Configure automated alerting using Databricks SQL Alerts and Azure Monitor.

Predictive Data Quality Analytics

Develop predictive models to identify datasets at risk of quality degradation.
Support AI-assisted Root Cause Analysis (RCA) using profiling outputs and machine learning techniques.
Export and prepare remediation datasets for prioritization and governance reporting.

Requirements

Bachelor's degree in Computer Science, Data Engineering, Information Systems, or a related field.
5+ years of experience in Data Engineering or Data Quality Engineering.
3+ years of hands-on experience with Databricks and PySpark.
Strong expertise in Delta Lake architecture and data pipeline development.
Experience with Unity Catalog implementation and governance.
Hands-on experience with MLflow and machine learning deployment.
Strong SQL skills and data modeling expertise.
Experience building enterprise-scale data quality frameworks.
Experience integrating Databricks with Power BI and Azure services.
Strong understanding of data governance, metadata management, and data quality dimensions.

Preferred Qualifications

Microsoft Azure certifications.
Databricks Certified Data Engineer Associate or Professional.
Experience with enterprise data governance programs.
Experience implementing AI-assisted data quality and remediation solutions.
Knowledge of Master Data Management (MDM) principles.

The Company

8 Employees

What We Do

Robusta Technology Group (RTG) | Empowering the tech landscape with innovative digital solutions, expertise, and collaboration. Join us to unlock your business' growth potential. #TechForGrowth #DigitalTransformation