Location: Abu Dhabi - UAE - Onsite
Open to Relocate
Duration: 6 months (extendable to one year)
Experience: 5 to 7 Years
Project start date is 1st July; immediate joiners will be preferred.
Role Overview
The Data Quality Engineer will be responsible for designing, implementing, and operating ADC's enterprise data quality framework within the Databricks platform. The role will deliver automated profiling, quality rule execution, cleansing, monitoring, remediation support, and quality reporting capabilities across 170 datasets and 1,346 prioritised Critical Data Elements (CDEs).
Working closely with Data Modellers, Data Catalogue Specialists, business data owners, and platform engineers, the Data Quality Engineer will establish scalable and reusable quality controls that improve trust, accuracy, completeness, consistency, timeliness, validity, and uniqueness across ADC's data estate.
Key Responsibilities
Databricks Platform Configuration and Administration
- Configure and manage the Databricks environment supporting enterprise data quality operations.
- Establish and maintain:
  - Compute clusters.
  - PySpark notebook frameworks.
  - Delta Lake storage structures.
  - Unity Catalog integration.
- Optimise platform performance for large-scale profiling and rule execution across all in-scope datasets and CDEs.
- Implement development, testing, and production deployment standards for data quality assets.
- Design and develop AI-assisted profiling notebooks using PySpark.
- Perform baseline data quality assessments across the six quality dimensions:
  - Completeness.
  - Accuracy.
  - Consistency.
  - Validity.
  - Timeliness.
  - Uniqueness.
- Capture and analyse:
  - Null value rates.
  - Duplicate records.
  - Invalid values.
  - Format violations.
  - Outliers.
  - Schema drift.
- Produce quality profiling outputs for all prioritised CDEs and datasets.
- Design and implement a reusable Data Quality Rule Factory.
- Build parameterised PySpark-based rule templates capable of supporting large-scale rule deployment.
- Enable automated generation and management of approximately 6,730 data quality rules without manual rule-by-rule development.
- Ensure rules are reusable, configurable, and maintainable across multiple datasets and domains.
- Deploy quality rules as reusable Databricks Jobs integrated into Delta Lake processing pipelines.
- Embed quality controls within Bronze, Silver, and Gold processing stages.
- Implement automated quality gates preventing data progression where defined thresholds are not met.
- Maintain rule traceability and execution history for audit and governance purposes.
- Develop automated remediation and cleansing pipelines using PySpark.
- Implement:
  - Standardisation routines.
  - Data enrichment processes.
  - Deduplication logic.
  - Schema harmonisation controls.
- Deploy machine learning models managed through MLflow for:
  - Anomaly detection.
  - Exact duplicate detection.
  - Fuzzy matching and duplicate identification.
- Ensure all AI and ML recommendations are explainable, auditable, and routed through human-in-the-loop validation processes where required.
- Design and manage exception handling processes for failed quality records.
- Implement quarantine Delta Lake tables serving as the Failed Record Register.
- Capture and maintain:
  - Failure reason.
  - Associated CDE.
  - Rule reference.
  - Processing timestamp.
  - Resolution status.
- Develop reprocessing workflows to support correction and controlled re-ingestion of remediated records.
- Develop Delta Lake metric aggregation structures supporting enterprise quality reporting.
- Calculate and publish:
  - Data Quality Index (DQI) scores.
  - Dimension-level quality metrics.
  - Rule pass/fail rates.
  - Dataset compliance scores.
  - SLA adherence indicators.
- Provide curated outputs to support Power BI quality dashboards and executive reporting.
- Configure automated quality monitoring and alerting mechanisms.
- Implement threshold-based notifications using:
  - Databricks SQL Alerts.
  - Azure Monitor integrations.
- Develop predictive risk scoring models to identify datasets at risk of future quality degradation.
- Support proactive quality management and operational intervention activities.
- Apply Databricks machine learning and pattern analysis techniques to profiling and rule execution outputs.
- Support AI-assisted root cause analysis across established remediation categories.
- Identify recurring quality issues, systemic defects, and process breakdowns.
- Produce prioritised remediation datasets for business and operational stakeholders.
- Export remediation outputs to Power BI and Excel to support:
  - Remediation Tiering Matrix.
  - Prioritisation Scoring Models.
  - Governance reporting processes.
- Collaborate with Data Modellers and Data Catalogue Specialists to ensure quality controls align with authoritative data definitions and metadata standards.
- Support DDA and DGE governance processes by producing required quality artefacts and evidence.
- Maintain documentation, version control, and audit trails for all quality assets, rules, models, and processes.
- Participate in quality reviews, governance forums, and stakeholder workshops.
Skills and Experience
- Strong experience designing and implementing enterprise Data Quality frameworks.
- Advanced Databricks engineering experience.
- Strong PySpark development skills.
- Experience with:
  - Delta Lake.
  - Unity Catalog.
  - Databricks Workflows and Jobs.
  - Databricks SQL.
- Experience building scalable data validation and quality rule frameworks.
- Knowledge of machine learning techniques for anomaly detection and data quality monitoring.
- Experience using MLflow for model management and deployment.
- Strong understanding of data governance, metadata management, and data lifecycle processes.
- Experience integrating data quality metrics into reporting platforms such as Power BI.
- Knowledge of cloud-based data engineering and modern lakehouse architectures.
Key Deliverables
- Configured Databricks Data Quality environment.
- Enterprise Data Quality Rule Factory.
- AI-assisted profiling notebooks and baseline assessment outputs.
- Automated data quality validation and gating processes.
- Data cleansing and remediation pipelines.
- Failed Record Register and reprocessing workflows.
- Data Quality metric aggregation tables.
- DQI reporting feeds and dashboard datasets.
- Predictive quality monitoring and alerting solutions.
- Root Cause Analysis and remediation support datasets.
- Governance, audit, and compliance artefacts supporting DDA and DGE reviews.
Skills Required
- 5 to 7 years' experience in relevant data engineering or data quality roles
- Strong experience designing and implementing enterprise Data Quality frameworks
- Advanced Databricks engineering experience (environment configuration, clusters, Unity Catalog)
- Strong PySpark development skills
- Experience with Delta Lake and Delta Lake table design (including quarantine/failed record registers)
- Experience with Databricks Workflows, Jobs, and Databricks SQL
- Experience building scalable data validation and quality rule frameworks (parameterised rule templates)
- Experience using MLflow for model management and deployment
- Knowledge of machine learning techniques for anomaly detection, duplicate detection, fuzzy matching
- Experience integrating data quality metrics into reporting platforms such as Power BI
- Strong understanding of data governance, metadata management, and data lifecycle processes
- Knowledge of cloud-based data engineering and modern lakehouse architectures
- Onsite work in Abu Dhabi / ability to relocate
Datamatics Technologies Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Datamatics Technologies and has not been reviewed or approved by Datamatics Technologies.
- Flexible Benefits: Feedback suggests flexible timings and work-from-home options are available in some roles. This flexibility is highlighted as part of the employment experience across certain postings and materials.
- Wellbeing & Lifestyle Benefits: Feedback suggests flexibility around time off and remote work supports work-life balance. These elements can help offset leaner cash components for some individuals.
Datamatics Technologies Insights
What We Do
Datamatics Technologies (DMT) was established in Dubai. We specialize in providing onsite and offshore professional services covering the full spectrum of Data Analytics and Data Science domains. Our experience of working with diverse industry sectors such as Telecoms, Finance, Government and Manufacturing across multiple regions enables us to engage and deliver for our clients with confidence. We offer our full portfolio of services through resource augmentation and managed services, on either T&M or fixed-price financial arrangements. Through our end-to-end managed services offering, we enable our clients to cut costs, increase profitability and focus on adding value to their core business activities. Our project and delivery management team is certified in Agile, PMI and ITIL to ensure that planning and execution follow industry best practices. We work with clients across the Middle East and Africa region.