Lead - Data Engineer

Bangalore, Bengaluru Urban, Karnataka, IND
In-Office
Senior level
Fintech • Consulting
The Role
As a Lead Data Engineer, you will design and implement data architectures, manage ETL/ELT pipelines, and optimize SQL queries to ensure efficient data integration and analytics across the Azure ecosystem.

Equifax is where you can power your possible. If you want to achieve your true potential, chart new paths, develop new skills, collaborate with bright minds, and make a meaningful impact, we want to hear from you.

Synopsis of the role 

As a Lead Data Engineer, you will serve as the technical backbone of our data ecosystem, spearheading the design and implementation of high-performance data architectures using Azure Databricks and PySpark. You will be responsible for orchestrating complex, scalable ETL/ELT pipelines within Azure Data Factory, ensuring seamless data integration. By leveraging your mastery of SQL and distributed computing, you will optimize large-scale datasets to drive advanced analytics and business intelligence initiatives.

What you’ll do

As a Lead Data Engineer, you are expected to drive the technical roadmap and execution of our data strategy. Your role will encompass the following core responsibilities:

  • Medallion Architecture Implementation: Design and maintain a multi-layered data lakehouse (Bronze, Silver, Gold) to ensure data quality, lineage, and structural refinement from raw ingestion to business-ready assets.
  • Delta Lake Development: Build and optimize high-performance tables using Delta Lake, leveraging features like ACID transactions, schema enforcement, and time travel to ensure data reliability.
  • Star Schema Data Modeling: Architect robust dimensional models and Star Schemas in the Gold layer to simplify data access and optimize query performance for downstream BI tools.
  • Data Governance with Unity Catalog: Implement and manage centralized Unity Catalog configurations to enforce fine-grained access control, data discovery, and comprehensive lineage across the Azure workspace.
  • Scalable PySpark Engineering: Develop, test, and deploy complex data transformation logic using PySpark, ensuring efficient distributed processing and resource utilization within Databricks clusters.
  • End-to-End Pipeline Orchestration: Create and monitor sophisticated ETL/ELT workflows using Azure Data Factory (ADF), integrating diverse data sources into a unified cloud ecosystem.
  • Data Mart Development: Build specialized, high-performance Data Marts tailored to specific business domains, enabling self-service analytics and rapid decision-making for stakeholders.
  • Advanced SQL Optimization: Write and tune complex SQL queries for data analysis and validation, ensuring that data processing logic is both performant and cost-effective.
  • Performance Tuning & CI/CD: Lead efforts in cluster configuration, partition tuning, and the automation of deployment pipelines using Azure DevOps to ensure high availability and continuous delivery.
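The Star Schema responsibility above can be made concrete with a minimal sketch. The example below uses Python's built-in sqlite3 as a stand-in for Databricks SQL in the Gold layer; the fact/dimension tables, columns, and data are purely illustrative, not from this posting.

```python
import sqlite3

# In-memory database standing in for the Gold layer of a lakehouse.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension table: one row per customer (illustrative schema).
cur.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT)")
# Fact table: one row per order, referencing the dimension by surrogate key.
cur.execute("CREATE TABLE fact_orders (order_id INTEGER, customer_key INTEGER, amount REAL)")

cur.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                [(1, "Asha", "South"), (2, "Ravi", "North")])
cur.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)",
                [(100, 1, 250.0), (101, 1, 100.0), (102, 2, 75.0)])

# A typical BI query: aggregate the fact table, sliced by a dimension attribute.
rows = cur.execute("""
    SELECT d.region, SUM(f.amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(rows)  # [('North', 75.0), ('South', 350.0)]
```

The point of the shape is that BI tools only ever join facts to dimensions on surrogate keys, which keeps downstream queries simple and fast.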

What experience you need

  • Total Data Engineering Experience: 8+ years in data engineering, data warehousing, or database development roles.
  • Azure Ecosystem Expertise: 4+ years of hands-on experience building and deploying production-grade solutions on Azure.
  • Databricks & PySpark Mastery: 3+ years leading projects that use Azure Databricks and PySpark for large-scale distributed data processing.
  • Lead/Architectural Experience: 2+ years in a "Lead" or "Senior" capacity, with documented experience designing end-to-end data architectures (e.g., transitioning a legacy system to a Medallion architecture).
  • SQL Proficiency: 6+ years of advanced SQL development, including performance tuning, complex window functions, and stored procedure optimization.
  • Production Pipeline Delivery: Proven track record of taking 3-5 enterprise-scale data pipelines built with Azure Data Factory (ADF) from inception to production.
  • Education/Certifications: Bachelor's or Master's degree in Computer Science or a related field, plus at least one relevant professional certification, such as Microsoft Certified: Azure Data Engineer Associate (DP-203) or Databricks Certified Data Engineer Professional.
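As a concrete instance of the "complex window functions" requirement, here is one of the most common patterns in ELT work: keeping only the latest record per key. It is sketched with Python's built-in sqlite3 (which supports window functions); the table and column names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE customer_updates (customer_id INTEGER, email TEXT, updated_at TEXT)")
cur.executemany("INSERT INTO customer_updates VALUES (?, ?, ?)", [
    (1, "old@example.com", "2024-01-01"),
    (1, "new@example.com", "2024-06-01"),
    (2, "solo@example.com", "2024-03-15"),
])

# ROW_NUMBER() ranks each customer's rows newest-first; keeping rn = 1
# deduplicates to the latest record per key, a staple of Silver-layer cleanup.
latest = cur.execute("""
    SELECT customer_id, email FROM (
        SELECT customer_id, email,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM customer_updates
    )
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(latest)  # [(1, 'new@example.com'), (2, 'solo@example.com')]
```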

 

What could set you apart

1. Advanced Databricks Optimization & Lakehouse Features

  • Databricks SQL & Serverless: Experience migrating traditional SQL workloads to Databricks SQL Warehouses to reduce latency and overhead.

  • Delta Live Tables (DLT): Proven ability to implement declarative data pipelines that handle task orchestration, monitoring, and quality constraints automatically.

  • Liquid Clustering: Deep familiarity with Databricks' successor to Z-Ordering, used to optimize data layout and query performance without manual partition management.

2. DevOps & Infrastructure as Code (IaC)

  • Terraform/Bicep: Experience deploying entire Azure environments (Resource Groups, Storage Accounts, Databricks Workspaces) using IaC to ensure environment parity across Dev, QA, and Production.

  • Unit Testing for Spark: Experience using frameworks like pytest or chispa to validate PySpark logic, ensuring a robust CI/CD cycle rather than "testing in production."
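The unit-testing bullet is really about separating transformation logic from Spark plumbing so it can be asserted on in CI. Below is a minimal illustration of that pattern using plain Python structures rather than DataFrames (chispa itself compares real Spark DataFrames); the transform and field names are hypothetical.

```python
# A transformation written as a pure function over rows is easy to unit test
# without spinning up a cluster; the same logic can then be applied through
# a PySpark DataFrame expression or UDF in production.
def normalize_amount(row: dict) -> dict:
    """Hypothetical Silver-layer rule: cents -> dollars, null out negatives."""
    dollars = row["amount_cents"] / 100
    return {**row, "amount_usd": round(dollars, 2) if dollars >= 0 else None}

# pytest-style test (also runnable with a bare assert).
def test_normalize_amount():
    assert normalize_amount({"id": 1, "amount_cents": 1999})["amount_usd"] == 19.99
    assert normalize_amount({"id": 2, "amount_cents": -500})["amount_usd"] is None

test_normalize_amount()
print("tests passed")
```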

3. Comprehensive Data Governance & Security

  • Unity Catalog Migration: Experience leading a migration from a legacy metastore to Unity Catalog, including managing Identity Federation and cross-workspace sharing.

  • Fine-Grained Security: Implementation of Row-Level Security (RLS) and Column-Level Masking directly within Databricks to comply with strict privacy regulations (GDPR/CCPA).
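The masking bullet can be illustrated conceptually. In Databricks, column-level masking is declared as a SQL function attached to a Unity Catalog table column; the sketch below shows only the underlying idea in plain Python, with a made-up role name and column.

```python
# Conceptual sketch of column-level masking: the value a reader sees depends
# on their role. In Databricks this logic lives in a masking function bound
# to the column via Unity Catalog, not in application code.
PRIVILEGED_ROLES = {"pii_auditor"}  # illustrative role name

def mask_ssn(ssn: str, role: str) -> str:
    if role in PRIVILEGED_ROLES:
        return ssn                  # full value for privileged readers
    return "***-**-" + ssn[-4:]     # redacted for everyone else

print(mask_ssn("123-45-6789", "analyst"))      # ***-**-6789
print(mask_ssn("123-45-6789", "pii_auditor"))  # 123-45-6789
```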

4. Real-Time & Hybrid Processing

  • Structured Streaming: Building production-grade, low-latency pipelines that process data in real-time or near-real-time from Azure Event Hubs or IoT Hubs.

  • Change Data Capture (CDC): Implementing efficient CDC patterns (using tools like Debezium or ADF's built-in CDC) to sync on-premises relational databases with the Delta Lake in near-real-time.
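The CDC pattern boils down to applying an ordered stream of insert/update/delete events to a keyed target, which is what a Delta MERGE does at scale. A toy, pure-Python version of that apply step follows; the event shape and keys are invented for illustration.

```python
# Toy CDC apply: fold an ordered change stream into a keyed target table,
# mirroring what MERGE INTO does against a Delta table at scale.
def apply_cdc(target: dict, events: list) -> dict:
    for ev in events:                  # events assumed to arrive in commit order
        key = ev["id"]
        if ev["op"] == "delete":
            target.pop(key, None)
        else:                          # "insert" and "update" both upsert
            target[key] = ev["row"]
    return target

events = [
    {"op": "insert", "id": 1, "row": {"name": "Asha"}},
    {"op": "update", "id": 1, "row": {"name": "Asha K."}},
    {"op": "insert", "id": 2, "row": {"name": "Ravi"}},
    {"op": "delete", "id": 2},
]
print(apply_cdc({}, events))  # {1: {'name': 'Asha K.'}}
```

Ordering is the hard part in practice: out-of-order events produce a wrong final state, which is why CDC tools carry commit sequence numbers.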

5. Cost Governance & FinOps

  • DBU & Cost Management: A track record of implementing Databricks Cluster Policies and tagging strategies to monitor and reduce DBU (Databricks Unit) consumption.

  • Optimization of ADF Triggers: Knowledge of when to use "Tumbling Window" vs. "Schedule" triggers and optimizing Integration Runtimes to minimize execution costs.
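The difference between the two trigger types is easiest to see in the window arithmetic: a tumbling-window trigger assigns each run a fixed, non-overlapping [start, end) interval that ADF passes to the pipeline for incremental loads, whereas a schedule trigger just fires at a time. A small sketch of that bucketing (interval and origin are illustrative):

```python
from datetime import datetime, timedelta

def tumbling_window(ts: datetime, interval: timedelta, origin: datetime) -> tuple:
    """Return the fixed [start, end) window containing ts."""
    elapsed = ts - origin
    n = elapsed // interval            # whole windows elapsed since origin
    start = origin + n * interval
    return (start, start + interval)

origin = datetime(2024, 1, 1)
w = tumbling_window(datetime(2024, 1, 1, 7, 25), timedelta(hours=1), origin)
print(w)  # the 07:00-08:00 window on 2024-01-01
```

Because every event falls in exactly one window, tumbling-window runs can be retried or backfilled per window without double-processing, which is where the cost savings come from.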

#India

We offer a hybrid work setting, comprehensive compensation and healthcare packages, attractive paid time off, and organizational growth potential through our online learning platform with guided career tracks.

Are you ready to power your possible?  Apply today, and get started on a path toward an exciting new career at Equifax, where you can make a difference!

Primary Location:

IND-Bangalore-Equifax-Analytics

Function:

Data and Analytics

Schedule:

Full time

Top Skills

Azure
Azure Data Factory
Bicep
Databricks
Delta Lake
PySpark
SQL
Terraform

The Company
HQ: Atlanta, GA
16,742 Employees

What We Do

At Equifax (NYSE: EFX), we believe knowledge drives progress. As a global data, analytics, and technology company, we play an essential role in the global economy by helping financial institutions, companies, employers, and government agencies make critical decisions with greater confidence. Our unique blend of differentiated data, analytics, and cloud technology drives insights to power decisions to move people forward. Headquartered in Atlanta and supported by nearly 15,000 employees worldwide, Equifax operates or has investments in 24 countries in North America, Central and South America, Europe, and the Asia Pacific region. For more information, visit Equifax.com.
