Make an impact with NTT DATA
Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence and leading innovations, and for making a difference to our clients and society. Our workplace embraces diversity and inclusion – it’s a place where you can grow, belong and thrive.
Position Summary
We are seeking a Lead Data Engineer with 5+ years of big data and cloud data architecture expertise to design and modernize data platforms for large-scale transformation programs. This role combines deep hands-on technical expertise with architectural leadership, requiring mastery of big data frameworks (Spark, Flink, Kafka), cloud data platforms (BigQuery, Dataplex, Dataproc, Dataflow), and modern data stack technologies (Trino, Cloud Composer/Airflow, BigLake). The successful candidate will own data architecture design for both lift-and-shift migrations and cloud-native modernization, establish data governance frameworks, and provide technical leadership to engineering teams. You will guide data platform migrations, ensure data quality and validation, and mentor engineers on best practices.
Key Responsibilities
Data Architecture Design & Modernization: Design target data architectures for OSS lift-and-shift and cloud-native modernization scenarios. Make architectural decisions balancing performance, cost, scalability, and operational excellence. Design for multi-cloud flexibility where applicable.
Data Platform Migration Strategy: Own the migration approach for big data platforms, including HDFS-to-GCS migration, Hadoop ecosystem modernization, and data warehouse transformations. Establish phased migration patterns and validation strategies.
GCP Data Platform Architecture: Design BigQuery, Dataplex, BigLake, and data lakehouse architectures. Establish data organization patterns, access controls, and metadata management. Optimize for cost, query performance, and data discovery.
Big Data Processing Architecture: Design Spark, Flink, and Dataflow pipelines for batch and streaming data processing. Establish processing patterns, optimization strategies, and cost management approaches. Design for scale, fault tolerance, and performance.
Data Orchestration & Workflow Design: Design Cloud Composer and Airflow DAGs for complex data workflows. Establish orchestration patterns, error handling, monitoring, and retry strategies. Ensure reliable, maintainable workflow execution.
Data Governance & Metadata Management: Design data governance frameworks including data ownership, access controls, and metadata standards. Establish data lineage tracking, data catalogs, and governance policies. Guide governance implementation across platforms.
Data Quality & Validation: Establish data quality frameworks and validation strategies for migrations. Define quality rules, reconciliation criteria, and acceptance thresholds. Implement data quality monitoring and alerting.
Hive Metastore & Metadata Architecture: Design migration strategies for Hive Metastore to BigLake Metastore/Dataplex. Manage table schema migration, partition strategy optimization, and metadata preservation. Ensure compatibility with existing tools.
Query Engine Architecture & Trino Integration: Design query engine architectures supporting multiple data sources. Establish Trino/Presto configurations for federated query access. Ensure compatibility with existing SQL tools and applications.
Streaming Data Architecture: Design Kafka-based streaming architectures for real-time data ingestion. Establish Kafka topics, partition strategies, and consumer patterns. Design streaming pipelines using Dataflow or Flink.
Data Ingestion & ELT Pipeline Design: Design scalable data ingestion patterns for structured and unstructured data. Establish ELT/ETL frameworks optimized for cloud platforms. Design connectors and integrations with source systems.
Storage Architecture & Optimization: Design GCS-based storage architectures for data lakes. Establish data organization, partitioning, and lifecycle policies. Optimize storage costs and access patterns.
Data Lakehouse Architecture: Design modern lakehouse architectures combining data lake and warehouse capabilities. Establish table formats, schema management, and ACID transaction support. Design for analytics and ML workloads.
Technical Leadership & Mentoring: Provide technical leadership to data engineering teams. Mentor engineers on data architecture patterns, best practices, and technical decision-making. Conduct design reviews and architecture discussions.
Code Quality & Best Practices: Establish code quality standards, design patterns, and testing frameworks. Review data pipeline code and designs. Drive adoption of best practices for maintainability, testability, and performance.
Performance Optimization & Cost Management: Optimize data pipelines for performance and cost. Profile and tune Spark, Dataflow, and BigQuery workloads. Establish cost monitoring and optimization practices across data platforms.
Documentation & Knowledge Sharing: Create comprehensive data architecture documentation and design guides. Document migration approaches, implementation patterns, and operational procedures. Foster knowledge sharing across teams.
Required Qualifications
Minimum 5 years of professional data engineering experience
Minimum 5 years of big data architecture and cloud data platform experience
Expert-level proficiency in Apache Spark for large-scale data processing
Strong experience with GCP data platforms (BigQuery, Dataproc, Dataflow, Dataplex, BigLake)
Hands-on experience designing and implementing data pipelines and ETL/ELT processes
Strong knowledge of data lake and lakehouse architectures
Proficiency with workflow orchestration tools (Apache Airflow, Cloud Composer)
Experience with big data technologies (Hadoop, HDFS, Hive, Kafka) and modernization approaches
Understanding of data governance, data lineage, and metadata management
Proficiency in Python and/or Scala for data engineering tasks
Experience with SQL and data warehouse technologies
Strong understanding of data quality and validation frameworks
Excellent communication skills, with the ability to explain complex data concepts
Experience mentoring and providing technical guidance to engineering teams
Preferred Qualifications & Certifications
Google Cloud Certified — Professional Data Engineer ⭐
Google Cloud Certified — Professional Cloud Architect
Google Cloud Certified — Associate Cloud Engineer
AWS Certified Data Analytics – Specialty
Experience with Trino/Presto for federated query engines
Experience with Apache Flink or other stream processing frameworks
Experience with modern data lakehouse technologies (Delta Lake, Iceberg)
Experience with data cataloging and governance tools
Experience in financial services, healthcare, or large-scale enterprise data environments
About NTT DATA
NTT DATA is a $30+ billion business and technology services leader, serving 75% of the Fortune Global 100. We are committed to accelerating client success and positively impacting society through responsible innovation. We are one of the world’s leading AI and digital infrastructure providers, with unmatched capabilities in enterprise-scale AI, cloud, security, connectivity, data centers and application services. Our consulting and industry solutions help organizations and society move confidently and sustainably into the digital future. As a Global Top Employer, we have experts in more than 50 countries. We also offer clients access to a robust ecosystem of innovation centers as well as established and start-up partners. NTT DATA is part of NTT Group, which invests over $3 billion each year in R&D.
Equal Opportunity Employer
NTT DATA is proud to be an Equal Opportunity Employer with a global culture that embraces diversity. We are committed to providing an environment free of unfair discrimination and harassment. We do not discriminate based on age, race, colour, gender, sexual orientation, religion, nationality, disability, pregnancy, marital status, veteran status, or any other protected category. Join our growing global team and accelerate your career with us. Apply today.
Third parties fraudulently posing as NTT DATA recruiters
NTT DATA recruiters will never ask job seekers or candidates for payment or banking information during the recruitment process, for any reason. Please remain vigilant of third parties who may attempt to impersonate NTT DATA recruiters—whether in writing or by phone—in order to deceptively obtain personal data or money from you. All email communications from an NTT DATA recruiter will come from an @nttdata.com email address. If you suspect any fraudulent activity, please contact us.
NTT DATA Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about NTT DATA and has not been reviewed or approved by NTT DATA.
- Fair & Transparent Compensation: Feedback suggests salary bands and grades are clearly defined, making ranges and promotion criteria easier to understand. Standardized HR processes provide visibility into levels across common delivery roles.
- Healthcare Strength: Feedback suggests the package includes comprehensive medical, dental, and vision options with HSA/FSA eligibility. Global materials emphasize comprehensive insurance and wellbeing as baseline offerings across regions.
- Wellbeing & Lifestyle Benefits: Flexible work options, including remote/hybrid arrangements, are highlighted as core benefits and can support work–life balance. Some delivery teams report more manageable hours than strategy consultancies, improving perceived value for time.
NTT DATA Insights
What We Do
NTT DATA, Inc. is a trusted global innovator of business and technology services. We're committed to helping clients innovate, optimize and transform for long-term success. Our R&D investments help organizations and society move confidently and sustainably into the digital future. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure, and connectivity.








