Data Engineer – AWS + Hadoop

Posted 2 Days Ago
Bellandur, Bangalore, Karnataka, IND
In-Office
Senior level
Fintech • Financial Services
The Role
Design, build, and optimize scalable batch and streaming ETL/ELT pipelines and AWS-based data lakes/warehouses using Hadoop and Spark. Implement data governance, quality checks, access controls, monitoring, and cost/performance improvements while collaborating with analytics, ML, and BI teams.

Job Summary

Synechron is seeking a Data Engineer – AWS + Hadoop to build and optimize scalable data pipelines, data lake solutions, and distributed data platforms. This role supports analytics, machine learning, and reporting by delivering reliable, secure, and cost-efficient data solutions.

Software Requirements

Required

  • AWS: S3, Glue, EMR, Athena, Lambda, Redshift, IAM, CloudWatch

  • Hadoop ecosystem: HDFS, Hive, Spark, Kafka, Oozie and/or Airflow

  • Spark with PySpark and/or Scala

  • SQL, Python and/or Scala, shell scripting

  • Kafka and/or Kinesis

  • Airflow and/or AWS Step Functions

  • Git, Docker

  • CI/CD using Jenkins or GitHub Actions

  • Experience with data modeling, partitioning, metadata, and data quality checks

  • Knowledge of security and governance including IAM, encryption, RBAC, and PII handling

Preferred

  • Lake Formation

  • Curated data APIs or analytics views

  • Cost optimization and advanced observability practices

Overall Responsibilities

  • Design and implement ETL/ELT pipelines for batch and streaming workloads

  • Build ingestion frameworks using Kafka/Kinesis and Spark

  • Develop and optimize AWS-based data lakes and warehouses

  • Manage Hadoop ecosystem tools and job orchestration

  • Implement data quality, governance, and access controls

  • Monitor pipelines and improve cost, performance, and reliability

  • Collaborate with analytics, ML, and BI teams to deliver curated datasets

  • Participate in code reviews and contribute to documentation and engineering standards

Technical Skills (By Category)

Programming Languages

Essential: SQL, Python and/or Scala, Shell scripting
Preferred: Advanced PySpark optimization

Databases / Data Management

Essential: Data modeling, schema design, partitioning, metadata management, Redshift, Hive
Preferred: Curated data services and advanced cataloging

Cloud Technologies

Essential: AWS data services including S3, Glue, EMR, Athena, Lambda, Redshift, IAM, CloudWatch
Preferred: Lake Formation and cost optimization strategies

Frameworks and Libraries

Essential: Spark, Kafka/Kinesis, Hadoop ecosystem tools
Preferred: Structured Streaming and reusable ingestion frameworks

Development Tools and Methodologies

Essential: Git, Docker, CI/CD, Airflow or Step Functions, code reviews, monitoring
Preferred: Automated testing for data pipelines

Security Protocols

Essential: IAM, encryption, RBAC, PII handling, secure access controls
Preferred: Fine-grained governance and audit readiness practices

Experience Requirements

  • 7+ years in Data Engineering or related roles

  • Experience with large-scale distributed data systems

  • Strong hands-on background in AWS data services and Hadoop ecosystem tools

  • Experience with batch and streaming pipelines, SQL tuning, and production support

  • Equivalent related experience will also be considered

Day-to-Day Activities

  • Build and maintain batch and streaming pipelines

  • Optimize Spark jobs, SQL queries, and storage patterns

  • Monitor job health, logs, metrics, and data quality

  • Troubleshoot issues and implement preventive fixes

  • Work with analytics, ML, BI, and engineering teams

  • Join planning, design reviews, code reviews, and release activities

Qualifications

Required

  • Bachelor’s degree in Computer Science, Engineering, Information Systems, Mathematics, or a related field, or equivalent practical experience

Preferred

  • AWS or data engineering certifications

  • Ongoing learning in cloud data platforms, governance, and automation

Professional Competencies

  • Strong analytical and problem-solving skills

  • Clear communication and cross-functional collaboration

  • Effective time and priority management

  • Adaptability to evolving technologies and requirements

  • Focus on reliability, data quality, and continuous improvement

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture that promotes equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

The Company
12,827 Employees
