Job Summary
Synechron is seeking a Data Engineer – AWS + Hadoop to build and optimize scalable data pipelines, data lake solutions, and distributed data platforms. This role supports analytics, machine learning, and reporting by delivering reliable, secure, and cost-efficient data solutions.
Software Requirements
Required
AWS: S3, Glue, EMR, Athena, Lambda, Redshift, IAM, CloudWatch
Hadoop ecosystem: HDFS, Hive, Spark, Kafka, Oozie and/or Airflow
Spark with PySpark and/or Scala
SQL, Python and/or Scala, Shell scripting
Kafka and/or Kinesis
Airflow and/or AWS Step Functions
Git, Docker
CI/CD using Jenkins or GitHub Actions
Experience with data modeling, partitioning, metadata, and data quality checks
Knowledge of security and governance including IAM, encryption, RBAC, and PII handling
Preferred
Lake Formation
Curated data APIs or analytics views
Cost optimization and advanced observability practices
Overall Responsibilities
Design and implement ETL/ELT pipelines for batch and streaming workloads
Build ingestion frameworks using Kafka/Kinesis and Spark
Develop and optimize AWS-based data lakes and warehouses
Manage Hadoop ecosystem tools and job orchestration
Implement data quality, governance, and access controls
Monitor pipelines and improve cost, performance, and reliability
Collaborate with analytics, ML, and BI teams to deliver curated datasets
Participate in code reviews and contribute to documentation and engineering standards
Technical Skills (By Category)
Programming Languages
Essential: SQL, Python and/or Scala, Shell scripting
Preferred: Advanced PySpark optimization
Databases / Data Management
Essential: Data modeling, schema design, partitioning, metadata management, Redshift, Hive
Preferred: Curated data services and advanced cataloging
Cloud Technologies
Essential: AWS data services including S3, Glue, EMR, Athena, Lambda, Redshift, IAM, CloudWatch
Preferred: Lake Formation and cost optimization strategies
Frameworks and Libraries
Essential: Spark, Kafka/Kinesis, Hadoop ecosystem tools
Preferred: Structured Streaming and reusable ingestion frameworks
Development Tools and Methodologies
Essential: Git, Docker, CI/CD, Airflow or Step Functions, code reviews, monitoring
Preferred: Automated testing for data pipelines
Security Protocols
Essential: IAM, encryption, RBAC, PII handling, secure access controls
Preferred: Fine-grained governance and audit readiness practices
Experience Requirements
7+ years in Data Engineering or related roles
Experience with large-scale distributed data systems
Strong hands-on background in AWS data services and Hadoop ecosystem tools
Experience with batch and streaming pipelines, SQL tuning, and production support
Equivalent related experience will also be considered
Day-to-Day Activities
Build and maintain batch and streaming pipelines
Optimize Spark jobs, SQL queries, and storage patterns
Monitor job health, logs, metrics, and data quality
Troubleshoot issues and implement preventive fixes
Work with analytics, ML, BI, and engineering teams
Join planning, design reviews, code reviews, and release activities
Qualifications
Required
Bachelor’s degree in Computer Science, Engineering, Information Systems, Mathematics, or a related field, or equivalent practical experience
Preferred
AWS or data engineering certifications
Ongoing learning in cloud data platforms, governance, and automation
Professional Competencies
Strong analytical and problem-solving skills
Clear communication and cross-functional collaboration
Effective time and priority management
Adaptability to evolving technologies and requirements
Focus on reliability, data quality, and continuous improvement
SYNECHRON’S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture that promotes equality, diversity, and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements, and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disability or veteran status, or any other characteristic protected by law.
Skills Required
- AWS (S3, Glue, EMR, Athena, Lambda, Redshift, IAM, CloudWatch)
- Hadoop ecosystem (HDFS, Hive, Spark, Kafka, Oozie)
- Spark with PySpark and/or Scala
- SQL, Python and/or Scala, Shell scripting
- Kafka and/or Kinesis for ingestion
- Airflow and/or AWS Step Functions for orchestration
- Git and Docker
- CI/CD using Jenkins or GitHub Actions
- Experience with data modeling, partitioning, metadata management, and data quality checks
- Knowledge of security and governance including IAM, encryption, RBAC, and PII handling
- 7+ years in Data Engineering or related roles
- Experience with large-scale distributed data systems, batch and streaming pipelines, SQL tuning, and production support
- Bachelor's degree in Computer Science, Engineering, Information Systems, Mathematics, or a related field, or equivalent practical experience
- Preferred: Lake Formation
- Preferred: Curated data APIs or analytics views, cost optimization and observability practices
- Preferred: AWS or data engineering certifications and ongoing cloud data platform learning