NextHire Consulting

Zettamine - Data Engineer (PySpark)

Bangalore, Bengaluru Urban, Karnataka, IND

In-Office

Senior level

Artificial Intelligence • HR Tech • Professional Services • Software

The Role

Design, build, and optimize scalable PySpark-based ETL pipelines using the Apache ecosystem. Ensure production reliability, performance tuning, data quality, lineage, and integration of structured/unstructured sources. Collaborate with data scientists, analysts, and stakeholders; write maintainable Python code and support CI/CD and version control workflows.

Summary Generated by Built In

Company: Zettamine Labs

Job Title: Senior Data Engineer (PySpark)
Experience: 5 to 8 Years
Location: Bangalore

Job Summary:
We are looking for a highly skilled and experienced Senior Data Engineer to join our team in Bangalore. The ideal candidate will have a strong background in building scalable and high-performance data pipelines using PySpark and the Apache ecosystem. This role involves close collaboration with Data Scientists, Analysts, and cross-functional teams to drive robust data solutions.

Key Responsibilities:
Design, develop, and optimize distributed data pipelines using PySpark.
Work with Apache tools such as Hadoop, Hive, HDFS, and others for large-scale data ingestion, transformation, and processing.
Ensure the performance, reliability, and scalability of ETL workflows in production environments.
Collaborate with stakeholders to gather requirements and deliver scalable data solutions.
Implement robust data quality checks and lineage tracking for auditability and transparency.
Handle data integration from diverse structured and unstructured sources.
Use Apache NiFi, where applicable, to automate data flow orchestration.
Write clean and maintainable code primarily in Python, with working knowledge of Java.
Participate in architectural discussions and performance tuning initiatives.

Required Skills:
5–7 years of experience in data engineering roles.
Expertise in PySpark for distributed computing and data transformation.
Strong understanding of Apache ecosystem (Hadoop, Hive, Spark, HDFS).
Knowledge of ETL principles, data modeling, and data warehousing concepts.
Experience working with large-scale datasets and optimizing performance.
Hands-on proficiency with SQL and exposure to NoSQL databases.
Solid coding skills in Python, with working knowledge of Java.
Experience with version control (Git) and working in CI/CD environments.

The Company

100 Employees

What We Do

NextHire Consulting is an AI-driven recruiting platform that streamlines the hiring process for companies. By leveraging AI agents for sourcing, screening, and interviewing, the platform enables teams to focus on pre-qualified finalists. It provides data-driven insights into candidate soft skills and behavioral styles, aiming to disrupt traditional recruitment models with efficient, automated, and science-based talent acquisition solutions for businesses of all sizes.