Data Engineer - Chennai / Bengaluru

Posted Yesterday
Hiring Remotely in India
Remote
Senior level
Agency • Information Technology
The Role
Design, build, and maintain scalable ELT/data pipelines and connectors using Airflow, Python, and PySpark. Implement DataOps and CI/CD for data workflows, monitor and troubleshoot pipelines, validate data quality, collaborate with cross-functional teams, and document data processes and governance.
Summary Generated by Built In

Data Engineer:
Job Description:

  • Develop and maintain data pipelines, ELT processes, and workflow orchestration using Apache Airflow, Python and PySpark to ensure the efficient and reliable delivery of data.
  • Design and implement custom connectors to facilitate the ingestion of diverse data sources into our platform, including structured and unstructured data from various document formats.
  • Collaborate closely with cross-functional teams to gather requirements, understand data needs, and translate them into technical solutions.
  • Implement DataOps principles and best practices to ensure robust data operations and efficient data delivery.
  • Design and implement data CI/CD pipelines to enable automated and efficient data integration, transformation, and deployment processes.
  • Monitor and troubleshoot data pipelines, proactively identifying and resolving issues related to data ingestion, transformation, and loading.
  • Conduct data validation and testing to ensure the accuracy, consistency, and compliance of data.
  • Stay up-to-date with emerging technologies and best practices in data engineering.
  • Document data workflows, processes, and technical specifications to facilitate knowledge sharing and ensure data governance.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 8-10 years of experience in data engineering, ELT development, and data modeling.
  • Proficiency in using Apache Airflow and Spark for data transformation, data integration, and data management.
  • Experience implementing workflow orchestration using tools like Apache Airflow, SSIS or similar platforms.
  • Demonstrated experience in developing custom connectors for data ingestion from various sources.
  • Strong understanding of SQL and database concepts, with the ability to write efficient queries and optimize performance.
  • Experience implementing DataOps principles and practices, including data CI/CD pipelines.
  • Excellent problem-solving and troubleshooting skills, with strong attention to detail.
  • Effective communication and collaboration abilities, with a proven track record of working in cross-functional teams.
  • Familiarity with data visualization tools such as Apache Superset, and with dashboard development.
  • Understanding of distributed systems and working with large-scale datasets.
  • Familiarity with data governance frameworks and practices.
  • Knowledge of data streaming and real-time data processing technologies (e.g., Apache Kafka).
  • Strong understanding of software development principles and practices, including version control (e.g., Git) and code review processes.
  • Experience with Agile development methodologies and working in cross-functional Agile teams.
  • Ability to adapt quickly to changing priorities and work effectively in a fast-paced environment.
  • Excellent analytical and problem-solving skills, with keen attention to detail.
  • Strong written and verbal communication skills, with the ability to effectively communicate complex technical concepts to both technical and non-technical stakeholders.

Required Skills:

DevOps (heavy), Python, PySpark, SQL, Airflow, Trino, Hive, Snowflake, Agile/Scrum

Good to have:

Linux, OpenShift, Kubernetes, Superset


Skills Required

  • Bachelor's degree in Computer Science, Engineering, or related field
  • 8-10 years of experience in data engineering, ELT development, and data modeling
  • Proficiency with Apache Airflow for workflow orchestration
  • Proficiency with PySpark / Apache Spark for data transformation
  • Strong SQL skills and database concepts, query optimization
  • Experience developing custom connectors for data ingestion
  • Experience implementing DataOps principles and data CI/CD pipelines
  • Experience with Snowflake, Trino, Hive or similar data platforms
  • Familiarity with data visualization and dashboarding (Apache Superset)
  • Knowledge of data streaming / real-time processing (Apache Kafka)
  • Experience with version control and code review processes (Git)
  • Experience working in Agile/Scrum cross-functional teams
  • Understanding of distributed systems and large-scale datasets
  • Strong troubleshooting, analytical, and communication skills
  • DevOps experience (CI/CD, automation)
The Company
5,017 Employees
