The Role
Design, develop, and maintain ETL/data pipelines and data models. Collaborate with stakeholders to gather requirements, lead data validation and UAT, implement automated data quality checks, and document data integration and architecture.
Summary Generated by Built In
Data Engineer Consultant – Job Description
Job Summary:
The Data Engineer (DE) Consultant is responsible for designing, developing, and maintaining data assets and data-related products, liaising with multiple stakeholders.
Qualifications and Skills:
- Strong knowledge of Python and PySpark.
- The expectation is to be able to write PySpark scripts for developing data workflows.
- Strong knowledge of SQL, Hadoop, Hive, Azure, Databricks, and Greenplum.
- The expectation is to write SQL that queries metadata and tables across different data management systems, such as Oracle, Hive, Databricks, and Greenplum.
- Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
- The expectation is to use Hue, run Hive SQL queries, and schedule Apache Oozie jobs to automate data workflows.
- Good working experience communicating with stakeholders and collaborating effectively with the business team on data testing.
- Strong problem-solving and troubleshooting skills are expected.
- The expectation is to establish comprehensive data quality test cases and procedures, and to implement automated data validation processes.
- Degree in Data Science, Statistics, Computer Science or other related fields or an equivalent combination of education and experience.
- 5-7 years of experience in Data Engineering.
- Proficiency in programming languages commonly used in data engineering, such as Python, PySpark, and SQL.
- Experience with the Azure cloud computing platform, such as developing ETL processes using Azure Data Factory and performing big data processing and analytics with Azure Databricks.
- Strong communication, problem-solving, and analytical skills, with good time management, the ability to multitask, and attention to detail and accuracy.
Responsibilities:
- Work with stakeholders to understand data requirements, and design, develop, and maintain complex ETL processes.
- Create data integration documentation and data diagrams.
- Lead data validation, UAT, and regression testing for newly created data assets.
- Create and maintain data models, including schema design and optimization.
- Create and manage data pipelines that automate the flow of data, ensuring data quality and consistency.
Any Graduate
Skills Required
- Strong knowledge of Python and PySpark with ability to write PySpark scripts for data workflows
- Strong SQL skills to query metadata and tables across Oracle, Hive, Databricks, and Greenplum
- Experience with Hadoop, Hive, Spark and distributed computing frameworks
- Experience with Azure cloud platform, including Azure Data Factory and Azure Databricks
- Experience with Databricks and Greenplum
- Familiarity with Hue and ability to run Hive SQL and schedule Apache Oozie jobs
- Experience designing, developing, and maintaining complex ETL processes and data pipelines
- Experience creating and maintaining data models, including schema design and optimization
- Experience leading data validation, UAT, and regression testing for new data assets
- Ability to establish data quality test cases and procedures, and to implement automated data validation
- Strong communication, stakeholder collaboration, problem-solving and troubleshooting skills
- Degree in Data Science, Statistics, Computer Science or related field (Any Graduate)
- 5-7 years of experience in Data Engineering
The Company