The Role
Design, build, optimize, and deploy large-scale Spark/Scala data processing and ETL pipelines on Cloudera (CDH). Collaborate with data teams, ensure data integrity and security, troubleshoot performance, and implement version control and CI/CD for Spark applications.
Summary Generated by Built In
Key Responsibilities:
- Develop, test, and deploy data processing applications using Apache Spark and Scala.
- Optimize and tune Spark applications for better performance on large-scale data sets.
- Work with the Cloudera Hadoop ecosystem (e.g., HDFS, Hive, Impala, HBase, Kafka) to build data pipelines and storage solutions.
- Collaborate with data scientists, business analysts, and other developers to understand data requirements and deliver solutions.
- Design and implement high-performance data processing and analytics solutions.
- Ensure data integrity, accuracy, and security across all processing tasks.
- Troubleshoot and resolve performance issues in Spark, Cloudera, and related technologies.
- Implement version control and CI/CD pipelines for Spark applications.
Required Skills & Experience:
- Minimum 8 years of experience in application development.
- Strong hands-on experience in Apache Spark, Scala, and Spark SQL for distributed data processing.
- Hands-on experience with Cloudera Hadoop (CDH) components such as HDFS, Hive, Impala, HBase, Kafka, and Sqoop.
- Familiarity with other Big Data technologies, including Apache Flume, Oozie, and NiFi.
- Experience building and optimizing ETL pipelines using Spark and working with structured and unstructured data.
- Experience with SQL and NoSQL databases such as HBase, Hive, and PostgreSQL.
- Knowledge of data warehousing concepts, dimensional modeling, and data lakes.
- Ability to troubleshoot and optimize Spark and Cloudera platform performance.
- Familiarity with version control tools like Git and CI/CD tools (e.g., Jenkins, GitLab).
The Company