Business Area: Engineering
Seniority Level: Mid-Senior level
Job Description:
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.
The Data Platform Pillar is the bedrock of Cloudera’s technology, where we design and build the core components that let our customers store, manage, and process data with unmatched scalability, security, and performance.
As a Staff Software Engineer on the Replication Manager team, you will design, develop, and maintain enterprise-grade data replication solutions that enable seamless data movement across hybrid and multi-cloud environments. You'll work on critical infrastructure that helps Fortune 500 companies manage their data lifecycle and migration strategies.
As the Staff Software Engineer, you will lead:
Data Replication Engineering — Design and implement scalable replication services across HDFS, Hive, HBase, Apache Iceberg, and other big data technologies, with strong data consistency and minimal downtime.
Cloud Migration — Lead complex data migration initiatives between on-premises clusters and cloud environments including AWS S3 and Azure ADLS Gen2.
API & Microservices Development — Build robust APIs and microservices for the Replication Manager platform, along with advanced features like bandwidth throttling, scheduling, and policy management.
Distributed Systems Architecture — Design fault-tolerant, petabyte-scale distributed systems with comprehensive monitoring, alerting, and observability capabilities.
Security & Governance — Ensure data security and governance compliance during movement operations, leveraging Apache Atlas for metadata lineage and data discovery.
Product & Innovation — Drive technical decisions for new features, evaluate emerging replication and cloud technologies, and translate business requirements into technical specifications.
Cross-functional Collaboration — Partner with CDP, SRE, and field engineering teams to integrate replication capabilities, resolve customer escalations, and improve system reliability.
Technical Mentorship — Guide junior engineers on best practices, conduct code reviews, and contribute to technical documentation and customer-facing materials.
We’re excited about you if you have: (Minimum Qualifications)
12+ years in software engineering with strong proficiency in Java, Scala, or Python, and deep hands-on experience with the Apache Hadoop ecosystem (HDFS, Hive, HBase, YARN).
Solid experience with modern data formats including Apache Iceberg, Delta Lake, and Hive tables with ACID support, alongside streaming technologies like Kafka and Pulsar.
Practical experience across AWS, Azure, and GCP storage services, with working knowledge of containerization tools like Docker and Kubernetes.
Proven ability to architect large-scale distributed systems with a strong grasp of data consistency models, CAP theorem, microservices, and API design.
Familiarity with security protocols and data governance frameworks to ensure compliant and trustworthy data operations.
Well-versed in agile SDLC, CI/CD pipelines, automated testing, Git-based code review workflows, and observability tooling including Prometheus, Grafana, and the ELK stack.
You may also have: (Preferred Qualifications)
Experience with Apache Ranger and Apache Atlas for data governance and metadata management
Understanding of Apache Iceberg table format and its replication challenges in hybrid cloud environments
Knowledge of enterprise backup and disaster recovery solutions
Previous experience in data migration or ETL pipeline development
Contributions to open-source big data projects
Experience in customer-facing roles or supporting enterprise customers
Advanced degree in Computer Science, Engineering, or a related field
What you can expect from us:
Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Paid Volunteer Time
Employee Resource Groups
EEO/VEVRAA
Skills Required
- 8+ years in software engineering
- Strong proficiency in Java, Scala, or Python
- Deep hands-on experience with Apache Hadoop ecosystem (HDFS, Hive, HBase, YARN)
- Solid experience with modern data formats including Apache Iceberg, Delta Lake
- Practical experience across AWS, Azure, and GCP storage services
- Ability to architect large-scale distributed systems
- Familiarity with security protocols and data governance frameworks
- Experience in Agile SDLC, CI/CD pipelines, automated testing
Cloudera Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Cloudera. It has not been reviewed or approved by Cloudera.
- Leave & Time Off Breadth: Time off includes generous PTO and holidays plus recurring company-wide Unplugged Days that provide regular recharge time. Volunteer time off and flexible scheduling options further expand usable leave.
- Healthcare Strength: Health coverage spans comprehensive medical, dental, and vision alongside EAP, wellness sessions, and U.S. gym reimbursement. These elements position healthcare as a strong anchor within the package.
- Strong & Reliable Incentives: Compensation often includes variable incentives and long-term incentive programs with annual bonuses commonly offered. Sales and other revenue roles show competitive on-target earnings when goals are met, reinforcing the incentive structure.
Cloudera Insights
What We Do
At Cloudera, we believe that data can make what is impossible today, possible tomorrow. We empower people to transform complex data into clear and actionable insights. Cloudera delivers an enterprise data cloud for any data, anywhere, from the Edge to AI. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.