As a Senior Data Reliability Engineer, you will be responsible for architecting, scaling, and optimizing enterprise-grade data platforms, including large-scale data lakes and data warehouses built from multiple disparate data sources. This role requires deep expertise in cloud databases, data infrastructure reliability, observability, and automation, with a strong focus on operational excellence, performance, and resilience.
Responsibilities:
- Own the reliability, availability, scalability, and performance of PostgreSQL RDS environments across production and non-production systems.
- Lead proactive monitoring and observability initiatives for PostgreSQL RDS instances, leveraging tools such as CloudWatch, Prometheus, Grafana, and other enterprise monitoring platforms.
- Drive advanced PostgreSQL performance tuning, including query optimization, indexing strategies, parameter tuning, and capacity planning.
- Architect and optimize database backup, disaster recovery, and failover strategies to ensure business continuity and minimal downtime.
- Own the reliability and operational excellence of Debezium and Kafka Connect ecosystems, ensuring robust real-time data ingestion and delivery.
- Lead troubleshooting and optimization of ETL workflows and data pipelines, ensuring scalability, reliability, and fault tolerance across data platforms.
- Oversee Apache Airflow workflow orchestration, ensuring high reliability, SLA adherence, and operational efficiency of production DAGs.
- Design and implement Infrastructure as Code (IaC) solutions using tools such as Terraform, Crossplane, and automation frameworks to streamline deployments and operational tasks.
- Lead incident response, root cause analysis, and post-incident reviews for critical production issues.
- Define and enforce database security standards, including access controls, encryption policies, compliance adherence, and periodic security audits.
- Partner closely with engineering, DevOps, and data platform teams to optimize data architecture and improve overall platform reliability.
- Mentor junior engineers and drive best practices across database reliability engineering and cloud data operations.
- Identify and lead continuous improvement initiatives focused on reliability, automation, scalability, and operational maturity.
Skills:
- Deep expertise in PostgreSQL administration and performance tuning, preferably in AWS RDS environments.
- Strong experience with Debezium, Kafka Connect, ETL frameworks/tools, and enterprise-grade data pipeline architectures.
- Strong hands-on experience with Amazon Redshift, S3, and cloud-native data platforms.
- Expertise in Apache Airflow workflow orchestration and operational management.
- Experience with Apache Spark and large-scale distributed data processing.
- Strong scripting and automation experience using Python, Bash, or similar languages.
- Strong experience in Infrastructure as Code (IaC) using Terraform, Crossplane, or equivalent tools.
- Hands-on experience with monitoring and observability tools such as CloudWatch, Prometheus, and Grafana.
- Strong understanding of cloud database security, compliance, and governance frameworks (e.g., GDPR, HIPAA).
- Experience designing highly available, fault-tolerant, and scalable cloud database systems.
Experience and Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related field (master’s preferred).
- 10–12 years of overall experience in database engineering, cloud data infrastructure, or reliability engineering.
- Minimum of 5 years of hands-on experience with PostgreSQL, including AWS RDS administration.
- Strong experience in cloud-native data platforms and enterprise-scale production environments.
- AWS Certified Database - Specialty or relevant cloud certifications preferred.
Zeta Compensation & Benefits Highlights
The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Zeta and has not been reviewed or approved by Zeta.
- Fair & Transparent Compensation: Pay is considered competitive for some roles and markets, with market-aligned offers in certain senior or U.S.-based positions. Overall compensation tends to be seen as fine-to-good rather than top-tier.
- Parental & Family Support: Parental leave, adoption/fertility support, and childcare coverage are part of the package. These offerings contribute to a well-rounded family support mix even if specifics vary by location.
- Wellbeing & Lifestyle Benefits: Flexible hours, paid volunteer time, public-transport incentives, concierge services, and workplace perks are highlighted. These lifestyle-oriented perks add breadth beyond core pay and health coverage.
Zeta Insights
What We Do
Founded in 2015, Zeta is a provider of a next-gen credit card processing platform. Zeta’s cloud-native and fully API-enabled stack offers a comprehensive range of capabilities, including processing, issuing, lending, core banking, fraud detection, and loyalty programs. With a strong focus on technology, Zeta has more than 1,700 employees and contractors, over 70% of whom work in technology roles. Operating across the US, UK, Middle East, and Asia, Zeta serves a global customer base of 35+ clients who have issued over 15 million cards on Zeta's platform to date. Backed by prominent investors such as SoftBank Vision Fund 2 and Mastercard, Zeta has raised $280 million at a valuation of $1.5 billion.







