Responsibilities:
- System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
- Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
- Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
- Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
- Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
- Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
- Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
- On-Call Support: Participating in an on-call rotation to respond to incidents outside of regular working hours and ensure 24/7 system availability.
- Security: Collaborating with security teams to implement and maintain security best practices in infrastructure and applications.
- Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
- Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement and implementing changes to enhance overall system resilience.
Skills:
- Programming Languages: Proficiency in one or more programming or scripting languages, commonly Python, Go, or shell (e.g., Bash).
- Automation and Scripting: Strong automation skills using tools like Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools like Terraform.
- Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
- Cloud Computing: Proficiency in at least one major cloud platform, such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
- Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
- Networking: Understanding of networking concepts, protocols, and troubleshooting skills.
- Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
- Continuous Integration/Continuous Deployment (CI/CD): Understanding and implementation of CI/CD pipelines for automated testing and deployment.
- Incident Management: Experience in incident response, troubleshooting, and resolution.
- Version Control: Proficient use of version control systems like Git.
Experience and Qualifications:
- 2-4 years of experience in site reliability engineering.
- B.Tech/M.Tech in computer science, information technology or a related field.
- Experience working in a product organization is a plus.
- Certifications from cloud service providers, such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or Microsoft Certified credentials, are a plus.
What We Do
Founded in 2015, Zeta is a provider of a next-gen credit card processing platform. Zeta's cloud-native and fully API-enabled stack offers a comprehensive range of capabilities, including processing, issuing, lending, core banking, fraud detection, and loyalty programs. With a strong focus on technology, Zeta has over 1,700 employees and contractors, with more than 70% dedicated to technology roles. Operating across the US, UK, Middle East, and Asia, Zeta has served a global customer base of 35+ clients who have issued over 15 million cards on Zeta's platform to date. Backed by prominent investors such as Softbank Vision Fund 2 and Mastercard, Zeta has raised $280 million at a valuation of $1.5 billion.






