Site Reliability Engineer
About the job
The Red Hat OpenShift Site Reliability Engineering Platform team (SRE-P) is looking for a Site Reliability Engineer to join our global team. In this role, you will contribute to solutions that make Red Hat OpenShift Dedicated scalable, feature-rich, resilient, and secure, while maintaining a balance between development and operations work. Red Hat OpenShift is an enterprise Kubernetes platform, and SRE is the Red Hat team that develops and operates Red Hat OpenShift Dedicated as a public cloud service for large enterprise customers. You'll contribute to the design and development of automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds. You'll participate in a follow-the-sun on-call rotation and help lead incident management, root cause analysis, and continuous improvement activities, managing engineering efforts against a service level agreement (SLA) and error budget. Red Hat OpenShift SRE-P is a growing, sophisticated, global, fast-paced team inside the world's open source leader, with constant opportunities to learn new skills and develop new solutions to meet our customers' demands. As an SRE on this team, you'll directly contribute to Red Hat's success in the rapidly growing Kubernetes-as-a-Service market. Successful applicants must reside in a state where Red Hat is registered to do business.
What you will do
- Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds
- Identify single points of failure and other high-risk architecture issues; propose and implement more resilient and scalable solutions
- Participate in product release cycles, deploying code to integration, staging, and production environments, integrating with continuous integration and continuous delivery (CI/CD) tooling, monitoring, and change management
- Perform software updates, peer code reviews, testing, and common vulnerabilities and exposures (CVE) analysis; respond to security threats
- Interact with automated monitoring and healing infrastructure to ensure healthy environments
- Provide engineering support to Red Hat's global technical support team to resolve customer issues
- Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
- Participate in a follow-the-sun on-call rotation, including periodic weekend and holiday on-call duties
What you will bring
- 3+ years of software engineering experience using object-oriented languages; Go (Golang) is a plus
- 3+ years of experience managing Linux-based systems in a public cloud like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
- 3+ years of experience with enterprise systems monitoring; knowledge of Prometheus is a plus
- 3+ years of experience with enterprise configuration management tools like Ansible, Puppet, or Chef
- 1+ year of experience delivering hosted cloud services
- 1+ year of experience with Kubernetes
- 1+ year of experience with containers on Linux
- Superior communication skills and experience working directly with and presenting to customers
- Demonstrated ability to quickly and accurately troubleshoot systems issues
- Solid understanding of standard TCP/IP networking and common protocols like DNS and HTTP
About Red Hat
Red Hat is the world's leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.
Benefits
- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account - healthcare and dependent care
- Health Savings Account - high deductible medical plan
- Retirement 401(k) with employer match
- Paid time off and holidays
- Paid parental leave plans for all new parents
- Leave benefits including disability, paid family medical leave, and paid military leave
- Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!
Note: These benefits are only applicable to full-time, permanent associates at Red Hat located in the United States.