Sr Site Reliability Engineer at The Walt Disney Company (Orlando, FL)
Sorry, this job was removed at 9:36 a.m. (CST) on Thursday, April 28, 2022
Do you want to be part of a team that creates magic for millions of guests? Behind the scenes, the Retail Technology Operations team helps provide magical digital and physical experiences by applying the latest technology, and our Site Reliability Engineers provide expert engineering services in the cloud, automation, and reliability to support the innovation and operation of The Walt Disney Company. We are passionate about ensuring our systems provide the best guest experience! You will protect and improve the automation and systems that run Disney's experiences and services with a focus on availability, latency, and automation while embracing a DevOps culture.
You will report to the Sr Manager-Technology.
Responsibilities :
The Site Reliability Engineering Team within Retail Technology Operations focuses on improving the availability and performance of our services and systems. We do this by driving observability around our services, designing and implementing tooling and automation, and working with our business and development teams to bring data-driven solutions. As a Senior Site Reliability Engineer, you will create automation for our services, expand observability of the environment, participate in infrastructure design discussions, work with teams to implement complete solutions, and resolve issues across environments.
- You will oversee our current cloud and SaaS services: install, upgrade, and maintain all necessary middleware components, and work with cloud vendors to integrate APIs and automation tools
- You will compile a runbook that identifies all known and potential risks and incidents, with well-defined procedures to reduce or eliminate each risk if it occurs. You will also have on-call and incident resolution responsibilities
- You will deploy and manage new modern cloud technologies using infrastructure-as-code, self-healing, and security automation patterns, with instrumentation and monitoring built in
- You will participate in the implementation of complex engineering solutions across Retail Technologies
- Manage delivery impediments, risks, and changes tied to the engineering programs, and escalate them to partners
- Work with our development teams to ensure smooth operational transition of solutions, and to improve existing solutions post turnover
Basic Qualifications :
- Experience programming in one or more of: Python, Ruby, Java, Go, Rust, C/C++ (3 years)
- Experience with Cloud/PaaS/SaaS Environments (e.g. AWS, Azure, Google Cloud) (3 years)
- Proficient, collaborative, and experienced in building reliable, scalable, enterprise systems (5 years)
- UNIX/Linux administration, troubleshooting, performance tuning, & security (3 years)
- Lead technical projects, working with project managers to ensure smooth delivery (2 years)
- Experience working with Security Operations teams to design security into solutions and avoid existing issues (2 years)
- Understanding of observability principles (monitoring, logging, tracing, alerting), tools and practices that promote observability (3 years)
- Experience with continuous integration tools (e.g. GitLab, AWS CodeBuild, CodeDeploy, CodePipeline, Azure DevOps) (3 years)
- Configuration management and orchestration (e.g. Terraform, CloudFormation, Ansible, Chef) (3 years)
- Excellent written and verbal communications; ability to develop presentations
Required Education :
- Bachelor's degree in applicable field