Lead Site Reliability Engineer - Automation
At Disney, we're storytellers. We make the impossible, possible. We do this by using and developing the latest technology and pushing the envelope to bring stories to life through our movies, products, interactive games, parks and resorts, and media networks. Now is your chance to join our experienced team that empowers unparalleled magic for audiences around the world.
This position is for an eager Lead Site Reliability Automation Engineer to play an important role on our Parks Commercial Systems (PCS) team, helping to improve practices, promote and onboard new technologies, solve complex problems, and integrate next-generation digital platforms.
Site Reliability Engineering (SRE) combines software and systems engineering disciplines to build and operationalize large-scale, massively distributed, fault-tolerant systems. SREs are experienced engineers who improve the resiliency of production systems and reduce operational toil using a data-driven approach.
Responsibilities:
You Will:
- Help maintain our existing automations.
- Improve and upgrade those automations, and create new ones that reduce toil and save time otherwise spent on manual effort. This includes consulting on, architecting, developing, and operationalizing infrastructure, applications, and automation; creating telemetry for monitoring; engineering for high reliability; and reinforcing operational best practices.
- You are passionate about constantly learning and applying technology to solve complex problems, and you are highly motivated, optimistic, proactive, self-driven, and a creative thought leader.
Basic Qualifications:
You Have:
- Fluency in core scripting languages and advanced skills in programming languages (e.g., Python, Node.js, Golang), with the ability to build test coverage for all software you develop
- Expertise with Linux, command-line interfaces (CLIs), and code editors such as VS Code
- Experience with a major Application Performance Monitoring (APM) tool (e.g. AppDynamics, New Relic)
- Experience with Rundeck
- Networking skills and knowledge of common protocols (e.g., HTTP, TLS, SSH, DNS)
- Experience with distributed systems and container platforms (e.g., ECS and Docker)
- Experience with source control management systems (e.g., GitHub and GitLab), including managing users and repositories and building Git automation pipelines
- Experience with RESTful web service calls and JSON
- Expertise in cloud hosting services (AWS, Google Cloud, Azure), cloud databases, cloud tools and APIs, and cloud security, preferably on AWS
- Experience with CI pipelines and build tools such as Jenkins
- Ability to diagnose simple to complex problems in our automations and our PCS SRE applications
- Exceptional troubleshooting methodology, including the ability to author new methodologies and teach them to the SRE team
- Ability to evaluate new systems and infrastructure solutions for technical feasibility against known requirements and standards
- Excellent verbal communication skills, good problem-solving skills, and attention to detail
Required Education:
- Bachelor's degree or equivalent work experience