Technical Operations, System Operations Administrator at The Walt Disney Company
The Systems Operations team’s mission is to ensure the resiliency and availability of the critical infrastructure in our various data centers and co-locations that powers the media distribution of all of Disney Streaming’s world-class products.
The Sysops System Administrator is the cornerstone of the System Operations team. We are a 24x7x365 team tasked with watching our monitoring tools for issues that arise in our infrastructure. When an issue is found, we provide initial triage and escalation using our curated documentation. When not handling active issues, we spend our time improving our documentation and processes. We are the interface point for many of the engineering teams and for outside third-party providers of hardware and services.
- Monitor for and react to alerts within the environment, providing Level 1 support
- Provide clear communication, escalation, follow-up, and closure for alerts and maintenance work to multiple teams within the organization
- Support various engineering and other internal teams involved with maintaining our server infrastructure
- Complete general day-to-day tickets (user add/removal, monitoring adjustments, general permission-related issues, etc.)
- Support high-profile content releases and events
- Write and improve the team knowledge base
Basic Qualifications:
- 1+ years of systems administration experience with a UNIX operating system (CentOS, Ubuntu, Solaris, etc.)
- Familiarity with Ansible
- Familiarity with version control tools (GitHub)
- Familiarity with monitoring tools such as Icinga, Nagios, Sensu, Datadog
- Familiarity with at least one of the following languages: Bash, Python, Go
- Familiarity with ticketing systems (Jira, ServiceNow)
- Comfortable with hardware out-of-band (OOB) management interfaces (Dell, Supermicro)
- Comfortable with basic network and DNS concepts
- Comfortable with web, proxy, and application servers (Jetty, Tomcat, Varnish, Apache, Nginx)
- Strong problem-solving and time-management skills
- Familiarity with Amazon Web Services
- Experience with Splunk, Grafana, ELK
- Experience with Docker, Kubernetes, Terraform, AWX
- Experience with Datadog / New Relic