Senior Site Reliability Engineer (K9 Team)
Our Opportunity:
Chewy is looking to hire Site Reliability Engineers in Bellevue, WA; Boston, MA; Fort Lauderdale, FL; or Minneapolis, MN. Site Reliability Engineers are a cross between systems and software engineers and are responsible for all operational aspects of Chewy’s e-commerce platform. The team designs, builds, monitors, and maintains the infrastructure of our internet-facing and internal services. We're looking for engineers who want to develop and maintain infrastructure software and scale Chewy’s technology stack. Come help us build a bigger and better Chewy as a Site Reliability Engineer. You will be part of a small family within Chewy that has a huge impact on our incredible growth. Ideal candidates can discuss complex technical concepts with a diverse audience across all areas of the organization. They remain calm under pressure and always strive to add structure to high-pressure, fast-paced tasks and projects.
What You’ll Do:
- Focus on service stability and reliability by working with application owners to set SLOs, error budgets, and backup and disaster recovery (DR) strategies
- Apply a thorough understanding of operational tools and concepts, such as alerting, monitoring, logging, and health checks
- Perform capacity planning and production readiness assessments
- Embed with product teams during the design and requirements phase of new product development through to initial production launch
- Identify requirements for other operational teams (release engineering, automation, etc.) during the application development phase
- Be a technology and DevOps evangelist for the rest of the company
- Participate in the on-call rotation for level 3 support escalations
What You’ll Need:
- At least 5 years of experience working in an SRE role or similar
- Hands-on experience with orchestration and system configuration tools such as Ansible, Puppet, Chef, or Terraform (preferred)
- At least 5 years of experience building and managing applications on public cloud platforms such as AWS (preferred), GCP, or Azure
- Expertise in building and maintaining highly available applications, including redundancy, failover, scalability, monitoring, and performance
- Highly skilled in identifying performance bottlenecks, detecting anomalous system behavior, and resolving the root cause of service issues
- Solid understanding of, and experience with, web services, databases, and related infrastructure and architecture
- Experience working with the open source community (troubleshooting, patch submission, etc.)
- At least 5 years of demonstrated Linux system administration experience
- Experience with CI tools such as Bamboo, Jenkins, or CircleCI
- Ability to organize, troubleshoot and continuously learn
- Previous experience working within controls such as SOX, PCI, etc.
- Bachelor’s Degree (MIS or CS preferred) or equivalent work experience
- Position may require travel
Bonus:
- AWS Certified Solutions Architect
- Advanced Terraform knowledge and orchestration using Jenkins
- Datadog integration expertise for containers
Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members.
If you have a disability under the Americans with Disabilities Act or similar law, or you require a religious accommodation, and you wish to discuss potential accommodations related to applying for employment at Chewy, please contact HR at chewy dot com
To access Chewy’s Privacy Policy, which describes the information we collect from job applicants and how we use it, please visit: https://www.chewy.com/app/content/privacy