Senior Site Reliability Engineer
Our Opportunity:
Chewy is seeking Senior Site Reliability Engineers. Chewy is THE go-to online shopping destination for all things pet and we are continuously striving to delight pet parents with a seamless experience across our platforms. The SRE team works with teams across the organization to make their services more resilient to failure by applying common patterns and practices, and to scale those services to keep pace with ever-increasing growth and demand. This includes facilitating resiliency testing, game day exercises, and chaos testing to uncover risks and weaknesses before they lead to large-scale production issues.
Do you enjoy working in a fast-paced environment, solving complex technical problems, and delivering innovative solutions? If you have a passion for solving complex problems unique to running large, highly scalable, resilient systems, we would love for you to join us! The role will have tremendous visibility in the technology & business organization of Chewy. This is a high-profile position that will have exposure across the entire business, influencing the vision and implementation of architecture, design and features of Chewy’s technical platform.
What You’ll Do:
- Contribute to the development of our self-service chaos platform.
- Enable engineering teams to make their services more reliable by identifying, creating, and deploying engineering practices, processes, and solutions.
- Establish monitoring tools and management dashboards integrated into platforms with best practice notifications and response processes.
- Define and document best practices and strategies regarding application deployment and infrastructure maintenance.
- Educate teams on the implementation of new cloud-based initiatives, providing associated training as required.
- Employ exceptional problem-solving skills, with the ability to see and solve issues before they affect business productivity.
- Improve availability, reliability, and observability of Chewy services and reduce the burden of human toil with tooling and automation.
What You’ll Need:
- 7+ years of experience in a software engineering, SRE, or performance engineering role.
- Programming experience in one or more of Python, Go, Shell, Java, and JavaScript/React.
- 5+ years of hands-on experience designing and developing scalable, high-performing, and fault-tolerant applications for large enterprises.
- Expertise in developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.).
- Hands-on working experience with issue tracking tools and source control systems (GitHub).
- Experience with infrastructure tools, container technology (Docker), public cloud providers (AWS, Google Cloud, Azure), configuration and deployment management (Terraform, Ansible), continuous delivery infrastructure (e.g., Jenkins), and orchestration (Kubernetes, Fargate).
- Excellent understanding of micro-services architecture, design patterns, and standard methodologies with an eye towards scale, automation, resiliency, and high availability.
- Experience with telemetry tooling and observability systems such as Prometheus, Splunk, Datadog, and Grafana.
- Ability to leverage automation to improve deployments and updates, speed up problem detection and resolution, and ensure safe and quick rollback when problems occur.
- A Bachelor’s degree in Computer Science or related field or equivalent experience.
- Position may require travel.
Bonus:
- CDN & DNS experience is a plus.
- Incident management and on-call experience.
- Experience contributing to the architecture and design (design patterns, resiliency, and scaling) of new and current systems.
- Expertise in ITSM processes and tools such as JIRA and PagerDuty, and experience with ServiceNow ITOM and ITSM modules that focus on Incident, Problem, and Change Management.
If you have a disability under the Americans with Disabilities Act or similar law, and you need an accommodation during the application process or to perform these job requirements, or if you need a religious accommodation, please contact [email protected].
If you have a question regarding your application, please contact [email protected].
Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members.
To access Chewy’s Privacy Policy, which contains information regarding information collected from job applicants and how we use it, please click here: https://www.chewy.com/app/content/privacy.