At Chewy, our mission is to be the most trusted and convenient destination for pet parents and partners, everywhere.
Minneapolis–Saint Paul, MN

Associate Director, Software Engineering - Observability at Chewy (Minneapolis, MN)

Sorry, this job was removed at 11:35 a.m. (CST) on Tuesday, October 11, 2022

Our Opportunity:

Chewy is hiring an Associate Director, Software Engineering - Observability in Boston or Minneapolis. The primary focus of this team is ensuring that Chewy's platform stays up. The team is responsible for systemic risk identification, incident lifecycle management, chaos engineering, and resiliency consulting. In this role, you will be responsible for improving the availability and resiliency of Chewy applications and services.

This is a high-profile position with exposure across the entire business, influencing the vision and implementation of architecture, design, and features for this new and growing line of business. As part of a dynamic team, you will have a tremendous opportunity for professional growth at the leading online pet retailer in the US. Reporting directly to the Director of Software Engineering, you will act as an individual contributor while leading a team of strong engineers, with significant visibility across Chewy's technology and business organizations.

What You'll Do:

  • Engage cross-functionally with other engineering teams to manage issues when they happen and to promote reliability and resilience practices throughout the organization
  • Manage and lead one or more teams of 5-10 experienced software engineers who have in-depth operational knowledge of the entire Chewy stack and who act as first responders and leaders during incidents
  • Design, develop and implement chaos engineering practices
  • Recruit and hire high-performing engineers and mentor the growth of the existing team
  • Drive incidents to resolution by collaborating with multiple engineering teams
  • Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks
  • Ensure timely and consistent communication to facilitate a clear understanding of ongoing projects and their prioritization within the organization
  • Standardize the root cause analysis (RCA) process and hold teams accountable for delivering preventive and corrective actions
  • Establish strong working relationships at all organizational levels and across functional teams
  • Identify and manage priorities within the context of overall corporate objectives
  • Create a dynamic, collaborative, and fun team environment

What you'll need:

  • Bachelor’s in Computer Science, Electrical/Computer Systems Engineering or a similar math or engineering discipline.
  • 10+ years in Performance Engineering, Observability, Resiliency, and Chaos Engineering of large-scale, latency-sensitive enterprise applications
  • 7+ years of experience in Engineering Management
  • Expertise in ITSM processes and tools such as JIRA and PagerDuty, and experience with the ServiceNow ITOM and ITSM modules for Incident, Problem, and Change Management
  • Expertise in developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.)
  • Hands-on working experience with standard DevOps tools, build automation tools (Jenkins), issue tracking tools and source control systems (GitHub)
  • Experience working with AWS offerings such as ECS, EC2, Lambda, Fargate, S3, DynamoDB, and API Gateway
  • Solid understanding of Docker & Kubernetes or similar container-based architectures
  • Excellent understanding of microservices architecture, design patterns, and standard methodologies, with an eye toward scale, automation, resiliency, and high availability
  • Experience with telemetry tooling and observability systems such as Prometheus, Splunk, Datadog, and Grafana
  • Quick learner who is eager to explore and learn new tools and frameworks
  • Strong leadership, interpersonal, influencing, collaboration and negotiation skills
  • Ability to provide thought leadership, create roadmaps and strategies for practice improvement
  • Highly proficient in both written and spoken English
  • Position may require travel

Bonus:

  • Prior roles in e-commerce or at technology companies
  • Advanced degree
  • Experience working across fully automated stacks in a CI/CD ecosystem
  • Knowledge of high availability proxy and load balancers (HAProxy)
  • Knowledge of Content Delivery Networks (Akamai)
  • Experience operating distributed streaming services such as Apache Kafka

If you have a disability under the Americans with Disabilities Act or similar law, and you need an accommodation during the application process or to perform these job requirements, or if you need a religious accommodation, please contact [email protected].

If you have a question regarding your application, please contact [email protected].

Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members.

To access Chewy’s Privacy Policy, which contains information regarding information collected from job applicants and how we use it, please click here:


Chewy Perks + Benefits

Chewy Benefits Overview

We offer competitive salaries and 401(k), unlimited time off, comprehensive medical, dental, and vision benefits, in addition to wellness programs, online communities, and resources for improved physical and mental health, enabling you to be your best self in and outside of work. With mentorship programs, employee resource groups, cross-functional job trainings, events, and customized development tracks for advancement, we're proud to help develop and promote our team members from within.

True to our business, we're pet-friendly and have fun pet-related perks like Paw-ternity leave for new pup parents and Chewy employee discounts. We offer countless volunteer opportunities, recreational club teams, company outings, happy hours, and team events to enable you to bond with fellow Chewtopians and have some fun!

  • Volunteer in local community
  • Partners with nonprofits
  • Open door policy
  • OKR operational model
  • Team-based strategic planning
  • Pair programming
  • Open office floor plan
  • Flexible work schedule
  • Remote work program: We're currently 100% remote due to caution and care for the health and well-being of our team. Post-pandemic, we plan to operate in a combination of onsite and remote, with logistics still being defined.
  • Dedicated diversity and inclusion staff
  • Highly diverse management team
  • Mandated unconscious bias training
  • Diversity employee resource groups
  • Hiring practices that promote diversity

Health Insurance & Wellness Benefits:

  • Flexible Spending Account (FSA): We offer commuter transit, parking, and dependent care FSAs.
  • Disability insurance
  • Dental insurance
  • Vision insurance
  • Health insurance
  • Life insurance
  • Pet insurance
  • Wellness programs
  • Mental health benefits

Financial & Retirement:

  • 401(k) matching
  • Company equity
  • Performance bonus

Child Care & Parental Leave Benefits:

  • Childcare benefits
  • Generous parental leave
  • Family medical leave
  • Return-to-work program post parental leave

Vacation & Time Off Benefits:

  • Unlimited vacation policy
  • Paid holidays
  • Paid sick days

Office Perks:

  • Commuter benefits
  • Company-sponsored outings
  • Free snacks and drinks
  • Some meals provided
  • Company-sponsored happy hours
  • Pet friendly
  • Recreational clubs: Chewy sponsors office sports leagues year-round.
  • Relocation assistance

Professional Development Benefits:

  • Job training & conferences
  • Cross-functional lunch and learns
  • Promote from within
  • Mentorship program
  • Online course subscriptions available
  • Customized development tracks
