Lead Site Reliability Engineer

Marlborough Center, Town of Marlborough, CT
109K Annually
5-7 Years Experience
Retail
The Role
As a Lead Site Reliability Engineer, you will design and manage Java-based microservices, build monitoring systems, improve e-commerce infrastructure, and apply SRE principles to ensure reliability, performance, and security. You will lead initiatives for system enhancement and conduct root-cause analyses for production incidents, all while working in a high-pressure environment.
Summary Generated by Built In

Join our team of more than 34,000 team members, supporting our members and communities in our Club Support Center, 235+ clubs and eight distribution centers. BJ’s Wholesale Club offers a collaborative and inclusive environment where all team members can learn, grow and be their authentic selves. Together, we’re committed to providing outstanding service and convenience to our members, helping them save on the products and services they need for their families and homes.

The Benefits of working at BJ’s

  • BJ’s pays weekly
  • Eligible for free BJ's Inner Circle and Supplemental membership(s)*
  • Generous time off programs to support busy lifestyles*
      o Vacation, Personal, Holiday, Sick, Bereavement Leave, Jury Duty
  • Benefit plans for your changing needs*
      o Three medical plans**, Health Savings Account (HSA), two dental plans, vision plan, flexible spending
  • 401(k) plan with company match (must be at least 18 years old)

*eligibility requirements vary by position

**medical plans vary by location

As a Lead Site Reliability Engineer, you will be responsible for designing, building, monitoring, and continuously improving our ecommerce platform's infrastructure and processes. Drawing on your expertise in observability tools such as New Relic and Scalyr/Splunk, along with Bash and Python scripting, you will play a pivotal role in ensuring the reliability and performance of our Java microservices-based architecture.

Key Responsibilities:

  • Design and manage Java-based microservices, Bash scripts, Redis, and high-availability designs while strictly adhering to Site Reliability Engineering (SRE) principles.
  • Thrive in high-pressure environments, working swiftly and reliably to maintain system integrity and meet service level objectives (SLOs), as measured by service level indicators (SLIs).
  • Proactively identify and address potential issues before they impact operations, using observability tools like New Relic and Scalyr/Splunk together with Bash and Python scripts.
  • Lead initiatives to enhance current systems and implement innovative solutions in collaboration with a fast-paced, mission-driven team, focusing on the implementation of SRE best practices.
  • Conduct thorough root-cause analyses for production incidents and generate high-quality RCA reports, leveraging SRE methodologies to prevent recurrence.
  • Apply software engineering principles to rectify operational challenges and optimize system performance, with a specific focus on implementing SRE-driven solutions.
  • Ensure the availability, latency, performance, efficiency, and security of our infrastructure, adhering rigorously to SRE principles and best practices.
  • Design and maintain robust production monitoring systems to ensure timely detection and resolution of issues, following SRE guidelines for effective monitoring and alerting.
  • Utilize a diverse array of tools to troubleshoot performance and stability issues effectively, employing SRE methodologies to identify and mitigate bottlenecks.
  • Evaluate and enhance application and environment security measures, integrating SRE-driven security practices into the development and deployment pipelines.
  • Provide support for globally distributed, multi-cloud (public and/or private) environments, implementing SRE strategies for resilience and fault tolerance.
  • Automate repetitive tasks at scale to streamline operational workflows and enhance efficiency, focusing on the implementation of SRE-driven automation solutions.
  • Adhere to change management processes during implementations and utilize version control for application infrastructure, following SRE principles for reliable and auditable change management.
  • Foster an SRE mindset throughout the organization, promoting collaboration and shared responsibility for reliability and performance.

Qualifications:

  • Bachelor's Degree in Computer Science or related field, or foreign equivalent.
  • Demonstrated curiosity and self-drive to tackle complex challenges and drive change in a diverse organizational landscape.
  • Excellent written and verbal communication skills, with the ability to effectively communicate with engineering management, developers, and leadership.
  • Proven ability to adapt to new technologies and learn quickly.
  • Minimum of 5 years of experience in Site Reliability Engineering (SRE) or related roles.

Job Conditions:

  • Collaborate within a diverse and global team environment.
  • Participate in cross-training with other team members across different regions.
  • Rotate in an on-call schedule as required to ensure 24/7 availability and support for critical systems.

In accordance with Pay Transparency requirements, the following represents a good faith estimate of the compensation range for this position. At BJ’s Wholesale Club, we carefully consider a wide range of non-discriminatory factors when determining salary. Actual salaries will vary depending on factors including but not limited to location, education, experience, and qualifications. The pay range for this position starts at $109,000.00.

Top Skills

Java
Python
The Company
HQ: Westborough, MA
10,308 Employees
On-site Workplace

What We Do

Headquartered in Westborough, Massachusetts, BJ's Wholesale Club is a leading operator of membership warehouse clubs in the Eastern United States. The company currently operates over 215 clubs and more than 145 BJ's Gas® locations in 17 states.

Explore career opportunities at BJ's and join our team today: www.bjs.com/careers
