SRE Lead

2 Locations
In-Office or Remote
Senior level
Insurance
The Role
The SRE Lead will improve system reliability and performance, manage incidents, automate processes, and collaborate with development and operations teams.

Our Platforms Team is at the forefront of innovation, creating technology solutions that empower multiple business lines across the organization. We are looking for a senior SRE to support our applications deployed across the globe.

As an SRE practitioner, you will work to improve the reliability, availability, and performance of systems and services. You will collaborate with development and operations teams to design, implement, and maintain scalable and resilient infrastructure. Your role will involve automating processes, monitoring systems, and responding to incidents to ensure seamless user experiences.


Key Responsibilities:

System Reliability and Performance:

  • Design, build, and maintain scalable and reliable systems.
  • Monitor system performance and proactively address bottlenecks or issues.
  • Implement strategies to improve system uptime and reduce downtime.

Automation and Tooling:

  • Develop and maintain automation tools for deployment, monitoring, and incident response.
  • Create scripts and workflows to reduce manual intervention and improve efficiency.

Incident Management:

  • Respond to system outages and incidents, performing root cause analysis and implementing fixes.
  • Develop and maintain runbooks and documentation for incident response.

Monitoring and Observability:

  • Set up and maintain monitoring tools to track system health and performance.
  • Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

Collaboration and Communication:

  • Work closely with development teams to ensure systems are designed with reliability in mind.
  • Collaborate with operations teams to improve deployment processes and system management.

Capacity Planning and Scaling:

  • Analyze system usage and plan for future capacity needs.
  • Implement solutions to handle traffic spikes and ensure scalability.

Continuous Improvement:

  • Identify areas for improvement in system architecture and processes.
  • Advocate for best practices in reliability engineering and DevOps.

Qualifications:

  • Strong knowledge of Linux/Unix systems and networking.
  • Proficiency in programming and automation tools such as Python, Ansible, PowerShell, .NET, and Java.
  • Experience with cloud platforms (e.g., Azure, AWS).
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Expertise in monitoring and observability tools (e.g., AppDynamics, App Insights, Dynatrace, Grafana, ELK Stack).
  • Understanding of CI/CD pipelines and automation frameworks.
  • Problem-solving skills and ability to perform root cause analysis.
  • Excellent communication and collaboration skills.
  • Experience with distributed systems and microservices architecture.
  • Knowledge of database systems (SQL and NoSQL).
  • Familiarity with incident management frameworks (e.g., ITIL, SRE best practices).
  • Certifications in cloud technologies or DevOps tools.
  • Analytical mindset with a focus on reliability and scalability.
  • Passion for automation and reducing manual work.
  • Ability to work under pressure and handle critical incidents effectively.
  • Commitment to continuous learning and staying updated on industry trends.

Top Skills

Ansible
AppDynamics
App Insights
AWS
Azure
Docker
.NET
Dynatrace
ELK Stack
Grafana
Java
Kubernetes
Linux
PowerShell
Python
Unix

The Company
HQ: Switzerland
27,791 Employees

What We Do

Chubb is the world’s largest publicly traded property and casualty insurance company. With operations in 54 countries and territories, Chubb provides commercial and personal property and casualty insurance, personal accident and supplemental health insurance, reinsurance and life insurance to a diverse group of clients. As an underwriting company, we assess, assume and manage risk with insight and discipline. We service and pay our claims fairly and promptly. The company is also defined by its extensive product and service offerings, broad distribution capabilities, exceptional financial strength and local operations globally. Parent company Chubb Limited is listed on the New York Stock Exchange (NYSE: CB) and is a component of the S&P 500 index. Chubb maintains executive offices in Zurich, New York, London, Paris and other locations, and employs 31,000 people worldwide. Additional information can be found at: chubb.com.
