Staff Reliability Engineer

4 Locations
In-Office
128K-191K Annually
Senior level
Fintech • Payments • Financial Services
The Role
The Staff Reliability Engineer will enhance data platform reliability through automation, incident management, and observability in a hybrid work setting.
Staff Reliability Engineer - IE07KE

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Hartford is seeking a highly skilled Senior Reliability Engineer (RE) to join our Enterprise Data Organization. This role is pivotal in applying software engineering principles to operations, ensuring the reliability, performance, and scalability of our foundational data infrastructure, platforms and applications in this organization. You will be instrumental in driving our transition from traditional production support to a modern RE model through automation, toil reduction, and standardized service management.

This role can have a Hybrid or Remote work arrangement. Candidates who live near one of our locations are expected to work in an office 3 days a week (Tuesday through Thursday). Candidates who do not live near an office should maintain their current work arrangement, with the expectation of coming into the office as business needs arise.

Responsibilities

  • Platform Reliability & Resiliency: Design, build, and maintain highly reliable, scalable, and resilient cloud-based data platforms on AWS and GCP, including core infrastructure and services like Snowflake, EKS, OpenSearch, EMR and Hadoop ecosystems.

  • Automation & Toil Reduction: Champion the RE mandate by identifying manual, repetitive operational tasks (toil) and developing robust automation solutions to eliminate them. This includes automating provisioning, deployment, self-healing and operational tasks.

  • Observability & Monitoring: Implement and manage comprehensive observability solutions (monitoring, alerting, logging, tracing) for the underlying data infrastructure and applications, with a focus on establishing clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

  • Incident Response & Management: Act as an escalation point for production incidents, leading incident response, performing deep root cause analysis (RCA), designing error budgets, and implementing preventative measures to ensure issues do not recur.

  • Standardization & Documentation: Lead the standardization of operational processes and documentation, including the creation and automation of dynamic runbooks and playbooks for consistent and efficient incident resolution and service management.

  • RE Transition: Serve as the RE Subject Matter Expert and collaborate with other Platform, Product, and Data Engineering Support teams to instill RE best practices, including participation in system design consulting, capacity planning, and deployment pipelines (CI/CD).

Qualifications

  • 10+ years of overall experience in an Infrastructure, Data, or related technology organization, with increasing responsibilities as a hands-on technologist.

  • Must have 5+ years of experience as an RE, Cloud, or DevOps Engineer, or in a similar role supporting large-scale enterprise infrastructure and applications.

  • Strong scripting and programming skills (e.g., Python) for automation and tooling development.

  • Experience with infrastructure-as-code (e.g., Terraform, CloudFormation, Ansible) and CI/CD tools.

  • Experience designing and operating reliable and resilient infrastructure, fail-safe patterns, reliability controls, and observability from a Reliability Engineering (SRE/RE) infrastructure support perspective across cloud and big data platforms (e.g., AWS, GCP, Amazon EMR, Hadoop/Spark, OpenSearch, and container orchestration platforms).

  • Familiarity with cloud-native integrations with databases, data integration, and business intelligence platforms (e.g., Snowflake, Informatica IDMC, Tableau, and ThoughtSpot).

  • Expertise in setting up and tuning monitoring and alerting systems (e.g., Dynatrace, Splunk, Prometheus, Grafana, Datadog, OpenTelemetry).

  • Expertise in defining and implementing DataOps practices.

  • Expertise implementing AIOps to monitor, manage, and self-heal infrastructure and data platforms, including experience applying machine learning principles to anomaly detection, alerting, and runbook automation.

  • Experience with prompt engineering, implementing AWS or Google AI services, and AI-enabled automation for infrastructure reliability and performance management.

  • Relevant industry certifications preferred (e.g., AWS, GCP, Kubernetes, SRE/DevOps frameworks).

This role will have a Hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday).

Candidates must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford’s total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$127,600 - $191,400

Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age


Skills Required

  • 10+ years overall experience in an Infrastructure, Data or related technology organization
  • 5+ years experience as an RE, Cloud, or DevOps Engineer
  • Strong scripting and programming skills (Python)
  • Experience with infrastructure-as-code
  • Expertise in monitoring and alerting systems
  • Industry certifications (AWS, GCP, Kubernetes) preferred

The Hartford Financial Services Group, Inc. Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about The Hartford Financial Services Group, Inc. It has not been reviewed or approved by The Hartford Financial Services Group, Inc.

  • Retirement Support: A 401(k) with matching plus an additional company contribution, alongside an employee stock purchase plan and no‑cost financial planning, signals robust long‑term savings support. HSAs/FSAs and related financial tools further strengthen overall financial well‑being.
  • Leave & Time Off Breadth: At least 25 days of PTO to start, options to buy or roll over time, and paid parental leave indicate broad time‑off support. Paid leave for organ and bone marrow donation and generous disability coverage extend protection for significant life events.
  • Healthcare Strength: Multiple medical, dental, and vision options with the company covering most medical and dental premiums reflect strong core health coverage. Wellness programs, fitness reimbursements, well‑being credits, and accessible behavioral health services expand depth and accessibility.

The Company
HQ: Hartford, Connecticut
20,002 Employees
Year Founded: 1810

What We Do

Human achievement is at the heart of what we do. We put our belief into action by not only ensuring individuals and businesses are well protected, but by going even further – making an impact in ways that go beyond an insurance policy.

