Site Reliability Engineer

Posted 2 Days Ago
Hiring Remotely in FL, USA
Remote
Senior level
Insurance
The Role
Design, build, and maintain highly available cloud-native architectures across Azure and AWS. Implement IaC, observability, SLO/SLI/error budgets, automated remediation, incident response, and resilience patterns. Collaborate with engineering, security, and operations to ensure SLAs, compliance, cost optimization, and disaster recovery.
Summary Generated by Built In


The Site Reliability Engineer (SRE) is responsible for ensuring the availability, scalability, performance, and resiliency of enterprise cloud platforms across Azure and AWS environments.

This role combines software engineering, automation, and infrastructure expertise to operationalize reliability engineering practices, drive cloud-native resiliency patterns, and enable business-critical applications to meet defined SLAs, SLOs, and compliance requirements.

The SRE partners with engineering, security, and operations teams to implement observability, incident response frameworks, and reliability automation, aligning with enterprise architecture standards and regulatory expectations.

Key Accountabilities/Deliverables:

  • Design and implement highly available, fault-tolerant architectures using cloud-native services (microservices, containers, serverless)

  • Define and operationalize SLOs, SLIs, and error budgets for critical applications and platforms

  • Build and maintain Infrastructure as Code (IaC) with Terraform to ensure repeatable and compliant deployments

  • Develop automated remediation and self-healing capabilities to reduce MTTR and improve system resilience

  • Establish enterprise-level monitoring, logging, and observability frameworks (Datadog, Azure Monitor, CloudWatch, OpenTelemetry, Azure Application Insights)

  • Drive cost optimization (FinOps) initiatives, including resource utilization tracking and rightsizing recommendations

  • Support DR/BCP strategy execution, including failover testing and regional isolation validation

  • Collaborate with application teams to embed reliability engineering practices into CI/CD pipelines

Technical Knowledge and Understanding:

  • Strong expertise in cloud platforms (Azure, AWS)

  • Deep understanding of cloud-native architecture patterns (microservices, containers (Azure Container Apps/AKS/EKS), serverless (Azure Functions/AWS Lambda))

  • Proficiency in Infrastructure as Code (Terraform, ARM/Bicep)

  • Experience with observability platforms (Datadog, Azure Monitor, Azure Application Insights)

  • Knowledge of CI/CD pipelines and GitOps practices

  • Expertise in system reliability concepts:

    • SLI / SLO / SLA management

    • Chaos engineering

    • High availability & fault isolation

  • Familiarity with security, compliance, and regulatory controls (SOC, ISO, cloud security frameworks)

Experience:

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering

  • Proven experience supporting mission-critical production systems at scale

  • Hands-on experience with incident management and on-call operations

  • Experience implementing automated monitoring, alerting, and remediation frameworks

  • Exposure to regulated environments (insurance, financial services) preferred

  • Demonstrated ability to work across cross-functional architecture, engineering, and operations teams

Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor, or take over sponsorship of, work authorization for this position, now or in the future.


At Core Specialty, you will receive a competitive salary and opportunities for professional development and advancement. We offer medical, dental, vision, and life insurance; short- and long-term disability; a 401(k) plan with a Company match of 100% on a 6% contribution; an Employee Assistance Plan; a Health Savings Account, Flexible Spending Account, and Health Reimbursement Account; and a wellness program.

Skills Required

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering
  • Expertise with Azure and AWS cloud platforms
  • Proficiency with Infrastructure as Code (Terraform, ARM, Bicep)
  • Hands-on experience with observability platforms (Datadog, Azure Monitor, CloudWatch, OpenTelemetry, Azure Application Insights)
  • Experience with containers and orchestration (AKS, EKS, Azure Container Apps) and serverless (Azure Functions, AWS Lambda)
  • Experience defining and operationalizing SLOs/SLIs/SLAs and error budgets
  • Practical experience with incident management, on-call operations, and automated remediation/self-healing
  • Familiarity with CI/CD and GitOps practices
  • Familiarity with security, compliance, and regulatory controls (SOC, ISO, cloud security frameworks)
  • Exposure to regulated environments (insurance, financial services)
  • Authorization to work for any employer in the U.S.; employer will not sponsor visas
  • Experience with cost optimization / FinOps initiatives

The Company
HQ: Birmingham, AL
390 Employees

What We Do

Core Specialty, through its subsidiary insurers, offers a diversified range of property, casualty, and marine insurance products for small to mid-sized businesses. We have the capital to take on risk, the underwriting talent in place, a decisive leadership team, the infrastructure, and a proven track record of making things happen – fast! When you’re ready to solve your toughest insurance needs, we’re ready to get it done for you. We free customers up to focus on their business by taking the load of complicated specialty insurance off their hands. We break down the walls of bureaucracy to provide optimal underwriting solutions for brokers. We’re ready, equipped, and motivated to get the job done, efficiently and professionally, by empowering experts with what they need to move quickly on behalf of customers and their brokers. Our specialty focus is the essential part of our identity. It is at our core.
