The Site Reliability Engineering (SRE) Lead is responsible for leading the SRE team to ensure the reliability, scalability, and performance of the organization’s critical systems and services. This role involves managing the team’s day-to-day operations, developing automation strategies, implementing best practices, and collaborating with development and operations teams to optimize the entire software lifecycle. The ideal candidate is a highly skilled engineer and strong leader who can drive improvements in system reliability, monitoring, and incident response.
Key Accountabilities/Deliverables:
- Lead, mentor, and manage a team of SREs, fostering a culture of reliability, collaboration, and continuous improvement.
- Oversee the availability, performance, and scalability of services, ensuring that systems are reliable, efficient, and meet established SLAs.
- Develop and implement automation strategies to reduce manual intervention, improve efficiency, and minimize downtime.
- Lead incident response efforts, ensuring timely resolution of production issues and minimizing impact on customers. Conduct post-incident reviews to identify root causes and implement preventive measures.
- Design, implement, and maintain robust monitoring and alerting systems to ensure real-time visibility into the health of production environments.
- Perform capacity analysis and forecasting to ensure systems can handle growth and peak demand without degradation.
- Work closely with development, DevOps, and infrastructure teams to integrate reliability engineering practices into the software development lifecycle.
- Identify performance bottlenecks and work on tuning systems for optimal performance, including database, application, and infrastructure optimizations.
- Ensure that systems and processes adhere to security and compliance standards, integrating security best practices into SRE activities.
- Provide regular updates and reports to leadership on system performance, incidents, and improvement initiatives.
Technical Knowledge and Understanding:
- In-depth knowledge of CI/CD pipelines, release management, and software lifecycle processes.
- Exceptional leadership and team management skills, with the ability to motivate and develop high-performing teams.
- Strong problem-solving and analytical skills, with a focus on data-driven decision-making.
- Excellent communication skills, with the ability to articulate technical issues clearly and effectively to both technical and non-technical stakeholders.
- Ability to manage multiple priorities in a fast-paced, high-stakes environment.
Experience:
- Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related field.
- Master’s degree preferred.
- 7+ years of experience in site reliability engineering, DevOps, or software engineering, with at least 3 years in a leadership role.
- Strong expertise in cloud platforms (e.g., AWS, Azure, GCP), containerization (Docker, Kubernetes), and infrastructure as code (Terraform, Ansible).
- Proven experience with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk, Datadog) and incident management frameworks.
- Strong programming skills in languages such as Python, Go, or Java, with a focus on building automation and reliability tools.
- Experience with security best practices and compliance requirements in an SRE context.
Seeking candidates who can work a hybrid schedule from Cincinnati, OH or Dallas, TX.
Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa for this position.
At Core Specialty, you will receive a competitive salary and opportunities for professional development and advancement. We offer medical, dental, vision, and life insurance; short- and long-term disability; a 401(k) plan with a 100% company match of a 6% contribution; an Employee Assistance Plan; a Health Savings Account, Flexible Spending Account, and Health Reimbursement Account; and a wellness program.
What We Do
Core Specialty, through its subsidiary insurers, offers a diversified range of property, casualty, and marine insurance products for small to mid-sized businesses.
We have the capital to take on risk, the underwriting talent, a decisive leadership team, the infrastructure, and a proven track record of making things happen – fast!
When you’re ready to solve your toughest insurance needs, we’re ready to get it done for you.
We free customers up to focus on their business by taking the load of complicated specialty insurance off their hands.
We break down the walls of bureaucracy to provide optimal underwriting solutions for brokers.
We’re ready, equipped, and motivated to get the job done, efficiently and professionally, by empowering experts with what they need to move quickly on behalf of customers and their brokers.
Our specialty focus is the essential part of our identity.
It is at our core.