Site Reliability Engineer

Signify Health

Remote

Sorry, this job was removed at 1:39 p.m. (CST) on Monday, June 27, 2022

Signify Health is looking for a Site Reliability Engineer to join our SRE and Release Management team. Keys to our SRE culture include teamwork, inquisitiveness, problem-solving, critical thinking, transparency, and diversity. We are looking for a dedicated SRE who enjoys building and running distributed systems at scale in an AWS environment and appreciates the challenges and trade-offs involved in building and deploying systems to production, including deployments, monitoring, scheduling, and load balancing.

We work closely with software and systems engineers to drive adoption of modern reliability practices like SLOs, error budget policies, actionable alerts, incident retrospectives, chaos testing, and end-to-end ownership.

You will discover ample opportunities for growth in many areas, such as improved technology skills, effective leadership, dedicated mentorship, creative design, strong communication skills, teamwork, and more. Simply put, as an SRE, you will help Signify Health achieve high system availability and service reliability through best-of-breed observability techniques. Are you up for the challenge?

Responsibilities

Design, develop, and implement software that improves the stability, scalability, availability, and latency of Signify Health products
Implement application/infrastructure observability solutions and perform maintenance to ensure desired application availability
Manage services in real time, including building monitoring for the golden-signal SLIs, establishing and negotiating SLOs with the business, building alerting, and creating playbooks and runbooks for services in conjunction with development teams, product owners, and support
Triage incidents, decompose them into smaller pieces, and identify probable root causes using skills gained through debugging code, operating networks, building hardware, or in other, entirely unrelated domains
Work closely with software engineers to build reliable, performant systems

Basic Qualifications

Bachelor's degree in a relevant technical field of study or an equivalent combination of experience and training
5 or more years of relevant professional experience
Knowledge of standard methodologies related to security, performance, and disaster recovery
Strong knowledge of database administration and management (examples: RDBMS, RDS, various SQL databases, MongoDB, etc.)
Skilled in identifying performance bottlenecks, detecting anomalous system behavior, and resolving the root cause of service issues
Demonstrated ability to work across teams and functions to influence design, operations and deployment of highly available software
Strong analytical skills in support of production issue resolution and root cause identification
Strong organizational skills to manage a variety of work areas and cross-team engagements
Strong experience working on high-data-volume applications managed with modern Infrastructure-as-Code methodologies/tooling
Experience with container technologies and orchestration platforms (Docker, Kubernetes, Rancher, Cloud Foundry)
Experience managing and using CI/CD tech stack systems (Bamboo, Azure DevOps, Jenkins, CircleCI)
Experience implementing a highly scalable/distributed CI/CD pipeline
Experience working with monitoring and observability tools (we use New Relic and OpsGenie)

Some Preferred Qualifications

Strong knowledge of programming/scripting languages (Python, Bash, Groovy, Golang) and IaC tooling (Terraform). Software engineers looking to get into SRE/DevOps are encouraged to apply.
Understanding of IT capacity management to ensure that IT resources are sufficient to meet future needs. Able to map IT resources to meet current and future requirements.
Prior database administration background (RDBMS, RDS, Snowflake, SQL, Oracle, MongoDB, PostgreSQL, etc.)

About Us:

Signify Health is a leading healthcare platform that leverages advanced analytics, technology, and nationwide healthcare provider networks to create and power value-based payment programs. Our mission is to transform how care is paid for and delivered so that people can enjoy more healthy, happy days at home.

We’re focused on activating the home as a key part of the care continuum, lessening dependence on facility-centric care, preventing adverse events and facilitating holistic condition management to address individuals’ total clinical, behavioral and social care needs.

Our solutions support value-based payment programs for payors, providers and other healthcare organizations by aligning financial incentives around health outcomes. We meet people where they are, helping them stay healthy and independent at home and supporting their recovery homeward as part of an episode of care.

To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com.

More Information on Signify Health

Signify Health operates in the Healthtech industry. The company is located in Dallas, TX and Ellsworth AFB, SD. Signify Health was founded in 2017. It has 2219 total employees. It offers perks and benefits such as Flexible Spending Account (FSA), Disability insurance, Dental insurance, Vision insurance, Health insurance and Life insurance.
