Site Reliability and DevOps Engineering Lead

Hiring Remotely in United States
Remote
131K-197K Annually
Senior level
Information Technology • Consulting
The Role
Lead and grow a Platform/DevOps team to ensure a highly available, performant, secure clinical SaaS platform. Own SRE practices (SLIs/SLOs, error budgets), CI/CD and release automation, incident leadership, observability, capacity planning, vendor governance, and platform strategy. Drive automation, reliability engineering, and AI-enabled pipeline optimization while participating in an on-call rotation and cross-team collaboration.
Summary Generated by Built In
Micromedex by Merative is a trusted clinical decision support solution used by clinicians in thousands of hospitals, health systems, payers, and government agencies worldwide. For over 50 years, we’ve delivered evidence-based drug, toxicology, and disease information to help clinicians make confident, timely decisions and educate patients at the point of care. Today, Micromedex is evolving. With a modernized homepage and AI-powered search, clinicians can now find precise answers faster—supported by rigorously validated, evidence-based content. Our portfolio includes drug reference, IV compatibility, pediatric dosing, toxicology databases, and integrated calculators, all accessible via web and mobile. By combining authoritative content with intuitive, AI-enhanced tools, Micromedex empowers healthcare organizations to improve medication safety, reduce adverse events, and deliver better patient outcomes.
Micromedex is seeking a highly skilled Platform Reliability & DevOps Engineering Lead who combines deep hands-on expertise in cloud services, infrastructure, and automation with a strong architectural understanding of distributed, high-availability systems.
You will lead the platform team, ensuring our mission-critical clinical platform is highly available (24×7), performant, scalable, and secure.
This role is both strategic and hands-on: you will define and drive the platform reliability and DevOps strategy, continuously improving system resilience and CI/CD capability, while partnering closely with engineering teams and vendors to embed operational excellence across the software lifecycle.
You will be accountable for the end-to-end reliability, operability, and delivery capability of the Micromedex platform, unifying Site Reliability Engineering, DevOps, and CI/CD ownership into a single platform function. This includes owning platform reliability outcomes, DevOps enablement, and delivery pipelines to support scalable, high-availability systems and faster, safer releases.
You are passionate about automation, proactive in addressing reliability and performance challenges, and committed to maintaining the trust of clinicians worldwide through resilient system design, strong operational discipline, and rapid incident response.

Responsibilities:

People & Team Leadership 

  • Lead, mentor, and grow Platform / DevOps engineers 

  • Build a high-performing Platform team 

  • Drive accountability for platform reliability and delivery outcomes 

  • Lead vendors to deliver capabilities in production.  

Production Engineering & Platform Operations 

  • Ensure platform capabilities accelerate product delivery and remove bottlenecks

  • Define and enforce platform engineering standards and DevOps practices across all teams and vendors

  • Lead capacity planning, performance optimization, and cost efficiency 

  • Define operational standards, runbooks, and reliability practices  

  • Be accountable for platform reliability outcomes at the enterprise/product level

Platform Strategy and Leadership  

  • Act as technical authority across platform, reliability, and delivery  

  • Define platform strategy and roadmap  

  • Govern delivery across internal teams and vendors 

Platform Reliability Ownership  

  • Own SLIs, SLOs, and error budgets  

  • Lead resilience engineering, observability, and design for failure

  • Drive proactive risk reduction and continuous improvement 

  • Own incident management frameworks and continuous improvement 

CI/CD and Release Engineering  

  • Own end-to-end pipeline architecture and release automation  

  • Standardize, secure, and fully automate pipelines  

  • Drive continuous integration, delivery, and validation practices 

Incident Leadership  

  • Lead Sev1 response, escalation, and recovery  

  • Own root cause analysis (RCA) and drive systemic fixes rather than point fixes

Introduce AI-enabled pipeline optimization and quality gates 

  • Embed AI into monitoring, risk prediction, and CI/CD optimization  

  • Drive automation to reduce operational toil and improve decision-making 

Required Skills:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.

  • 6-10 years of hands-on experience in software operations, DevOps and Site Reliability Engineering, including managing large-scale, mission-critical systems. 

  • Clear and confident communication skills, with the ability to lead teams and collaborate effectively across engineering, product, and architecture teams.

  • Proven track record of ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design.

  • Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git). 

  • Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution. 

  • Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration. 

  • Self-driven and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity. 

Key Skills and Experience:

Proven experience:  

  • Releasing into and running mission-critical, high-availability SaaS platforms 

  • Technically leading a Platform team and influencing stakeholders and vendors

  • Stakeholder engagement across Product, Architecture, and Operations 

Deep expertise in:  

  • Site Reliability Engineering (SLI/SLO, error budgets, incident management) 

  • DevOps operating models and platform engineering (engineering transformation)  

  • CI/CD architecture and release automation   

  • Cloud, Systems & Infrastructure (DB2, Oracle, Infinispan, OpenLiberty) 

  • Automation-first engineering with proven usage of AI (self-healing, triage) 

  • Java application platforms and runtimes (performance tuning, troubleshooting, production operations) 

Strong experience with:  

  • Cloud platforms (Azure preferred) 

  • Distributed systems and fault-tolerant architectures 

  • Performance tuning and scaling

  • Database optimisation (DB2, Oracle, PostgreSQL) 

  • Multi-region / active-active environments 

  • Monitoring, logging, tracing frameworks 

  • Embedding reliability practices into the SDLC

Hands-on with:  

  • DB2, Oracle, Infinispan, OpenLiberty, Azure 

  • Infrastructure as Code (Terraform or similar) 

  • Containerisation and orchestration (Docker/Kubernetes) 

Work Environment 

This is a remote-first role, collaborating daily with global teams across engineering, product, architecture, and DevOps. The SRE/DevOps Lead Engineer will interact with colleagues across multiple time zones and must occasionally flex working hours to ensure smooth handoffs and incident coverage. Participation in an on-call rotation is expected as part of our commitment to 24×7 support of a clinical-grade platform. We are a fast-paced, collaborative team that values continuous learning, proactive problem-solving, and the sharing of ideas. Minimal travel may be required for periodic team on-sites or company engineering summits.

Compensation

The salary range provided in this job posting is intended to reflect the general market value for the position. The actual salary offered may vary based on factors such as the candidate’s experience, qualifications, skills, and the specific requirements of the role. This range may also be subject to change as market conditions evolve. We encourage open communication throughout the interview process to discuss compensation expectations. For base-salary + commission sales roles, the range represents On-Target Earnings.

Salary range (min–max): $131,381.86 – $197,072.78 (USD)

Benefits

The benefits described represent the current offerings at our organization; however, benefits are subject to change and may vary by location and employment status. We strive to provide a comprehensive benefits package that supports our employees’ health, wellness, and financial goals. Please note that benefits may be discussed in more detail during the hiring process.

  • Remote first / work from home culture

  • Flexible vacation to help you rest, recharge, and connect with loved ones

  • Paid leave benefits

  • Health, dental, and vision insurance

  • 401k retirement savings plan

  • Infertility benefits

  • Tuition reimbursement, life insurance, EAP – and more!



It is the policy of Merative to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, Merative will provide reasonable accommodations for qualified individuals with disabilities.

Merative participates in the federal E-Verify program to confirm the identity and employment authorization of all newly hired employees. For further information about the E-Verify program, visit http://www.uscis.gov/e-verify/employees

Skills Required

  • Bachelor's degree in computer science, engineering, or related field
  • 6-10 years hands-on experience in software operations, DevOps, and Site Reliability Engineering
  • Proven track record of ensuring high availability and performance in production, with expertise in fault-tolerant distributed system design
  • Experience with SLIs, SLOs, error budgets, and incident management
  • Ownership of CI/CD architecture, release automation, and modern software delivery pipelines (CI/CD, configuration management, version control)
  • Strong proficiency in at least one programming/scripting language (Python, Bash, or Java)
  • Experience releasing into and running mission-critical, high-availability SaaS platforms
  • Technical leadership of a Platform team and ability to influence stakeholders and vendors
  • Deep experience with cloud, systems & infrastructure (DB2, Oracle, Infinispan, OpenLiberty)
  • Database optimisation experience (DB2, Oracle, PostgreSQL)
  • Experience with multi-region / active-active environments and performance tuning/scaling
  • Hands-on experience with Infrastructure as Code (Terraform or similar)
  • Containerisation and orchestration experience (Docker, Kubernetes)
  • Experience with monitoring, logging, and tracing frameworks and observability practices
  • Automation-first engineering mindset with proven usage of AI for self-healing or triage
  • Java application platform and runtime experience (performance tuning, production operations)
  • Cloud platform experience (Azure preferred)
  • Clear and confident communication skills and stakeholder engagement ability

The Company
Ann Arbor, MI
1,585 Employees
Year Founded: 2022

What We Do

Merative is a data, software and technology partner for the health and government social services industries, working with providers, health plans, employers, life sciences companies and governments. With trusted technology and human expertise, the company works with clients to drive real progress. Merative helps clients orient information and insights around the people they serve to improve decision-making and performance. Merative, formerly IBM Watson Health, became a new standalone company as part of Francisco Partners in 2022. Learn more at merative.com
