Site Reliability Engineer

San Francisco, CA
In-Office
Senior level
Artificial Intelligence • Healthtech
The leading AI workflow automation platform built for healthcare
The Role
The Site Reliability Engineer will enhance system reliability, define observability standards, respond to incidents, and collaborate with engineering teams on performance and compliance improvements.
Summary Generated by Built In

About Plenful

Plenful is on a mission to transform healthcare operations from the inside out. Fresh off our $50M Series B and backed by Bessemer Venture Partners, Notable Capital, TQ Ventures, Susa/Kivu Ventures, and other leading investors, we’re building the category-defining AI agentic operating platform that healthcare teams rely on to operate smarter, faster, and more efficiently. Our technology empowers healthcare operators across hospital and health systems, pharmacies and payors to eliminate manual work, reduce administrative burden, and improve compliance, all while unlocking critical revenue to fund programs for their in-need patient populations.


Built by healthcare operators for healthcare operators, Plenful is driven by a deep understanding of the challenges facing today’s care teams. We’re passionate about equipping healthcare workers with world-class tools that deliver real, measurable impact, and we’re proud to serve leading healthcare organizations across the country. If you’re excited to help shape the future of healthcare, we’d love to meet you. Apply now to join our growing team.


About the role
We’re hiring an SRE to join our engineering team at Plenful and take ownership of the reliability and performance of the systems that power our product. You’ll work across our distributed workflow engine, serverless pipelines, containerized services and Postgres-based data layer. This role reports to engineering leadership and will influence how we build, scale and operate our platform as we continue to grow.

You’ll bring strong technical judgment, calm problem solving during incidents and a practical approach to improving reliability. You’ll collaborate closely with backend, ML and DevOps engineers and help shape a culture where operational excellence is clear, repeatable and shared across the team.
What you’ll do
Reliability, Observability and Performance:

  • Maintain and evolve alerting so engineers receive clear, actionable signals for anomalies, latency regressions and reliability risks.
  • Define observability standards across metrics, logs and tracing with a focus on reliability, performance and customer impact instead of vanity data.
  • Investigate performance bottlenecks across our distributed systems including serverless task execution, containerized services, workflow orchestration and Postgres.
  • Lead incident response, coordinate root cause analysis and ensure reliability improvements are fully implemented and measured.

Infrastructure and Platform Operations:

  • Improve the reliability of our distributed task processing, including autoscaling behavior, execution patterns, retry logic, rate limiting and failure isolation.
  • Support the stability of our serverless pipelines that process high-volume workloads across multiple execution layers.
  • Partner with backend and ML teams on designing resilient mechanisms for scheduling, queueing and workflow execution.
  • Maintain efficient and predictable resource usage across compute, networking and storage.

Security, Compliance and Operational Excellence:

  • Support security and compliance work including patching, audit readiness and vulnerability management.
  • Participate in the on-call rotation and respond to production incidents quickly and calmly, with a focus on restoring stable service and communicating clearly.
  • Contribute to blameless postmortems, drive follow-through on fixes and ensure learnings are documented for future engineers.
What we’re looking for
  • 5+ years of professional engineering experience at a B2B SaaS company.
  • Strong experience operating production systems in cloud environments, ideally AWS.
  • Hands-on experience with serverless compute patterns, containerized services, distributed workflows and Postgres.
  • Solid understanding of observability tooling, performance debugging and system behavior under load.
  • A high-ownership mindset, empathy for teammates, straightforward communication and a one-team attitude.
  • Comfortable working in a fast-paced startup environment with a bias for action and thoughtful engineering judgment.
Plenful perks
  • Comprehensive Benefits Package: Enjoy unlimited PTO, fully covered health insurance (medical, dental, and vision), meal stipend, health & wellness stipend, 401(k) matching, and stock options.
  • Mission-Driven, World-Class Team: Join an exceptional group of professionals aligned around a meaningful mission and committed to making an impact.
  • Opportunities for Growth: Strengthen your partnership expertise through collaboration with experienced, high-performing leaders across the organization.
  • Flexible Work Environment: Employees based in the Bay Area enjoy two days per week in a brand-new downtown San Francisco office. Employees based in other cities enjoy a fully remote work environment with the ability to travel for collaboration.

Top Skills

AWS
Containerized Services
Distributed Workflows
Observability Tooling
Postgres
Serverless Compute
The Company
San Francisco, CA
43 Employees

What We Do

Plenful is an AI-powered workflow automation platform for pharmacy and healthcare operations, streamlining manual and administrative tasks to reduce costs and drive revenue. Trusted by leading pharmacy and healthcare teams, Plenful provides highly configurable automation solutions for 340B auditing and savings identification, document data entry, inventory management, and other high-ROI use cases.

Backed by Bessemer Venture Partners, TQ Ventures, Mitch Rales (Cofounder & Chairman of Danaher), Susa Ventures, Waterline Ventures, and other leading healthcare and software investors.
