Sr. Site Reliability Engineer

Posted 11 Hours Ago
Hiring Remotely in United States
Remote
145K-200K Annually
5-7 Years Experience
Artificial Intelligence • Healthtech • Machine Learning • Software • Business Intelligence
AKASA is building the future of healthcare with AI.
The Role
As a Senior Site Reliability Engineer, you will collaborate with Infrastructure and Platform teams to implement monitoring best practices, manage infrastructure using tools like Terraform and Kubernetes, and respond to incidents. You will design visualizations and manage alert systems, ensuring service reliability and optimizing operational processes. Your role involves leveraging telemetry data and collaborating with developers to maintain resilient applications and improve user experiences.
Summary Generated by Built In

About AKASA


AKASA is the preeminent provider of generative AI solutions for the healthcare revenue cycle. The company has raised more than $205M in funding from investors such as Andreessen Horowitz, BOND, and Costanoa Ventures. 


Named one of the fastest-growing GenAI startups to watch by AIM Research, we’re solving the biggest challenges in the financial infrastructure of healthcare. Transaction volume through the AKASA Platform has grown consistently, increasing ~2.5x over the last year. The AKASA customer base represents more than $90B in net patient revenue and includes the most innovative health systems in the country, like Stanford and Johns Hopkins.


Our founding team includes Silicon Valley leaders who have founded or been founding team members of multiple companies with successful exits. Our CEO was ranked among the “Top 50 Healthcare Technology CEOs” by the Healthcare Technology Report. We have been recognized as one of “America’s Best Startup Employers” by Forbes, “Most Innovative Digital Health Startups” by CB Insights, “Best Companies for Remote Workers” by Quartz, and “Best Places to Work” by Fortune, Modern Healthcare, and Built In, and have been certified as a “Great Place to Work” for the past four years in a row.


Learn more at www.AKASA.com.


We are building the future of healthcare with AI. Everyone is welcome. As an inclusive workplace, we are committed to building an environment where our employees are comfortable bringing their authentic selves to work.


Join us!


About the Role

 

In this role, you will work closely with both Infrastructure and Platform team members to integrate best-practice monitoring into our applications. You will develop high-quality runbooks for incident management so that our response procedures are efficient and effective. You will also build high-quality visualizations and meaningful alerting systems that provide clear, actionable insights into system performance and health.

 

As an SRE, you will manage and optimize our infrastructure using tools like Terraform, GitHub CI/CD, and Kubernetes. You will respond to incidents, troubleshoot production issues across the entire stack, and implement automation to streamline operational processes. Your role will involve designing and maintaining core infrastructure to support our users, ensuring our SaaS products run smoothly and efficiently.

 

Additionally, you will be proactive in identifying potential issues before they become outages, leveraging your expertise in telemetry data collection, querying, and monitoring using tools such as Grafana, Prometheus/Mimir, OpenSearch, and Sentry. You will collaborate with development teams to embed reliability and best practices into the software development lifecycle, ensuring robust and resilient applications.

 

Your contributions will be vital in scaling our monitoring infrastructure, enhancing system reliability, and ensuring seamless user experiences. By continuously improving our infrastructure and processes, you will help AKASA deliver high-quality, dependable services to our customers.

 

AKASA is based in South San Francisco. As a company, we have embraced remote work, and we consider ourselves experts in working collaboratively wherever our team members reside.

What You'll Do

  • Incident Response: Lead an on-call rotation (PagerDuty) to respond to incidents impacting system availability.
  • Application Architecture: Dive deep into our application architectures and work with engineering teams on best practices for monitoring, reliability, and scalability.
  • Infrastructure Management: Manage our infrastructure using Terraform, GitHub CI/CD, and Kubernetes.
  • Proactive Monitoring: Develop monitoring solutions that alert based on symptoms rather than outages.
  • Documentation: Document every action to turn findings into repeatable processes and automation.
  • Process Improvement: Enhance operational processes (such as deployments and upgrades) to ensure reliability and efficiency.
  • Infrastructure Development: Design, build, and maintain core infrastructure to support our applications effectively.
  • Troubleshooting: Troubleshoot and resolve production issues across various services and levels of the stack.
  • Growth Planning: Strategically plan and scale AKASA’s monitoring

Skills & Qualifications

  • Monitoring: Proficient in visualizing, monitoring, and alerting on telemetry data (logs, metrics, & traces) using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, and similar technologies.
  • Containerization & Infrastructure: Experience with Docker, Kubernetes, Terraform, or similar technologies.
  • Programming Skills: 5+ years of professional experience using Python, Go, Java, or a similar language.
  • Linux/Unix Proficiency: Proficient with Linux and Unix Shell
  • Collaboration: Excellent collaboration and asynchronous communication skills.
  • Documentation: Committed to thorough documentation to streamline learning and processes.
  • Proactive Attitude: Proactive and enthusiastic attitude towards identifying and fixing issues.
  • Agility: Ability to deliver quickly, iterate fast, and adapt to changing requirements.
  • Version Control: Proficient in using Git/GitHub for version control.

Nice to haves

  • Cloud Platforms: Experience with AWS (preferred), Google Cloud, or Azure.
  • Networking: Understanding of networking principles and protocols.
  • Security: Knowledge of security best practices in infrastructure management.
  • Performance Tuning: Experience in performance tuning and optimization

What We Offer

  • Unlimited paid time off (PTO)
  • Expansive coverage for health, dental, and vision
  • Employer contribution to Health Savings Accounts (HSA)
  • Generous parental leave policy
  • Full employee coverage for life insurance
  • Company-paid holidays
  • 401(K) plan

Compensation

  • Based on market data and other factors, the salary range for this position is $145,000-$200,000 + Equity. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
  • The above represents the expected salary range for this job requisition. Ultimately, in determining your pay, we'll consider your location, experience, and other job-related factors.

We’re committed to doing the best work of our lives, together. Come see if we're the right team for you.


AKASA is a proud equal opportunity employer and we believe that a diverse and inclusive workforce is an imperative. We welcome people of different backgrounds, genders, races, ethnicities, abilities, sexual orientations, and perspectives, just to name a few. We do not discriminate based upon any protected class and we encourage candidates of all identities and backgrounds to apply. AKASA considers qualified applicants regardless of criminal histories in accordance with the San Francisco Fair Chance Ordinance.


AKASA is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at [email protected].

Top Skills

Git
Grafana
Kubernetes
Opensearch
Prometheus
Sentry
Terraform
The Company
HQ: South San Francisco, CA
248 Employees
Remote Workplace
Year Founded: 2019

What We Do

AKASA is the leading developer of AI for healthcare operations. AKASA scales human intelligence with leading-edge AI and ML securely trained on customer data to learn unique systems, continuously adapt to changing environments, and deliver comprehensive automation and analytics for complex workflows. The result is a seamlessly integrated, customized solution that reduces operating costs, frees up staff to do the work they love, and helps health systems allocate resources to where they matter most.

Why Work With Us

AKASA is the only platform that scales human intelligence with leading-edge AI and ML trained on customer data to learn unique systems, continuously adapt to changing environments, and deliver comprehensive automation and analytics for complex workflows. Our culture drives innovation within every team and offers support to any and all ideas.
