Staff Site Reliability Engineer

Posted 3 Days Ago
Hiring Remotely in San Francisco, CA, USA
In-Office or Remote
136K-180K Annually
Senior level
Big Data • Energy • Big Data Analytics
Decarbonizing global energy with comprehensive, transparent data solutions.
The Role
The Staff Site Reliability Engineer will lead the design and maintenance of cloud infrastructure on GCP, drive IaC strategy, manage Kubernetes operations, ensure security compliance, and mentor engineers.

As a Staff Site Reliability Engineer, you will be a key technical leader responsible for the architecture, reliability, and security of our entire cloud infrastructure. You will drive technical direction, mentor engineers, and solve our most complex infrastructure challenges as a hands-on contributor.

You will lead the management of our Google Cloud Platform (GCP) environment, drive our Infrastructure as Code (IaC) strategy, and ensure our Kubernetes-based microservices are deployed seamlessly and securely. You will serve as the expert for scalability, observability, and building the robust, automated systems that power Kevala's continuous deployment pipeline.

The applicant must have current, unrestricted work authorization in the United States. This job is not eligible for visa sponsorship.

What you will be doing

  • Architect & Maintain: Design, build, and maintain our core cloud-native infrastructure on Google Cloud Platform (GCP) following established best practices.
  • Infrastructure as Code (IaC): Lead our IaC strategy, writing and reviewing high-quality Terraform to manage all cloud resources in a repeatable and version-controlled way.
  • Kubernetes Operations: Manage and scale our Google Kubernetes Engine (GKE) clusters, including configuration of ingress and monitoring components.
  • Champion Security & Compliance: Integrate, implement, and audit security best practices across all infrastructure layers (GCP IAM, GKE policies, network security), ensuring regulatory compliance and leading incident response.
  • Database Reliability: Manage the provisioning, scaling, and reliability of our Postgres databases (e.g., Cloud SQL) and other data stores.
  • Observability: Build and refine our monitoring, tracing, logging, and alerting systems (e.g., OpenTelemetry, Grafana, Google Cloud's operations suite) to ensure high availability.
  • Mentorship and Design: Partner with engineering teams on scalable architecture design. Mentor other engineers on DevOps practices, cloud architecture, and security.

What you need to succeed

  • Experience: 8+ years in an SRE, DevOps, or Infrastructure Engineering role, with a proven track record of operating in a Staff or similar technical leadership capacity.
  • Leadership & Communication: Excellent communication skills with the ability to clearly articulate complex technical decisions, mentor team members, and drive projects to completion.
  • GCP Proficiency: Extensive hands-on experience designing and managing production environments in Google Cloud Platform.
  • Kubernetes (K8s) Expert: Advanced knowledge of Kubernetes and its ecosystem (GKE preferred), including cluster administration and deployment tooling (e.g., Helm).
  • Terraform/IaC: Extensive, production-level experience using Terraform to manage complex cloud environments.
  • Automation: Deep experience with automation tooling and scripting (e.g., Bash, Python, Go) to manage infrastructure and operations at scale.
  • Database Skills: Experience managing and scaling relational databases like Postgres in a production environment.
  • Security Implementation & Auditing: Practical experience designing, implementing, and auditing security controls for cloud infrastructure, networks, and applications (e.g., IAM, network security).

The compensation for this opportunity includes a base salary range of $136,000 - $180,000, plus equity (stock options). This is our target compensation range and is subject to multiple factors, including level, experience, and location. As you go through our interview process, our recruiter will work with you to identify a competitive base salary within the proposed range and combine it with an equity package that reflects your excitement about joining Kevala.

This is a fully remote role that can be based anywhere within the United States. Please note that actual salaries may vary based on factors including, but not limited to, education, experience, and location.

Skills Required

  • 8+ years in an SRE, DevOps, or Infrastructure Engineering role
  • Extensive hands-on experience in Google Cloud Platform
  • Advanced knowledge of Kubernetes
  • Extensive experience using Terraform
  • Deep experience with automation tooling and scripting
  • Experience managing and scaling relational databases like Postgres
  • Practical experience designing and implementing security controls

The Company
HQ: San Francisco, California
40 Employees
Year Founded: 2014

What We Do

At Kevala, we are on a mission to decarbonize the global energy economy using the most comprehensive data sets available. We are a group of ambitious intellectuals who embrace unconventional approaches to solving complex problems. We foster a culture where everyone is encouraged to collaborate, create, and support one another in our collective endeavors. As a fast-growing startup, we are looking for individuals who are passionate about the environment and excited to join in on our mission to make energy-related data meaningful, transparent, and broadly accessible.

Why Work With Us

Kevala is unique because it combines advanced analytics, technology, and energy expertise to help modernize the grid and accelerate clean energy adoption. Employees work on meaningful, high-impact challenges in a collaborative, mission-driven environment while helping shape a smarter, more sustainable energy future for communities everywhere today.
