Senior Site Reliability Engineer

Posted 13 Days Ago
2 Locations
In-Office or Remote
Senior level
Database
The Role
Lead reliability engineering efforts: define SLIs/SLOs, run incident response and postmortems, automate operations, build CI/CD and deployment strategies, operate containerized workloads on Kubernetes, optimize cloud performance and cost, implement observability, mentor SREs, and drive platform reliability improvements.
Summary Generated by Built In
  • Define and maintain SLIs/SLOs, monitor alignment and error budget usage
  • Lead incident response and postmortems, implement corrective measures
  • Automate operations tasks via tooling (e.g. auto-remediation, scaling rules)
  • Build, improve, and maintain CI/CD pipelines, including canary and blue/green deployment strategies
  • Lead technical discussions with customers to align on reliability, scalability, and performance requirements
  • Drive continuous platform improvements across the service lifecycle, including architecture, monitoring, and operational processes
  • Implement and extend observability systems (metrics, tracing, log aggregation)
  • Optimize performance and cost by tuning cloud services, autoscaling, resource rightsizing
  • Design, deploy, and operate containerized workloads using Docker and Kubernetes in production environments
  • Collaborate with dev teams to integrate resilience patterns (circuit breakers, bulkheading)
  • Participate in architecture discussions around high availability, disaster recovery
  • Mentor mid-level and junior SREs; conduct reliability design reviews

Required Skills

  • 5–8 years of experience in a reliability or operations role
  • Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
  • Cloud provider certification: Professional-level certification in AWS (Solutions Architect), Azure (Solutions Architect Expert), GCP (Professional Cloud Architect), or Oracle Cloud (Architect Professional)
  • Solid coding skills (Python, Go, or equivalent)
  • Experience with IaC and CI/CD pipelines
  • Comfortable with monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
  • Experience working in distributed systems and production-scale services
     

Nice-to-have Skills

  • Exposure to multi-cloud data replication or cross-cloud networks
  • Experience with chaos engineering or fault injection
About the Team
Datavail’s Team of Oracle Experts Can Save You Time and Money

As an Oracle Platinum Partner with 17 specializations, we have extensive experience with everything Oracle. Our experts have an average of 16 years of experience. They’ve overcome every obstacle in helping clients manage everything from databases, BI analytics, reporting, migrations, and upgrades to monitoring and overall data management.

You can free up your IT resources to focus on growing your business rather than fighting fires. Our Oracle experts can guide you through strategic initiatives or support routine database management.


Datavail’s Comprehensive Oracle Database Services

Datavail offers Oracle consulting services that allow you to take advantage of all the features of the Oracle database. We can also assist you in designing, implementing, and managing a wide range of Oracle applications.

Oracle Database Managed Services

Datavail’s business focuses on helping you use your data to drive business results through cost-saving services. The success of your business depends on how well you understand and manage your data. Our Oracle managed cloud services give you the power to unleash your organization’s potential. We provide comprehensive and technically advanced support for Oracle installations to ensure that your databases are safe, secure, and managed with the utmost level of care.

Our delivery performance in data management leads the industry. We offer highly trained Oracle database administrators via a 24×7, always-on, always-available global delivery model. Datavail’s flexible and client-focused services always add value to your organization. Our Oracle database managed services and products include:

Skills Required

  • 5–8 years of experience in a reliability or operations role
  • Cloud-agnostic certification: Terraform Associate, Certified Kubernetes Administrator (CKA), or SRE Foundation
  • Cloud provider professional-level certification: AWS Solutions Architect (Professional), Azure Solutions Architect Expert, GCP Professional Cloud Architect, or Oracle Cloud Architect Professional
  • Solid coding skills (Python, Go, or equivalent)
  • Experience with Infrastructure as Code (IaC) such as Terraform
  • Experience building and maintaining CI/CD pipelines and canary and blue/green deployments
  • Experience with monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry, ELK, Jaeger)
  • Experience designing, deploying, and operating containerized workloads using Docker and Kubernetes in production
  • Experience working in distributed systems and production-scale services
  • Ability to define and maintain SLIs/SLOs and lead incident response and postmortems

The Company
HQ: Broomfield, CO
263 Employees
Year Founded: 2007

What We Do

A premier data services company serving clients in North America, Datavail has 1,000 data professionals, data engineers, developers, project managers, consultants, and business experts, supported by industry-leading automation and intellectual property. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes.

At Datavail, we look for more than smarts, experience, and proficiency. On top of those requirements, we seek people who mesh with our corporate values. We seek brilliance without bravado and know-how without a know-it-all attitude. We hold low ego in high regard, embrace problem-solving as a passion, and welcome every day as a new opportunity to learn. We’re flexible and hardworking. We’re committed to our clients and colleagues. We help our people grow so they can help our clients grow. That makes us grow so we can help even more customers leverage organizational data for business value.

Our Core Values:

  1. We desire to serve.
  2. We embody flexibility for availability.
  3. We exemplify low ego.
  4. We work hard.
  5. We strive for continuous improvement.
  6. We are growth-oriented.
