Senior Site Reliability Engineer (SRE)

Posted 17 Days Ago
Makkah, SAU
In-Office
Senior level
Cloud • eCommerce • Information Technology • Software
The Role
As a Senior Site Reliability Engineer, you will lead incident management, improve platform performance, and enhance system resilience. Responsibilities include managing high-severity incidents, conducting performance analysis, and mentoring engineers on best practices.
Summary Generated by Built In

As a Senior SRE at Salla, you will lead reliability initiatives, handle complex incidents, improve platform performance, and guide engineering teams toward building resilient systems. You will also participate in the on-call rotation as part of our commitment to platform reliability.

Reliability & Incident Management

  • Lead high-severity incident response and drive post-incident reviews.
  • Troubleshoot complex issues across applications, infrastructure, and networks.
  • Improve MTTR through better monitoring, alerts, and diagnostic tooling.
  • Participate in the on-call rotation supporting production systems.

Performance & Scalability

  • Identify and resolve performance bottlenecks and scaling challenges.
  • Conduct load testing and capacity planning for high-traffic scenarios.

Infrastructure & Operations

  • Enhance cloud-native infrastructure, deployment processes, and automation.
  • Improve resilience, fault-tolerance, and recovery mechanisms across systems.

Observability

  • Build and refine dashboards, alerts, metrics, logs, and traces.
  • Define SLIs/SLOs and improve visibility into system behavior.

Tooling & Automation

  • Develop tools that reduce operational toil and increase reliability.
  • Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.

Collaboration

  • Work closely with engineering teams to ensure services are robust and production-ready.
  • Mentor engineers on reliability, debugging, and operational best practices.

Bonus Skills

  • Background in large-scale, high-traffic systems.
  • Experience with fault-tolerant design, DR, and HA patterns.
  • Familiarity with SLOs, SLIs, and error budgets.

Location Preference

  • Candidates located in time zones GMT+0 to GMT+6 are preferred, to align with team collaboration and on-call coverage.

Requirements

  • Strong experience with Kubernetes, service mesh technologies, and cloud platforms (AWS, GCP, or Azure).
  • Deep understanding of Linux, networking, distributed systems, and load balancing.
  • Hands-on experience with Terraform or similar Infrastructure-as-Code tools.
  • Experience with observability platforms such as Prometheus, Grafana, Loki, Mimir, Elastic, or equivalent.
  • Proficiency in scripting or programming languages such as Bash, Python, or Go.
  • Experience with CI/CD pipelines and GitOps practices.
  • Strong debugging, incident response, and performance analysis skills.

The Company
0 Employees
Year Founded: 2016

What We Do

Salla is the leading commerce platform in the GCC, built in Saudi Arabia, providing tools and services for merchants to build, run, and grow their online stores.
