Site Reliability Engineer

Posted 4 Days Ago
Metán, Salta
In-Office
Mid level
eCommerce • Food • Retail
The Role
The Site Reliability Engineer will ensure system reliability and performance, manage incidents, design monitoring systems, and automate processes in a cloud environment.

The Company

Toters is an on-demand e-commerce and delivery platform that lets customers get anything in their city with the highest level of convenience.

At Toters, technology is at the heart of everything we do. Our product teams work hard every day to create products that make our customers' lives easier. Our engineers also continuously build solutions that make our processes more efficient, all in an effort to reach our customers fast and at the best cost. If you are interested in working in a high-growth startup environment and want to be part of a team that could change the way customers shop in the Middle East, apply now.

About the Role

We are looking for a Mid-Level Site Reliability Engineer who will play a critical role in ensuring high availability, performance, and resilience across our production systems. You will be at the heart of operational excellence, leading high-impact incident responses, building proactive monitoring systems, and engineering automation that prevents outages before they happen. If you love solving complex distributed system challenges and thrive in high-pressure environments, this role is for you.

Key Responsibilities

Incident Management & Reliability

  • Act as Incident Commander during major outages, leading real-time diagnosis, communication, and recovery.
  • Own and improve the end-to-end incident management lifecycle, including post-incident reviews and action plans.
  • Drive root cause analysis and proactive reliability improvements to prevent recurrence.

Monitoring & Observability

  • Design and maintain metrics, alerts, and dashboards using Prometheus, Grafana, and New Relic.
  • Implement SLIs/SLOs to monitor service health and drive availability targets (99.99%+ uptime).
  • Integrate log management and distributed tracing with tools like ELK Stack and AWS X-Ray.
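
To make the availability target above concrete, here is a minimal sketch of the downtime budget that a given SLO leaves over a rolling window. The function name and the 30-day window are illustrative assumptions, not part of the posting or of any Toters tooling.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability SLO.

    slo_percent: availability target as a percentage, e.g. 99.99.
    window_days: length of the measurement window (assumed 30 days here).
    """
    window_minutes = window_days * 24 * 60
    # The error budget is the fraction of the window that may be unavailable.
    return window_minutes * (1 - slo_percent / 100)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.99)
    print(f"99.99% over 30 days allows about {budget:.2f} minutes of downtime")
```

Under these assumptions, a 99.99% target leaves roughly 4.32 minutes of downtime per 30 days, which is why fast detection and automated recovery matter at this level.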

Automation & Tooling

  • Develop automation scripts and internal tooling in Python or Node.js to reduce manual ops and accelerate recovery (MTTR improvement).
  • Build self-healing infrastructure using IaC and automation pipelines.
  • Optimize on-call workflows, escalation policies, and runbooks using PagerDuty.

Cloud Infrastructure

  • Operate and improve infrastructure hosted on AWS, ensuring reliability, cost efficiency, and scalability.
  • Collaborate with backend and platform teams to embed SRE best practices across engineering.


Key Qualifications

  • 2–4 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Proven success managing production incidents and participating in on-call rotations.
  • Strong hands-on experience with Prometheus, Grafana, and PagerDuty.
  • Proficient in Python or Node.js for automation and tooling.
  • Experience with AWS services (EC2, CloudWatch, ECS/Lambda, IAM, etc.).
  • Solid understanding of Linux systems, networking, and CI/CD pipelines.

Nice to Have

  • Experience as Incident Commander in mission-critical environments.
  • Knowledge of New Relic, Sentry, ELK Stack, or Datadog.
  • Background implementing SLIs/SLOs/Error Budgets (Google SRE model).
  • Familiarity with Docker, Kubernetes, Terraform, or Ansible.
  • Certifications such as:
    • AWS Solutions Architect Associate/DevOps Engineer
    • ITIL Foundation or relevant reliability certifications.

Top Skills

Ansible
AWS
AWS X-Ray
CI/CD
Docker
ELK Stack
Grafana
Kubernetes
Node.js
PagerDuty
Prometheus
Python
Terraform

The Company
HQ: Beirut
744 Employees
Year Founded: 2015

What We Do

Enabling last-mile, same-day delivery of any local product near you. Available for iPhone and Android, the Toters service connects customers with retailers and local couriers, who purchase and deliver goods from any grocery store, restaurant, or other retail shop in your city.

Download the Toters app today or visit www.totersapp.com
