Staff Site Reliability Engineer (SRE) – Engineering Tools | Bengaluru, India | On-Site

Posted Yesterday
Bengaluru, Bengaluru Urban, Karnataka, IND
In-Office
Senior level
Agency • eCommerce • Design • SEO
The Role
Lead reliability, scalability, and performance of engineering platforms; build automation, IaC, monitoring/observability, and self-healing systems; troubleshoot Linux systems; develop tooling in Python/Golang/Bash; support CI/CD and developer platforms; mentor junior SREs and participate in on-call rotation.

Location: Bengaluru, India (On-site mandatory)
Employment Type: Full-time
Industry: AI / Autonomous Systems / Advanced Engineering Infrastructure
Start Date: ASAP

About the Role

We are seeking a Staff Site Reliability Engineer (SRE) – Engineering Tools to support and scale critical engineering platforms that power advanced AI, machine learning, simulation, and autonomous technology development.

This is a senior-level role operating at the intersection of reliability engineering, internal developer platforms, tooling automation, and large-scale infrastructure performance. You will play a key role in ensuring that engineering teams have highly reliable, scalable, and secure systems to accelerate innovation.

Key Responsibilities
  • Own reliability, scalability, and performance of engineering tools and infrastructure platforms

  • Design and implement automation frameworks for system deployment and configuration management

  • Improve observability, monitoring, and self-healing capabilities across engineering environments

  • Troubleshoot complex Linux-based systems and optimize performance

  • Develop automation and internal tooling using Python, Golang, or Bash

  • Implement Infrastructure-as-Code best practices

  • Strengthen security posture across engineering systems

  • Partner with cross-functional teams to streamline development workflows

  • Participate in on-call rotation for critical systems

Required Profile
  • Strong expertise in Linux systems administration and performance optimization

  • Experience with distributed systems and large-scale infrastructure environments

  • Proficiency in Python, Golang, and/or Bash scripting

  • Hands-on experience with configuration management tools (e.g., Ansible)

  • Experience with monitoring and observability platforms (e.g., Prometheus, Grafana, Splunk)

  • Familiarity with container orchestration technologies such as Kubernetes

  • Experience supporting developer platforms, CI/CD tooling, or internal engineering systems

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience)

  • Significant experience in Site Reliability Engineering, DevOps, or platform engineering (Staff-level seniority)

What Makes This Role Senior / Staff Level
  • Ownership of mission-critical engineering systems

  • Architectural input into reliability and scalability strategy

  • Mentorship of junior SREs and platform engineers

  • Direct impact on high-scale AI and autonomous development environments

What’s on Offer
  • Opportunity to work on cutting-edge engineering infrastructure

  • High-impact role supporting AI and advanced technology platforms

  • Collaborative, engineering-driven culture

  • Competitive compensation and long-term career growth


The Company
0 Employees

What We Do

REF Digital is a Montreal-based digital agency that helps businesses thrive in a digital-first economy. It specializes in designing and building bespoke e-commerce platforms, apps, and digital experiences made for lasting impact. Formed from the digital team of Groupe LG2, the agency combines strategy, technology, and design to help brands strengthen their online presence.
