Staff Site Reliability Engineer (SRE) – Engineering Tools | Bengaluru, India | On-Site

REF Digital

Posted 23 Days Ago

Bengaluru, Bengaluru Urban, Karnataka, IND

In-Office

Senior level

Agency • eCommerce • Design • SEO

The Role

Lead reliability, scalability, and performance of engineering platforms; build automation, IaC, monitoring/observability, and self-healing systems; troubleshoot Linux systems; develop tooling in Python/Golang/Bash; support CI/CD and developer platforms; mentor junior SREs and participate in on-call rotation.

Summary Generated by Built In

Location: Bengaluru, India (On-site mandatory)
Employment Type: Full-time
Industry: AI / Autonomous Systems / Advanced Engineering Infrastructure
Start Date: ASAP

About the Role

We are seeking a Staff Site Reliability Engineer (SRE) – Engineering Tools to support and scale critical engineering platforms that power advanced AI, machine learning, simulation, and autonomous technology development.

This is a senior-level role operating at the intersection of reliability engineering, internal developer platforms, tooling automation, and large-scale infrastructure performance. You will play a key role in ensuring that engineering teams have highly reliable, scalable, and secure systems to accelerate innovation.

Key Responsibilities

Own reliability, scalability, and performance of engineering tools and infrastructure platforms
Design and implement automation frameworks for system deployment and configuration management
Improve observability, monitoring, and self-healing capabilities across engineering environments
Troubleshoot complex Linux-based systems and optimize performance
Develop automation and internal tooling using Python, Golang, or Bash
Implement Infrastructure-as-Code best practices
Strengthen security posture across engineering systems
Partner with cross-functional teams to streamline development workflows
Participate in on-call rotation for critical systems

Required Profile

Strong expertise in Linux systems administration and performance optimization
Experience with distributed systems and large-scale infrastructure environments
Proficiency in Python, Golang, and/or Bash scripting
Hands-on experience with configuration management tools (e.g., Ansible)
Experience with monitoring and observability platforms (Prometheus, Grafana, Splunk, etc.)
Familiarity with container orchestration technologies such as Kubernetes
Experience supporting developer platforms, CI/CD tooling, or internal engineering systems
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience)
Significant experience in Site Reliability Engineering, DevOps, or platform engineering (Staff-level seniority)

What Makes This Role Senior / Staff Level

Ownership of mission-critical engineering systems
Architectural input into reliability and scalability strategy
Mentorship of junior SREs and platform engineers
Direct impact on high-scale AI and autonomous development environments

What’s on Offer

Opportunity to work on cutting-edge engineering infrastructure
High-impact role supporting AI and advanced technology platforms
Collaborative, engineering-driven culture
Competitive compensation and long-term career growth

The Company

What We Do

REF Digital is a Montreal-based digital agency that helps businesses thrive in a digital-first economy. It designs and engineers bespoke e-commerce platforms, apps, and digital experiences built for lasting impact. Formed from the digital team of Groupe LG2, the agency combines strategy, technology, and design to help brands navigate that economy and strengthen their online presence.