Senior Manager, Site Reliability Engineering (SRE)

This job was removed at 06:24 a.m. (CST) on Monday, Apr 13, 2026.
Bangalore, Bengaluru Urban, Karnataka, IND
In-Office
Information Technology • Security • Software
The Role

At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve—including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.

The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us!

Role Overview:

SolarWinds is looking for a Senior Manager, Site Reliability Engineering (SRE) to lead reliability, scalability, and operational excellence for large-scale, cloud-native, data-intensive SaaS platforms.

This role combines people leadership, technical depth, and operational ownership. You will manage and grow SRE teams responsible for production systems while remaining close to platform architecture, reliability engineering, incident response, and automation strategy.

The ideal candidate has operated distributed systems in production environments and is comfortable guiding teams through complex troubleshooting, reliability improvements, and architectural decisions. This role requires balancing availability, performance, operational efficiency, and engineering velocity across large-scale SaaS services.

Responsibilities:
  • Lead and mentor SRE teams responsible for the reliability, availability, and performance of production SaaS platforms

  • Own and drive production reliability outcomes, including uptime, latency, scalability, capacity planning, and operational readiness

  • Oversee data-intensive distributed systems, including technologies such as ClickHouse, Kafka, ZooKeeper, MySQL, Redis, and Flink

  • Guide and review Kubernetes platform operations at scale, including cluster lifecycle management, upgrades, troubleshooting, and capacity planning

  • Establish and evolve SRE practices, including SLIs/SLOs, alerting strategies, incident management, and post-incident reviews

  • Lead and participate in production incident response, guiding teams through debugging, root cause analysis, and long-term remediation

  • Promote and enforce an automation-first approach, reducing manual operational work through scripting, tooling, and platform improvements

  • Partner with Engineering, Platform, Product, and Security teams to embed reliability into system design and delivery

  • Drive adoption of GitOps, service mesh, and observability practices across teams

  • Lead cloud infrastructure operations across AWS and Azure, keeping the platform secure, resilient, and cost-effective

  • Provide technical mentorship and guidance, helping engineers diagnose complex production issues and improve system reliability

Must-Have Qualifications:
  • Proven experience leading SRE, Platform, or Infrastructure teams supporting production, customer-facing SaaS systems

  • Strong hands-on experience operating Kubernetes clusters in production environments, including:

    • Cluster lifecycle management and upgrades

    • Troubleshooting platform and workload issues

    • Autoscaling and resilience mechanisms (HPA, VPA, KEDA, Cluster Autoscaler, Pod Disruption Budgets)

    • Observability and monitoring (Prometheus, Grafana)

  • Experience operating distributed data platforms in production environments, such as ClickHouse, Kafka, ZooKeeper, MySQL, Redis, or Flink

  • Practical experience with GitOps and service mesh technologies (e.g., Flux, Kustomize, Istio)

  • Strong automation mindset with hands-on experience using Python and/or Go to reduce operational overhead and improve reliability

  • Extensive experience working with AWS and Azure managed services, including EKS/AKS, Aurora, ElastiCache, storage services, load balancers, VPC, and KMS

  • Demonstrated ownership of incident response, root cause analysis, and long-term reliability improvements

  • Ability to collaborate effectively with engineering leadership and cross-functional teams

SolarWinds is an Equal Employment Opportunity Employer. SolarWinds will consider all qualified applicants for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, marital status, disability, veteran status or any other characteristic protected by law.

All applications are treated in accordance with the SolarWinds Privacy Notice: https://www.solarwinds.com/applicant-privacy-notice

Similar Jobs

Rubrik

Senior Engineering Manager

Artificial Intelligence • Big Data • Cloud • Information Technology • Software • Cybersecurity • Data Privacy
In-Office
Bangalore, Bengaluru Urban, Karnataka, IND
3000 Employees

Applied Systems

Site Reliability Engineer

Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
3040 Employees

Applied Systems

IT Administrator

Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
3040 Employees

The Company
Austin, TX
2,299 Employees
Year Founded: 1999

What We Do

SolarWinds is a leading provider of powerful and affordable IT management software. Our products give organizations worldwide, regardless of type, size, or complexity, the power to monitor and manage their IT services, infrastructures, and applications, whether on-premises, in the cloud, or via hybrid models. We continuously engage with technology professionals, including IT service and operations professionals, DevOps professionals, and managed services providers (MSPs), to understand the challenges they face in maintaining high-performing and highly available IT infrastructures and applications. The insights we gain from them, in places like our THWACK® community, allow us to solve well-understood IT management challenges in the ways technology professionals want them solved. Our focus on the user and our commitment to excellence in end-to-end hybrid IT management have established SolarWinds as a worldwide leader in solutions for network and IT service management, application performance, and managed services.

Similar Companies Hiring

Fairly Even
Hardware • Other • Robotics • Sales • Software • Hospitality
New York, NY
30 Employees
Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, California
178 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
