Senior Specialist - Cloud SRE - Azure, AKS & DevOps

Mumbai, Maharashtra, IND
Hybrid
Senior level
Database
The Role
Lead reliability engineering and managed services for Azure environments, focusing on availability, automation, incident management, and cloud transformation while mentoring teams.

Job Title: Senior Specialist (SRE) - Azure, AKS & DevOps

Education: Any Graduate

Experience: 8 to 15 years

Location: Mumbai

 

Role Overview

We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure, AKS, DevOps, Automation, and Enterprise Operations to lead reliability engineering and managed services delivery for production cloud environments.

This role focuses on ensuring 24x7 availability, performance, security, patch compliance, scalability, and automation across Azure-first environments with exposure to AWS/GCP.

You will work closely with customers, internal engineering teams, and leadership to drive cloud transformation, implement SRE best practices, modernize DevOps delivery pipelines, and improve measurable service outcomes.

 

Primary Responsibilities

Reliability Engineering & SRE Practices

  • Define and manage SLIs, SLOs, Error Budgets, MTTR, change failure rate, and availability targets. 

  • Continuously improve platform reliability, scalability, resilience, and operational maturity. 

  • Lead Sev-1 / Sev-2 incident management, escalation handling, and RCA reviews. 

  • Conduct blameless postmortems and drive preventive actions. 

  • Build operational runbooks, self-healing automation, and on-call processes. 

  • Participate in architecture reviews for HA, DR, failover, and performance optimization. 

 

Azure Cloud Operations & Engineering

  • Manage enterprise Azure environments including: 

      • Azure Virtual Machines

      • VM Scale Sets

      • Azure App Services

      • Azure Functions

      • Azure SQL / Managed Instance

      • Azure Storage

      • Virtual Networks / NSGs

      • Application Gateway / WAF

      • Azure Front Door

      • Load Balancers

      • Azure Backup & Site Recovery

  • Implement Azure Well-Architected Framework best practices. 

  • Drive governance using Management Groups, Azure Policy, RBAC, Key Vault, and Defender for Cloud.

  • Optimize cost using Reserved Instances, rightsizing, budgets, and tagging strategy. 

 

AKS & Container Platform Engineering

  • Design, manage, and optimize Microsoft Azure Kubernetes Service (AKS) clusters. 

  • Manage cluster upgrades, autoscaling, node pools, ingress controllers, storage classes, and security policies. 

  • Support container deployments using Helm, YAML manifests, and GitOps workflows.

  • Improve AKS observability using Prometheus, Grafana, and Azure Monitor for Containers.

  • Ensure platform reliability for microservices workloads. 

 

DevOps, CI/CD & Automation

  • Build and manage CI/CD pipelines using Azure DevOps, GitHub Actions, Jenkins, or GitLab CI. 

  • Implement blue/green, rolling, and canary deployments with rollback strategies. 

  • Automate infrastructure using Terraform, ARM Templates, and Bicep. 

  • Develop scripts/tools using PowerShell, Bash, Python, Go. 

  • Automate patching, backup validation, scaling, compliance checks, and recovery tasks. 

  • Reduce manual operational toil through self-service automation. 

 

Patching, Security & Compliance

  • Own enterprise patch management for Windows/Linux workloads using Azure Update Manager. 

  • Manage maintenance windows and zero-downtime patch strategies. 

  • Implement CIS benchmarks, vulnerability remediation, and audit compliance controls.

  • Secure workloads with Key Vault, Private Link, NSGs, Conditional Access, PIM, Defender. 

  • Support hybrid environments using Azure Arc-enabled servers. 

 

Observability & Monitoring

  • Build and maintain monitoring platforms using: 

      • Azure Monitor

      • Log Analytics

      • Application Insights

      • Grafana

      • Datadog

      • New Relic

      • Prometheus

  • Build executive dashboards, SRE scorecards, SLA reports, and capacity trend reports.

  • Tune alerts to reduce noise and improve actionable detection. 

 

Customer Engagement & Leadership

  • Serve as primary technical contact for enterprise customers. 

  • Present monthly service reviews, patch compliance, reliability metrics, and improvement plans. 

  • Mentor L1/L2 engineers and guide technical escalations. 

  • Collaborate with customer architects, security teams, and developers. 

  • Lead cloud modernization and operational excellence initiatives. 

 

Required Qualifications

Experience

  • 8-10 years in SRE, DevOps, Cloud Engineering, or Production Operations.

  • Minimum 5 years of hands-on experience with Microsoft Azure production environments.

  • Proven experience managing critical enterprise workloads. 

  • Strong customer-facing / managed services background preferred. 

 

Technical Skills

Azure

  • Deep expertise in Azure compute, networking, storage, identity, monitoring, backup, DR. 

  • Strong hands-on experience with AKS, Azure DevOps, Azure Policy, and Key Vault.

DevOps / Automation

  • Terraform, ARM, Bicep, CI/CD pipelines.

  • PowerShell, Bash, Python scripting.

Containers

  • Kubernetes, Docker, AKS operations.

Monitoring

  • Azure Monitor, Grafana, Datadog, Prometheus, Log Analytics.

Operations

  • Incident management, RCA, patching, performance tuning, DR drills.

 

Preferred Certifications

  • Microsoft AZ-104 

  • AZ-305 

  • AZ-400 

  • AZ-500 

  • Amazon Web Services Associate / Professional 

  • CKA / Terraform Associate / SRE Foundation 

 

Nice to Have

  • Multi-cloud (AWS / GCP) experience 

  • Chaos engineering 

  • FinOps knowledge 

  • MSP / Managed Services experience 

  • Large-scale enterprise operations 

  • Security / Compliance frameworks (ISO 27001, SOC2, HIPAA, PCI)

 

About Us

Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leading technologies. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes, and is an AWS Advanced Tier Consulting Partner, a Microsoft Solutions Partner for Data & AI and Digital & App Innovation (Azure), an Oracle Partner, and a MySQL Partner.

About the Team
Datavail’s Team of Cloud Experts Can Save You Time and Money
Our cloud experts can overcome every obstacle in helping clients manage everything from databases, analytics, reporting, migrations, and upgrades to monitoring and overall data management.
You can free up your IT resources to focus on growing your business rather than fighting fires. Our Cloud experts can guide you through strategic initiatives or support routine database management.
Cloud Managed Services
Datavail’s business focuses on helping you use your data to drive business results through cost-saving services. The success of your business depends on how well you understand and manage your data. Our managed cloud services give you the power to unleash your organization’s potential. We provide comprehensive and technically advanced support for cloud operations to ensure that your infrastructure is safe, secure, and managed with the utmost level of care.
Our delivery performance in data management leads the industry. We offer highly trained Cloud administrators via a 24×7, always on, always available, global delivery model.
The combination of a proven delivery model and top-notch experience ensures that Datavail will remain the on-demand cloud experts you need. Datavail’s flexible, client-focused services always add value to your organization.

The Company
HQ: Broomfield, CO
263 Employees
Year Founded: 2007

What We Do

A premier data services company serving clients in North America, Datavail has 1,000 data professionals, data engineers, developers, project managers, consultants, and business experts, supported by industry-leading automation and intellectual property. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes.

At Datavail, we look for more than smarts, experience, and proficiency. On top of those requirements, we seek people who mesh with our corporate values. We seek brilliance without bravado and know-how without a know-it-all attitude. We hold low ego in high regard, embrace problem-solving as a passion, and welcome every day as a new opportunity to learn. We’re flexible and hard working. We’re committed to our clients and colleagues. We help our people grow so they can help our clients grow. That makes us grow so we can help even more customers leverage organizational data for business value.

Our Core Values:
1. We desire to serve.
2. We embody flexibility for availability.
3. We exemplify low ego.
4. We work hard.
5. We strive for continuous improvement.
6. We are growth-oriented.
