Senior Manager System Reliability Engineering

Reposted 13 Days Ago
Hyderabad, Telangana, IND
In-Office
Expert/Leader
Energy • Manufacturing • Solar • Renewable Energy
GE Vernova is accelerating the path to more reliable, affordable, and sustainable energy.
The Role
The Senior Manager of System Reliability Engineering oversees production stability for GridOS SaaS products, managing cloud infrastructure, deployment strategies, and incident response while driving a culture of reliability and excellence.
Summary Generated by Built In
Job Description Summary

As the Senior Manager of System Reliability Engineering for the GridOS SaaS Products, you will serve as a strategic organizational change agent and the ultimate authority on production stability for our global grid software SaaS portfolio. You will bridge the gap between architectural design and real-world operations, navigating organizational friction to drive a culture of high reliability and engineering excellence. You are the final "Gatekeeper" for production environments: you own the Change Management process, hold the authority to approve or halt deployments based on system health, and are responsible for meeting our SLA/SLO targets for critical infrastructure applications.

Job Description

Roles and Responsibilities

Primary Responsibilities

Day 0: Strategic Provisioning & Design

  • Standardize Cloud Infra Provisioning: Lead initiatives to standardize and secure cloud infrastructure through extensive automation, and accelerate customer onboarding to our SaaS platform.
  • The "Golden Path": Define and architect the standardized "Middle-Mile" software delivery platform (IDP) using Backstage, ArgoCD, and GitHub Actions to eliminate bespoke "unicorn" deployment methodologies.
  • Follow-the-Sun Architecture: Design and manage the global handover protocols and 24/7 operational coverage model across US, Europe, and Asia time zones to ensure seamless support without graveyard shifts.
  • Reliability Targets: Establish enterprise-wide Service Level Objectives (SLOs) and Service Level Indicators (SLIs) aligned with critical user journeys for global utility customers.

Day 1: Release Governance & Deployment

  • Final Approval Authority: Act as the ultimate authority for all production releases, enforcing rigorous change control and validating that all security and performance "quality gates" are met.
  • Progressive Delivery: Oversee the implementation of advanced deployment strategies, including Canary and Blue/Green rollouts, ensuring automated rollback capabilities are verified.
  • SRE Center for Enablement (C4E): Build and mature the C4E to provide coaching, standardized templates, and repeatable patterns that uplift reliability practices across all product teams.

Day 2: Operational Excellence & Optimization

  • Incident Command: Serve as the Lead Incident Commander for high-severity (Sev1/Sev2) events, leading the technical direction, communication, and containment efforts.
  • Blameless Culture: Own the post-incident lifecycle, facilitating blameless Root Cause Analysis (RCA) to ensure systemic fixes replace recurring operational risks.
  • Business Continuity: Manage and validate end-to-end Backup and Disaster Recovery (DR) strategies, including cross-region failover and automated recovery testing.
  • FinOps & Capacity: Lead financial operations for the SaaS platform, optimizing cloud cost and performing long-term capacity planning based on customer growth.

Required Qualifications

Technical Qualifications

  • Cloud Ecosystem: Deep expertise in AWS core services (EC2, EKS, RDS, S3, IAM) and management tools (CloudTrail, CloudWatch).
  • Orchestration: Advanced mastery of Kubernetes internals and EKS clusters across multi-region architectures.
  • Continuous Delivery: Expert knowledge of ArgoCD, GitHub Actions, and GitOps-first workflows.
  • Automation: Proficiency in Infrastructure as Code (IaC) using Terraform and configuration management via Ansible.
  • Observability: Hands-on experience with Prometheus, Grafana, observability platforms such as Splunk or Datadog, and the OpenTelemetry standard to build comprehensive telemetry pipelines.

Experience & Leadership Qualifications

  • Overall Experience: More than 14 years.
  • Senior Leadership: Minimum 8–10 years of experience in SRE, Platform Engineering, or Production Support for large-scale, distributed SaaS applications.
  • Organizational Influence: Proven ability to navigate organizational friction, influence VP-level stakeholders, and drive quality standards across disparate engineering teams.
  • Mentorship: Experience managing and mentoring a global team of SREs, fostering a culture of high performance and continuous learning.
  • Operational Discipline: Exceptional troubleshooting skills under pressure and a "Fire Marshal" mindset toward investigation and proactive inspection.

Educational Qualifications

  • Master's degree in a STEM discipline, OR
  • 4-year Bachelor's degree in a STEM discipline plus more than 10 years of relevant experience

Desired Characteristics

  • Regulated Environments: Practical knowledge of NERC CIP, SOC2, ISO 27001, or IEC 62443 compliance standards in a SaaS context.
  • Certifications: AWS Certified DevOps Engineer – Professional, AWS Certified Solutions Architect – Associate/Professional, CKA (Certified Kubernetes Administrator), or SRE Practitioner Certification.

Additional Information

Relocation Assistance Provided: Yes

Skills Required

  • Deep expertise in AWS core services and management tools
  • Advanced mastery of Kubernetes and EKS clusters
  • Expert knowledge of ArgoCD and GitOps workflows
  • Proficiency in Infrastructure as Code using Terraform
  • Hands-on experience with observability platforms
  • 14+ years overall experience
  • 8-10 years in SRE or Platform Engineering
  • Master's degree in STEM, or Bachelor's degree plus more than 10 years of experience

GE Vernova Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about GE Vernova and has not been reviewed or approved by GE Vernova.

  • Retirement Support: The 401(k) plan includes company matching contributions and additional company retirement contributions, with access to Fidelity resources and financial planning consultants. Feedback suggests this structure supports long-term savings beyond a basic match.
  • Parental & Family Support: Paid parental leave is available with flexible, continuous or non-continuous usage, and is complemented by adoption resources and Work/Life Connections guidance. Maternity leave is described as extended relative to typical workplace norms.
  • Leave & Time Off Breadth: Time-off programs include 12 paid holidays, permissive time off for many salaried roles, and dedicated personal, illness, and caregiving time for U.S. new hires. Some hourly roles start with a defined PTO bank, while other roles may offer unlimited time off.

The Company
HQ: Cambridge, MA
75,000 Employees
Year Founded: 2024

What We Do

GE Vernova is a planned purpose-built company on a mission to electrify the planet while simultaneously working to decarbonize it. If we want our energy future to be different…we must be different. Our mission is embedded in our name. We retain our treasured legacy, “GE,” in our name as an enduring and hard-earned badge of quality and ingenuity. “Ver” / “verde” signal Earth’s verdant and lush ecosystems. “Nova,” from the Latin “novus,” nods to a new, innovative era of lower carbon energy that GE Vernova will help deliver. GE Vernova brings together GE’s portfolio of energy businesses including Power, Wind, Electrification and Digital businesses. With focus, GE Vernova is accelerating the path to more reliable, affordable, and sustainable energy, while helping our customers power economies and deliver the electricity that is vital to health, safety, security, and improved quality of life. Together, we have The Energy to Change the World.

Why Work With Us

Join our team to evolve and grow, surrounded by some of the brightest minds in the industry who help you get better every day. You’ll get the chance to rewrite the rules, work on cutting-edge technology, and be part of a global team for positive change.
