We are seeking a Senior Manager of Kubernetes Observability to provide strategic leadership for the design, standardization, and scaled execution of our enterprise observability ecosystem across Kubernetes and OpenShift platforms, including Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). This role is responsible for ensuring a robust, unified, and automated observability platform that enables reliability, performance, and operational excellence across all clusters and workloads in hybrid and multi-cloud environments.
As a senior technology leader, you will define the long-term vision and operating model for metrics, logging, tracing, eventing, and monitoring standards across on-prem, cloud-managed, and hosted Kubernetes platforms. You will guide multiple engineering teams to execute consistently against this strategy, ensuring full instrumentation, proactive issue detection, reduced MTTR, and improved platform stability. Through strong architectural direction, organizational alignment, and focused mentorship, you will elevate engineering maturity and ensure developers and SREs have actionable insights that accelerate innovation and support enterprise growth at scale.
Key Responsibilities
Kubernetes Observability Strategy & Operating Model
- Define the target-state vision and multi-year roadmap for observability across Kubernetes, OpenShift, AKS, and GKE, including metrics, logging, tracing, eventing, and alerting standards.
- Establish a unified observability operating model that ensures consistency, scalability, and reuse across on-prem, cloud-managed, and multi-cloud Kubernetes environments.
- Define success metrics and outcomes that measure observability effectiveness, reliability improvements, and reductions in MTTR across all platforms.
Platform Architecture, Standardization & Instrumentation
- Set architectural direction for enterprise observability platforms, tooling, and telemetry pipelines across Kubernetes, OpenShift, AKS, and GKE.
- Establish standardized instrumentation patterns for clusters, workloads, control planes, and platform services, ensuring complete and consistent telemetry coverage regardless of Kubernetes distribution or cloud provider.
- Drive convergence toward unified observability frameworks that abstract provider-specific differences while preserving deep platform insight.
Automation, Telemetry Workflows & Adoption
- Drive automation of observability onboarding and telemetry workflows across Kubernetes, AKS, and GKE to reduce manual effort and accelerate adoption.
- Enable self-service observability capabilities that allow developers and SREs to easily instrument, monitor, and troubleshoot workloads across cloud and on-prem clusters.
- Ensure observability is embedded by default into platform, infrastructure-as-code, and application delivery pipelines.
Reliability, Monitoring & Operational Excellence
- Enable proactive issue detection through scalable alerting frameworks, actionable dashboards, and standardized monitoring practices across all Kubernetes platforms.
- Improve reliability and performance visibility for workloads running on OpenShift, AKS, and GKE, reducing reliance on reactive troubleshooting.
- Partner with SRE and operations teams to continuously improve incident response, post-incident learning, and preventative engineering across hybrid and multi-cloud environments.
Leadership, Organization & Cross-Team Alignment
- Lead, mentor, and develop engineering leaders and teams responsible for observability platform components and services.
- Align platform, SRE, cloud, and application teams around shared observability standards and operational goals across Kubernetes, AKS, and GKE.
- Strengthen cross-team collaboration and engineering rigor to raise overall organizational maturity in observability and operations.
Required Qualifications
- 6+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
- 3+ years of management or leadership experience.
- 5+ years of experience in platform engineering, reliability engineering, or observability-focused technical leadership roles, or equivalent demonstrated experience.
- 6+ years of experience with Grafana and Splunk.
- 5+ years of experience with Kubernetes observability concepts, including metrics, logging, tracing, eventing, and monitoring platforms, across OpenShift, AKS, and GKE.
Desired Qualifications
- 6+ years of people management or senior technical leadership experience guiding multiple engineering teams.
- Demonstrated success defining and scaling enterprise observability platforms across large, multi-cloud Kubernetes environments.
- Strong understanding of SRE, operational excellence, and reliability engineering practices.
- Experience driving automation and standardization to reduce MTTR and operational toil.
- Proven ability to influence across platform, infrastructure, cloud, and application teams.
- Strong executive communication skills, including the ability to articulate strategy, tradeoffs, and outcomes to senior stakeholders.
Job Expectations
- There is no visa sponsorship available for this position.
- There is no relocation allowance available for this position.
- This position requires working in one of the posted locations in a hybrid environment.
What We Do
Wells Fargo & Company (NYSE: WFC) is a leading financial services company that has approximately $2.1 trillion in assets. We provide a diversified set of banking, investment and mortgage products and services, as well as consumer and commercial finance, through our four reportable operating segments: Consumer Banking and Lending, Commercial Banking, Corporate and Investment Banking, and Wealth & Investment Management. Wells Fargo ranked No. 33 on Fortune’s 2025 rankings of America’s largest corporations. Our technology professionals drive innovation, information security, and big data analytics while maintaining a network that handles more than 12 billion customer interactions a year. Join us!
At Wells Fargo, we’re more than a financial services leader – we’re a global trailblazer committed to driving innovation, empowering communities, and helping our customers succeed. We believe that a meaningful career is much more than just a job – it’s about finding all of the elements to help you thrive, in one place. Living the Well Life means you’re supported in life, not just work. It means having robust benefits, competitive compensation, and programs designed to help you find work-life balance and well-being. You’ll be rewarded for investing in your community, celebrated for being your authentic self, and empowered to grow. And we’re recognized for it – Wells Fargo once again ranked in the top three – making us the #1 financial services employer – on the 2025 LinkedIn Top Companies list of best workplaces “to grow your career” in the U.S.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
© 2026 Wells Fargo Bank, N.A. All rights reserved. Member FDIC.
Why Work With Us
We're known for our “Well Life” approach to supporting employees’ career aspirations, work-life balance, and mental and physical health. We ranked in the top 3 on the 2025 LinkedIn Top Companies list – and #1 among financial services companies – as the best workplace “to grow your career” in the U.S.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.