Principal Site Reliability Engineer

Posted Yesterday
Palo Alto, CA, USA
In-Office
233K-336K Annually
Expert/Leader
Artificial Intelligence • Machine Learning
Uniphore is the Business AI Company powering the agentic enterprise.
The Role
Lead platform reliability and automation at scale by building production Go services, Kubernetes operators, multi-cloud infrastructure, and self-service tooling. Provide technical leadership through architecture, code, on-call escalation ownership, incident remediation, and mentorship to elevate engineering teams' operational maturity.
Summary Generated by Built In

Uniphore is one of the largest B2B AI-native companies: decades-proven, built for scale, and designed for the enterprise. The company drives business outcomes across multiple industry verticals and enables the largest global deployments.
  
Uniphore infuses AI into every part of the enterprise that impacts the customer. We deliver the only multimodal architecture centered on customers that combines Generative AI, Knowledge AI, Emotion AI, workflow automation and a co-pilot to guide you. We understand better than anyone how to capture voice, video and text and how to analyze all types of data.  
  
As AI becomes more powerful, every part of the enterprise that impacts the customer will be disrupted. We believe the future will run on the connective tissue between people, machines and data: all in the service of creating the most human processes and experiences for customers and employees.   

Job Description:
 


About the Role: 
We're looking for a Principal Site Reliability Engineer to join our Platform Engineering team — someone equally at home writing production Go as designing and operating cloud infrastructure. The highest-leverage work here isn't a runbook; it's the service that enforces the runbook automatically. You'll write Go that runs in production and multiplies your impact across hundreds of services. 

You'll build the standards, frameworks, automations, agentic workflows, and self-service capabilities that make engineering teams autonomous while maintaining enterprise-grade reliability and security. You won't just define standards — you'll implement them in code: a Kubernetes Operator that enforces service readiness criteria, a service that surfaces SLO health across the fleet, an internal platform service that automates task execution. 

You'll collaborate with feature teams as an expert advisor and standard-setter, helping them build operational maturity while you maintain oversight of our single/multi-tenant, multi-cloud infrastructure. You'll be a bridge between software development and systems operations, focused on large-scale, resilient, automated infrastructure rather than daily firefighting. 

This is a senior individual-contributor role. You will not have direct reports. Your leadership is technical — exercised through architecture, production code, design reviews, and mentorship. This role participates in our on-call rotation, which covers all production systems. As a Principal, you'll own the hardest escalations and use what you learn on-call to drive the architectural fixes that eliminate whole classes of incidents. 

Responsibilities: 

Invent:

  • Define and execute long-term architectural strategy for our multi-cloud platforms.  

  • Lead hands-on implementation of critical infrastructure projects, focusing on reliability, automation, and performance at scale.  

  • Own multi-year technical roadmaps that establish the vision for infrastructure scalability, reliability, security, and engineering velocity. 

Own: 

  • Provide technical leadership through design reviews and code contributions; set technical direction, eliminate architectural barriers to execution, and drive toward simplicity.  

  • Maintain end-to-end technical stewardship of your systems, keeping execution aligned with architectural vision and best practices.  

  • Act as a key technical advisor to Engineering Leadership and Product Management, influencing the strategic direction of Uniphore.

  • Lead design reviews across Infrastructure with a focus on automation, scalability, and reliability, and align architectural roadmaps across teams.  

  • Partner with Security to build secure-by-default systems and remediate weaknesses.  

  • Own the reliability of the systems under your technical stewardship.  

  • Create the technical clarity — vision, standards, and tooling — that lets feature teams build, own and operate their services. 

  • Participate in fleet-wide on-call, owning critical escalations across all production systems and converting recurring failure modes into permanent architectural fixes. 

Teach: 

  • Establish and evangelize design principles for reliable, secure, scalable systems. 

  • Grow other engineers through technical mentorship, architectural guidance, and design review. 

Requirements: 

  • 10+ years in DevOps/SRE/Platform Engineering, with demonstrated Staff- or Principal-scope impact and a track record of transforming operational models.  

  • Production Go: you write Go regularly, understand its concurrency model, and are comfortable owning Go services in production.  

  • Kubernetes depth: operational expertise plus the ability to extend it — you understand the controller-runtime model and could write or maintain a Kubernetes Operator.  

  • Cloud & infrastructure: expert-level AWS/GCP/Azure, Terraform, and multi-cloud architecture, with strong cost-optimization instincts.  

  • Production excellence: deep incident management, RCA process, and on-call system design experience.  

  • Software engineering fundamentals: API design, testing, observability instrumentation, and service lifecycle ownership — you treat internal tooling with the same rigor as customer-facing software.  

  • Standards & documentation: strong technical writing; you create operational procedures teams can self-execute.  

  • Architecture & planning: RFC/PRD review experience; you catch operational problems at design time.  

  • Collaboration & coaching: you build team capability through tooling and knowledge transfer rather than doing the work for them. 

Nice to Haves: 

  • Building Kubernetes Operators, controllers, or admission webhooks (controller-runtime, kubebuilder).  

  • Contributions to open-source infrastructure tooling.  

  • AWS Solutions Architect Professional or equivalent GCP/Azure certifications.  

  • Kubernetes certifications (CKA, CKAD, CKS).  

  • Platform engineering, developer experience, or internal developer portals (Backstage, etc.).  

  • GitOps patterns (ArgoCD, Flux) and policy-as-code tooling (OPA, Kyverno). 

Why You'll Love This Role: 

  • Your code is your leverage. Solutions you ship multiply across dozens of services and teams — you prevent entire classes of problems rather than patching instances.  

  • You'll shape the platform strategy. You'll drive the transformation from reactive support to strategic platform partnership, with platform engineering embedded in planning to prevent downstream issues.  

  • You'll tackle the hardest problems. Multi-tenant architecture scaling, cross-service observability, and reliability challenges that affect our largest enterprise deployments.  

  • You'll set the bar. Define the standards, incident-management frameworks, and service-ownership model that let teams graduate to full operational independence. 

Hiring Range: $232,900 – $335,811 OTE — for Primary Location Palo Alto, CA


 


Benefits:


In addition to competitive base pay, this position includes an annual incentive opportunity based on target achievement, pre-IPO stock options, and benefits including medical, dental, vision, and a 401(k) with a match, plus generous paid time off, paid holidays, a paid day off for your birthday, and other paid leave policies to support employees through all phases of life.


Location preference:

USA - CA - Palo Alto

Uniphore is an equal opportunity employer committed to diversity in the workplace. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, disability, veteran status, and other protected characteristics.
 
For more information on how Uniphore uses AI to unify—and humanize—every enterprise experience, please visit www.uniphore.com.




The Company
Bengaluru, Karnataka
465 Employees
Year Founded: 2008

What We Do

The Business AI Cloud is the only sovereign, composable and secure AI platform that enables businesses to rapidly adopt, significantly transform and immediately unlock the value of their data. Trusted by more than 2,500 of the world’s largest enterprises and recognized by Gartner, Forrester, IDC and the Deloitte Fast 500, Uniphore is where enterprise AI moves from ambitions to production.

A Complete, Composable Platform

Uniphore is designed to be:

  • Sovereign: run on any public cloud, private cloud or on-premises with full control over your data and AI models.
  • Composable: choose your layer, model, or component, such as vector DBs, knowledge graphs, data compute, and beyond.
  • Secure: embedded guardrails, observability, and AI security ensure trusted, compliant, and enterprise-grade protection.

Trusted at Scale

Over 2,000 global businesses, including many of the Fortune 500, rely on Uniphore every day to drive growth, improve efficiency, and deliver personalized customer experiences. Customers include leaders across industries, like Skechers, LastPass, Atlassian, HP, Allstate, Sony, and more.

Industry Recognition

  • Named to Inc.'s Best in Business List
  • Listed on the Deloitte Technology Fast 500
  • Recognized in reports by Gartner, Forrester, and IDC

From Pilot to Production

Through strategic collaborations with industry leaders like KPMG, Cognizant, Rackspace, Databricks and Snowflake, Uniphore helps organizations move beyond experimental AI pilots to production-grade deployment, operationalizing AI agents across internal and client-facing workflows at scale.
