Principal Observability Architect

Posted 11 Days Ago
Hiring Remotely in Orlando, FL
Remote or Hybrid
Senior level
Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
We're putting AI to work for people.
The Role
This role involves architecting a multi-tenant Observability Platform-as-a-Service, ensuring real-time monitoring, developer enablement, and integrating AI for enhanced observability and reporting. It requires deep technical leadership and expertise in telemetry and observability systems.
Summary Generated by Built In
Company Description
It all started in sunny San Diego, California, in 2004, when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today, and ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.
Job Description
We are seeking a Principal Observability Architect to lead the strategic architecture, evolution, and operationalization of a modern, multi-tenant Observability Platform-as-a-Service (OPaaS) tailored for a hybrid on-prem and cloud-native SaaS product.
You will architect a cloud-agnostic, federated observability platform that supports real-time monitoring, advanced telemetry pipelines, and AI-powered insights to ensure platform reliability, developer productivity, and exceptional customer experiences. This role combines deep technical leadership with a strong focus on developer enablement, platform resiliency, and data governance.
What you get to do in this role:
Platform Architecture & Strategy
  • Lead architecture and roadmap for a multi-region, multi-cloud, multi-tenant observability platform scalable across diverse customer environments and service boundaries.
  • Architect near real-time telemetry ingestion pipelines with low-latency guarantees (seconds) using a mix of streaming and batch processing technologies.
  • Define observability blueprints including telemetry SLAs, data contracts, tenant data isolation, and cost-aware retention strategies for high-cardinality data.
  • Ensure observability systems are cloud-native and container-aware, supporting environments built on Kubernetes, service meshes, and serverless components.

Real-Time Monitoring & Detection
  • Design and implement real-time metrics, logs, traces, and event pipelines with technologies such as:
    • VictoriaMetrics, Prometheus, Grafana, Alertmanager
    • Cribl Stream and Edge for dynamic routing and filtering
    • VictoriaLogs for structured log analysis
  • Embed real-time anomaly detection and signal correlation, with context-aware alerting to reduce noise and MTTR.
  • Integrate with alerting and incident response tools (PagerDuty, Slack, ServiceNow) for automated incident routing and contextual enrichment.
  • Ensure observability of synthetic probes, end-user transactions, and critical SLOs with per-tenant granularity.

Instrumentation, Developer Enablement & CI/CD Integration
  • Standardize OpenTelemetry instrumentation across all services with prebuilt SDKs, language libraries, and semantic conventions.
  • Architect OpenTelemetry deployment patterns (agent-based, sidecar, collector pipelines) with support for Kubernetes, Lambda, and edge environments.
  • Embed observability validation gates into CI/CD workflows (e.g., GitHub Actions, GitLab CI) to enforce telemetry compliance before production rollout.
  • Provide self-service tools, templates, and training to enable developer teams to adopt observability by default.

AI for Observability & Productivity
  • Leverage AI/ML for:
    • Real-time anomaly detection and noise suppression
    • Predictive incident detection and impact forecasting
    • Auto-summarization of alert storms and telemetry bursts
    • Multi-tenant root cause and blast radius correlation
  • Build or integrate LLM-powered tools that support:
    • Natural language querying of live telemetry
    • AI-assisted debugging and dashboard generation
    • Generative runbooks and incident summaries

Data Platform Architecture
  • Architect hot and cold telemetry storage pipelines using:
    • VictoriaMetrics and Cribl for hot-path observability
    • Long-term retention in object storage (e.g., S3, GCS) using open formats (Parquet, JSON)
    • Federated querying engines like Trino for historical and cross-service analytics
  • Implement cost-aware ETL strategies, balancing real-time visibility with storage and ingestion optimization.
  • Incorporate data governance, PII handling, and regional data compliance (e.g., GDPR, SOC2) into telemetry architecture.

SaaS Operations & ITSM Integration
  • Integrate observability into ITSM and incident response systems (e.g., ServiceNow, Jira):
    • Auto-create incidents enriched with correlated traces, logs, and metrics
    • Provide real-time telemetry context in change and problem management flows
  • Deliver customer-facing health dashboards, SLA monitoring, and per-tenant observability insights to support operational excellence and transparency.

Technical Leadership
  • Lead cross-functional collaboration with SRE, Platform, Security, and Engineering teams to evolve observability maturity.
  • Define and document observability patterns, anti-patterns, and escalation workflows.
  • Drive internal R&D around OpenTelemetry, AI in observability, high-cardinality telemetry, and eBPF-based observability tooling.

Qualifications
To be successful in this role you have:
  • Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
  • 10+ years in DevOps, SRE, or Observability roles, including 5+ years in architecture or platform engineering.
  • Proven experience designing and operating near real-time observability systems in global-scale SaaS environments.
  • Deep expertise in OpenTelemetry (including collector deployment, semantic conventions, sampling strategies).
  • Experience integrating observability in Kubernetes, microservices, and serverless ecosystems.
  • Hands-on experience with telemetry data pipelines using Cribl, Prometheus/VictoriaMetrics, and log/trace platforms.
  • Experience embedding telemetry validation in CI/CD workflows.
  • Familiarity with AI/ML for observability (anomaly detection, summarization, impact correlation).
  • Working knowledge of data privacy, retention, and compliance practices in observability.

Nice to Have:
  • Experience with Trino, S3 data lakes, and long-term observability analysis.
  • Experience building customer-facing observability features (dashboards, SLAs, health status pages).
  • Contributions to open-source observability tools or standards.
  • Knowledge of or hands-on experience with Agentic AI systems to drive autonomous remediation, telemetry analysis, or incident response.
  • Relevant certifications (e.g., AWS, GCP, Azure, OpenTelemetry, Observability Practitioner).

GCS-23
Additional Information
Work Personas
We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. Learn more here. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.
Equal Opportunity Employer
ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, national origin or nationality, ancestry, age, disability, gender identity or expression, marital status, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.
Accommodations
We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact [email protected] for assistance.
Export Control Regulations
For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.
From Fortune. ©2025 Fortune Media IP Limited. All rights reserved. Used under license.

Top Skills

AI/ML
Cribl
GCS
GitHub Actions
GitLab CI
Grafana
Kubernetes
OpenTelemetry
PagerDuty
Prometheus
S3
ServiceNow
Slack
Trino
VictoriaMetrics

The Company
HQ: Santa Clara, CA
26,000 Employees
Year Founded: 2004

What We Do

As the AI platform for business transformation, we're putting AI to work across organizations — freeing people for work that matters. Making old tech work with new tech. Reaching across departments, from the front office to the back office and every office in between. Our ambition? To become the defining enterprise software company of the 21st century (or "DESCO21C," as we like to call it).

With more than 8,100 customers, we serve approximately 85% of the Fortune 500®, and we're proud to be a Fortune 100 Best Companies to Work For® and World's Most Admired Companies™.

To explore your future career with us, visit www.servicenow.com/careers.

From Fortune. ©2024 Fortune Media IP Limited. All rights reserved. Used under license.

Why Work With Us

By joining ServiceNow, you are part of an ambitious team of change-makers who have a restless curiosity and a drive for ingenuity. We're committed to helping our people do their best work and live their best lives so we can fulfill our purpose together. At the fastest-growing enterprise software company, you can grow your career faster.

ServiceNow Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

At ServiceNow, we lead with flexibility and trust. For some, home is the primary workplace. For those who come into a ServiceNow workplace, you are empowered to make team-guided and individual-led decisions on how and when you use the workplace.

Typical time on-site: Flexible
HQ: Santa Clara, CA
CR
Singapore
JP
Addison, TX
Amsterdam, NL
Atlanta, GA
Auckland, NZ
Austin, TX
Bangkok, TH
Bengaluru, IN
Berlin, DE
Brisbane, QLD
Brussels, BE
Canberra, AU
Chesterfield, MO
Chicago, IL
Ciudad de México, MX
Denver, CO
Dublin, IE
Düsseldorf, DE
Franklin, TN
Gothenburg, SE
South Korea
Helsinki, FI
Houston, TX
Hyderabad, IN
Issy-les-Moulineaux, FR
Johannesburg, ZA
Kirkland, WA
Lausanne, CH
Lille, FR
London, GB
Los Angeles, CA
Lysaker, NO
Madison, WI
Madrid, ES
Melbourne, AU
Milan, IT
Milwaukee, WI
Minneapolis, MN
Montréal, QC
Mumbai, IN
Munich, DE
New York, NY
Novi, MI
Oberbüren, CH
Orlando, FL
Perth, AU
Petah Tikva, IL
Pleasanton, CA
Riyadh, SA
Rome, IT
San Diego, CA
San Francisco, CA
Santa Clara, CA
São Paulo, BR
Søborg, DK
Solna, SE
Staines, GB
Sydney, NSW
Tokyo, JP
Toronto, Ontario
Vienna, VA
Vienna, AT
Waltham, MA
Washington, DC
Wellington, NZ