Staff Observability Engineer

Bengaluru, Bengaluru Urban, Karnataka, IND
In-Office
Expert/Leader
Artificial Intelligence • Healthtech • Analytics • Biotech
The Role
The Staff Observability Engineer will define and implement the observability strategy, design standardized frameworks, collaborate on instrumentation, and conduct operational readiness reviews while ensuring compliance with healthcare standards.
Job Description Summary

As a Staff Software Engineer (Observability), you will be responsible for defining and implementing the observability strategy across PCS Digital Solutions Cloud Applications.

Job Description

Roles and Responsibilities

In this role, you will:

  • Define and evolve the observability vision and roadmap for PCS DS applications.
  • Design and implement/integrate standardized observability frameworks (metrics, logs, traces, events, profiling).
  • Collaborate with platform, SRE, and product teams to instrument services using OpenTelemetry and other modern observability tooling.
  • Build and maintain dashboards, alerts, and SLOs that reflect both technical and business health indicators.
  • Evaluate, integrate, and optimize observability agents (e.g., Prometheus, Fluent Bit, OTel, and other agents).
  • Design self-remediation solutions leveraging observability tooling.
  • Implement best practices for using the GenAI tools of observability platforms.
  • Lead or contribute to incident analysis and postmortem reviews, driving improvements in system resilience and observability coverage.
  • Conduct Operational Readiness Reviews (ORRs) to validate monitoring, alerting, and rollback strategies before go-live.
  • Ensure observability practices align with healthcare compliance standards (e.g., HIPAA, GDPR, HITRUST).
  • Mentor engineers and promote a culture of observability-first development.

Required Qualifications

  • Bachelor’s or master’s degree in Computer Science, Engineering, or a related technical field.
  • 10+ years of experience in software engineering, SRE, or platform engineering roles.
  • 4+ years of experience contributing to observability solutions in cloud-native environments (Kubernetes, microservices, serverless).
  • Deep expertise in the observability pillars (metrics, logs, traces) and tools such as OpenTelemetry, Prometheus, Grafana, Datadog, and Dynatrace.
  • Strong programming/scripting skills (e.g., Go, Python, Bash, Terraform).
  • Experience with distributed tracing, SLO/SLI frameworks, and incident response workflows.
  • Deep expertise in distributed systems, microservices, and cloud platforms (AWS, Azure, GCP).
  • Experience with AI-powered anomaly detection, automated incident response, and cost optimization for observability at scale.
  • Familiarity with SRE practices and chaos engineering.
  • Excellent communication and collaboration skills.

Desired Characteristics

  • Experience in healthcare or regulated industries.
  • Knowledge of data privacy and compliance (HIPAA, HITRUST).
  • Experience with cost optimization and telemetry data governance.
  • Contributions to open-source observability projects.

Additional Information

Relocation Assistance Provided: No


The Company
50,282 Employees
