Site Reliability Specialist (Observability & Kubernetes)

Hiring Remotely in United States
Remote
119K-145K Annually
Senior level
Information Technology • Software • Consulting
The Role
The Site Reliability Specialist is responsible for managing Everbridge's observability platform, ensuring reliability, scalability, and visibility into system health through tools like Grafana and Kubernetes.

At Everbridge, we’re building a resilient, scalable, and secure cloud platform that powers critical services used around the world. We’re looking for a Senior Platform Site Reliability Specialist to own, operate, and evolve our enterprise observability platform.
In this role, you will be responsible for the upkeep, reliability, scalability, and strategic growth of Everbridge’s observability stack, EKS, and supporting services, ensuring our engineering teams have deep visibility into system health, performance, and reliability across a large-scale, cloud-native environment. You will also work with other cloud technologies across AWS and GCP.

Who we are looking for:

We’re looking for someone who shows up for the team, not just themselves. This role works best for a person who communicates clearly, collaborates easily, and treats interactions with other teams with respect and professionalism. You should be comfortable being involved, offering support, and helping move work forward without ego. We value people who build trust, keep things running smoothly, and make the teams around them better.

What you'll do:

    Observability Platform Ownership
    • Lead the design, operation, and evolution of Everbridge’s observability stack
    • Build and maintain a highly available, scalable observability platform
    • Standardize instrumentation, dashboards, alerts, and SLOs
    • Support incident response, root cause analysis, and capacity planning

    Grafana Stack & Telemetry
    • Operate and scale Grafana and related technology:
      • Grafana Loki (logs)
      • Grafana Mimir (metrics)
      • Grafana Tempo (tracing)
      • Grafana Alerting

    Kubernetes
    • Maintain reliability and security of EKS clusters running observability
    • Manage cluster lifecycle and upgrades

    Infrastructure as Code & Automation
    • Terraform for infrastructure provisioning
    • HashiCorp Packer
    • GitLab CI/CD at scale

What you'll bring:

    • 6+ years in SRE / Platform Engineering
    • Strong Grafana ecosystem experience
    • Kubernetes and Amazon EKS expertise
    • Terraform proficiency

Preferred Qualifications:

    • OpenTelemetry experience
    • Large-scale observability systems
    • Cost optimization experience

The reasonably estimated salary for this role at Everbridge ranges from $118,700 to $145,000 and may also include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, Everbridge offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental health benefits; disability income benefits; life and AD&D insurance; a 401(k) plan and match; paid time off; and fitness reimbursements.
 
Fair Chance Statement US & Canada
We are committed to providing equal employment opportunities in compliance with all applicable Federal, Provincial/State and Local laws, including the California Fair Chance Act and any local County Fair Chance Ordinance (or local equivalent). Pursuant to these and other relevant regulations, we consider qualified applicants with criminal histories in a manner consistent with the law.
 
For roles subject to background checks, the following material job duties may be affected by an applicant’s criminal history:
- Access to sensitive or confidential information, such as financial records, proprietary data, or client information.
- Management of cash, company funds, or other valuable assets.
- Work in environments requiring heightened security measures.
- Compliance with contractual or regulatory requirements specific to the position.
 
We evaluate each applicant's criminal history individually, considering its nature, timing, and relevance to the specific job duties, while maintaining our commitment to fair hiring practices and promoting workplace equity.

About Everbridge

Everbridge empowers enterprises and government organizations to anticipate, mitigate, respond to, and recover stronger from critical events. In today’s unpredictable world, resilient organizations minimize impact to people and operations, absorb stress, and return to productivity faster when deploying critical event management (CEM) technology. Everbridge digitizes organizational resilience by combining intelligent automation with the industry’s most comprehensive risk data to Keep People Safe and Organizations Running™. For more information, visit www.everbridge.com, read the company blog, and follow on Twitter. Everbridge… Empowering Resilience
 
Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, religion, sex (including sexual orientation and gender identity), national origin, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.

The Company
Belfast
1,437 Employees
Year Founded: 2002

What We Do

Keeping People Safe and Businesses Running. Faster. Everbridge, Inc. (NASDAQ: EVBG) is a global software company that provides enterprise software applications that automate and accelerate organizations’ operational response to critical events in order to Keep People Safe and Businesses Running™. During public safety threats such as active shooter situations, terrorist attacks or severe weather conditions, as well as critical business events including IT outages, cyber-attacks or other incidents such as product recalls or supply-chain interruptions, over 5,300 global customers rely on the company’s Critical Event Management Platform to quickly and reliably aggregate and assess threat data, locate people at risk and responders able to assist, automate the execution of pre-defined communications processes through the secure delivery to over 100 different communication devices, and track progress on executing response plans.
