What you'll do:
- Take ownership of the operational availability, security, performance, scalability, efficiency, monitoring, and overall service reliability of Everbridge's data tier.
- Collaborate with Architects, Developers, Quality Engineers, Security Specialists, and Operations Engineers in an Agile environment to design and implement highly reliable data solutions.
- Develop and enhance automated tooling and processes to increase the operability and self-service capabilities of our data tier.
- Apply Database Reliability Engineering (DBRE) principles, emphasizing automation, proactivity, cross-functional collaboration, data-driven decision-making, and a fail-fast/safe culture to continuously improve both technology and team culture.
- Participate in a rotating on-call schedule to troubleshoot and resolve production escalations.
- Have fun while working hard to make a meaningful difference!
What you'll bring:
- At least 3 years of experience in production database reliability, database administration, site reliability engineering, DevOps, or SaaS technical operations.
- Proven hands-on expertise with Terraform, Salt, MongoDB, and CI tools like Jenkins or GitLab.
- A minimum of 3 years of experience working with cloud infrastructure, with AWS preferred.
- At least 1 year of coding experience in one or more programming languages, such as Python, Perl, Java, or Go.
What We Do
Keeping People Safe and Businesses Running. Faster.
Everbridge, Inc. (NASDAQ: EVBG) is a global software company that provides enterprise software applications that automate and accelerate organizations’ operational response to critical events in order to Keep People Safe and Businesses Running™. During public safety threats such as active shooter situations, terrorist attacks, or severe weather conditions, and during critical business events such as IT outages, cyber-attacks, product recalls, or supply-chain interruptions, over 5,300 global customers rely on the company’s Critical Event Management Platform. The platform lets them quickly and reliably aggregate and assess threat data, locate people at risk and responders able to assist, automate the execution of pre-defined communications processes through secure delivery to over 100 different communication devices, and track progress on executing response plans.







