Site Reliability Engineer IV (Remote)

Company Overview

The company is simplifying how individuals securely prove and share their identity online. With its secure digital identity network, the company is doing for identity what Visa did for financial transactions. It empowers people to fully control their own data through a portable and trusted login so they don’t need to create a new password at each site they visit.

The COVID-19 pandemic has accelerated a massive digital migration for many critical services. These services require a trusted identity to ensure an individual is who they claim to be while keeping out fraud. Identity verification that serves only one organization is costly and time-intensive. Separate passwords for each application add to consumer frustration. With the company's network, login and identity credentials move with an individual so they only need to verify once. The company is a federally certified identity provider at the highest standards NIST has set for consumer identity verification and login. It is one of only four companies in the United States of America certified by the federal government to bind a legal identity to a digital login.

In addition to providing individuals with complete control over their credentials and data, the company has a “No Identity Left Behind” initiative to expand access and inclusion for all individuals through a video chat verification process. The company is passionate about building a robust identity network that does not compromise access for hard-to-identify groups.

Role Overview 

We are looking for a Site Reliability Engineer IV (SRE) who will combine software and systems engineering to build and run distributed, fault-tolerant systems at scale. SREs ensure our services have the appropriate reliability and uptime to protect and promote our customers’ experience.

  • Design, build, implement, and maintain platform tooling that improves the availability, scalability, latency, and efficiency of services across the entire product surface area
  • Manage end-to-end distributed systems availability and ensure high performance of applications
  • Build automation solutions to prevent problem recurrence
  • Build visibility into SLIs, SLOs, SLAs, and dependency metrics to manage operational burden and systems reporting
  • Design, build, implement, and maintain an observability ecosystem that provides visibility across platform services and applications
  • Proactively identify risks and develop engineering processes and/or tooling to reduce availability risk
  • Evangelize best practices and mentor service owners on reliability, resiliency, and scalability for new and existing services and/or features
  • Participate in an on-call rotation and hold retrospective root cause analysis meetings, focusing on identifying remediations and product resiliency opportunities

Ideal Qualifications
  • At least 5 years of experience working in medium or large scale production systems
  • The ability to take a systematic approach to analyzing, troubleshooting, and diagnosing system problems in order to locate and resolve them
  • Experience in software development or systems engineering with code
  • Experience designing for scale and automation-forward ecosystems and solutions
  • Possess a breadth of engineering skills with an interest in service reliability, automation, monitoring, and capacity planning
  • Understanding of modern application architecture (e.g. microservices, event-driven architecture (EDA))
  • Experience with APM services and solutions (e.g. New Relic, Dynatrace, AppDynamics, Datadog)
  • Experience with time-series observability solutions (e.g. InfluxDB, Prometheus, Grafana)
  • Experience with scaled indexed logging solutions (e.g. Splunk, Elasticsearch, OpenSearch)
  • Experience running and operating Ruby on Rails applications and infrastructure
  • Deep knowledge of major cloud service providers and solutions (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
  • Previous experience working within a reliability engineering culture (e.g. improving reliability through systems engineering automation, chaos testing, synthetics, and process improvement)
  • Experience designing, building, implementing, and operating distributed systems and cloud infrastructure at scale
  • Experience with container computing and container orchestration (e.g. proprietary systems such as Amazon ECS, multi-cloud solutions such as Kubernetes, or Nomad)
  • Experience with configuration management systems (e.g. Ansible, Puppet, Chef, Saltstack, Consul)
  • Experience with virtual networking (e.g. cloud networking, service mesh, SDN)
  • Experience in security automation (e.g. cloud-proprietary solutions such as AWS Secrets Manager, or Vault)
  • Experience with infrastructure-as-code (e.g. cloud-proprietary solutions such as CloudFormation, or Terraform)
  • Strong written communication skills
  • Ability to work in an asynchronous environment
  • Experience in supporting a 24/7 operational infrastructure including on-call rotations

The ideal candidate will thrive in the following culture:

  • Must have an obsession for building quality products 
  • Ability to thrive when there are changing priorities and shifting of gears
  • Strong oral and written communication skills
  • Must be a team player with a strong, self-managing work ethic
  • Must be a self-starter with a passion for security engineering, learning and continuous improvement

Day to Day Life
  • Ensure observability tooling and integrations are providing telemetry and logging statistics across the entirety of systems and applications
  • Enable the Engineering organization to identify and triage operational issues, empowering teams to own and operate autonomously
  • Contribute to defining and executing on the Observability Roadmap to maintain and modernize cloud-native observability within the organization
  • Integrate telemetry and logging frameworks into the cloud platform
  • Evaluate new and existing observability technologies to ensure capabilities are inclusive of black box solutions (e.g. COTS) as well as Engineering-created software
  • Manage distributed system and application scaling activity directly (as applicable) as well as in an advisory capacity on behalf of Engineering development teams

Note that candidates must be located in the continental U.S.

Covid Vaccination Requirement

The company has a mandatory vaccination requirement where not prohibited by applicable federal or state law.

All current and future employees are required to receive their COVID-19 vaccinations, unless a reasonable accommodation is approved. Employees not in compliance with this policy will be placed on leave and will be terminated if no valid reason for not getting the COVID-19 vaccine is provided.  

Purpose: In accordance with the company's duty to provide and maintain a workplace that is free of known hazards, we are adopting this policy to safeguard the health of our employees and their families; our customers and visitors; and the community at large from COVID-19 risks that may be reduced by vaccinations. This policy will comply with all applicable laws and is based on guidance from the Centers for Disease Control and Prevention and local health authorities, as applicable.

Reasonable Accommodation: Current and future employees in need of an exemption from this policy due to a medical reason or a sincerely held religious belief must submit a completed Request for Accommodation form to the human resources department to begin the interactive accommodation process as soon as possible after vaccination deadlines have been announced (September 13th) and an offer of employment has been made. Accommodations will be granted where they do not cause undue hardship or pose a direct threat to the health and safety of others.

Vision: To be the world's leading digital identity network empowering people to control their own information and to prove their credentials across all channels: online, call center, and in-person.

Mission: To make the world a more trusted place by delivering the highest level of security with the least amount of friction at the lowest possible cost. 

People: We have an audacious mission. We aim to fix the identity layer of the internet. Billions of people will live better lives with more trust and convenience thanks to the company. We are like Special Forces. We take on the most difficult challenges with amazing teammates.

Core Values:

  • Don't be a jerk.
  • Always compete.
  • Ask questions like a 5-year-old.
  • Inspire people with your passion.
  • Make something better every day.
  • Treat each customer like your favorite family member.
  • Own your mistakes so you can learn from them.
  • Details are everything.
  • Communicate like a scientist.
  • Be truthful (even when it's hard).
  • Reflect the company's values in your actions.
  • Act like an owner.

The company maintains a work environment free from discrimination, where employees are treated with dignity and respect. All employees share in the responsibility for fulfilling our commitment to equal employment opportunity. The company does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. The company adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, the company's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.

Please review our Privacy Policy, including our CCPA policy. If you provide the company with any personally identifiable information, you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy. The company participates in E-Verify.

More Information

The company operates in the Cloud industry. It is located in McLean, VA, and was founded in 2010. It has 1600 total employees.