Site Reliability Engineer
BillGO is looking for mid- to senior-level Site Reliability Engineers to join our Reliability team.
The Reliability team focuses on the Deploy, Operate and Monitor portions of the DevOps lifecycle. We are a software engineering team that builds tooling and pipelines to bring BillGO’s software components into production. We help engineering teams meet their uptime and availability goals through instrumentation, tooling, infrastructure and analytics. Our primary focus is Observability, Incident Management and Capacity Planning, and we will expand into Chaos Engineering as we progress.
Responsibilities and goals
- Instrument applications and infrastructure to facilitate Observability
- Work with teams to define, measure and ultimately meet their SLOs and SLAs
- Partner with teams to refine their incident management procedures
- Design and implement tooling, processes and infrastructure required to meet our SLOs and SLAs
- Regularly participate in design reviews and code reviews
Requirements
- 4+ years of professional experience
- Fluent in one or more programming languages
- BA/BS in Computer Science or a related field
Strongly recommended
- A passion for high-quality code at scale
- Experience working in payments
- Experience in the Production Engineering space, specifically for API-based microservices
- Experience in Java; Python and Golang are also nice to have
- Experience in one or more database technologies (Relational, NoSQL, etc.)
- Experience with microservices on AWS and/or Azure
- Experience implementing observability and scalability techniques in large-scale systems (logging, time-series metrics, distributed tracing, service meshes, etc.)