Site Reliability Engineer
Eliminate Crime. Build Community.
Flock Safety provides the first public safety operating system that empowers private communities and law enforcement to work together to eliminate crime. We are committed to protecting human privacy and mitigating bias in policing with the development of best-in-class technology rooted in ethical design, which unites civilians and public servants in pursuit of a safer, more equitable society.
Our Safety-as-a-Service approach includes affordable devices powered by LTE and solar that can be installed anywhere. Our technology detects and captures objective details, decodes evidence in real time, and delivers investigative leads into the hands of those who matter.
While safety is a serious business, we are a supportive team that is optimizing the remote experience to build strong and fun relationships even when we are physically apart. Our flock of hard-working employees thrives in a positive and inclusive environment where a bias toward action is rewarded. Flock Safety is headquartered in Atlanta and operates nationwide. We have raised approximately $250M in venture capital, including a recent Series D round led by Andreessen Horowitz. Now valued at more than $1B, Flock is scaling quickly and seeking the best and brightest to help us meet our goal of reducing crime in the United States by 25% in the next three years.
About the opportunity
This role falls within our growing SRE organization, which is responsible for building and supporting our products from a software perspective. Site Reliability Engineers are responsible for maintaining the uptime of services and keeping Flock production systems running smoothly. They also work on our build, deploy, metrics, alerting, and logging tooling and infrastructure, and they work with other engineers to ensure the system can handle significant growth as we scale. This role will specifically support our Machine Learning organization and will focus on building, deploying, and codifying the infrastructure, metrics, and alerting for ML projects and systems.
Some challenges you’ll tackle
- Keep the system running and within our internal SLIs and SLOs
- Collaborate with Platform, Machine Learning, and Hardware teams on multifaceted projects that interact with our system
- Manage the infrastructure for ML projects and systems
- Assess new technologies as needed, balancing technical needs and business impact
- Refine our CI/CD process, improving the rate at which we can deliver new code to production reliably and efficiently
- Collaborate on creating a robust monitoring platform for our services and their underlying infrastructure, aiming to alert on symptoms before they become outages
- Be part of an on-call rotation to resolve availability incidents and work to prevent them from happening in the first place
- Follow best practices when creating and managing AWS resources (e.g., security groups, VPCs)
- Manage containers and their orchestration using Docker and Kubernetes
Nice to haves
- Experience in an SRE role with an understanding of monitoring, troubleshooting, and disaster recovery
- Proficiency with infrastructure as code and/or configuration management (we use Terraform)
- Experience managing monitoring dashboards and creating actionable alerts with tools like Grafana and Prometheus
Why join the Flock?
When you join the Flock, you are joining a diverse team of passionate, ambitious, intelligent people who put the team over self. We offer a competitive salary, benefits, and the opportunity to grow your career at a fast-paced, high-growth startup. We genuinely care about the well-being of our employees both in and out of the office and understand the importance of work/life balance. We'd love for you to join us in the fight to eliminate non-violent crime, one neighborhood at a time.