Site Reliability Engineer - TECHN01841
Bullhorn is the leading global software provider for the staffing and recruitment industry. More than 10,000 companies rely on Bullhorn’s cloud-based platform to power their staffing processes from start to finish. Through our incredible products and services, we create raving fan customers, resulting in company growth that consistently offers new opportunities for our talent to advance their careers. 25% of our global workforce gets promoted or moves into a new role every year, expanding their skills and working with new people. Bullhorn is large enough to provide these exciting opportunities but small enough to maintain the energy of a startup, and we’re consistently ranked as a great place to work for our strong culture and rewarding career opportunities.
Our commitment to our employees: Every Bullhorn employee has a sense of belonging, a voice that is heard, and a clear path to success. Bullhorn offers unlimited planned vacation, great opportunities for career development, quarterly paid volunteer days through its philanthropic group Bullhorn Cares, and an open invitation to Bullhorn Allies groups, which celebrate and cultivate diversity and inclusion for all employees.
Our in-office employees enjoy a casual, collaborative environment with weekly catered-in lunch and breakfast, and quarterly social events. While working from the comfort of their own homes, our remote employees are provided a full equipment package with all the tools they need to perform their role. We use Zoom, Slack, and other tools to stay connected while we are remote.
Why this job is important
Through acquisitions and organic growth, Bullhorn’s infrastructure and operations engineering teams are scaling rapidly and globally. To support that growth, we need a Site Reliability Engineer to drive improvements in log analysis, operational readiness, dashboards, capacity planning, and performance. Bullhorn has a broad technology portfolio full of products and services that need to interact seamlessly and at global scale for our Bullhorn One initiative to be a success. Achieving that goal will require a keen focus on telemetry and observability into every aspect of our infrastructure and applications.
As Site Reliability Engineer, a typical day might include:
Gathering and analyzing metrics from logs, dumps, and tools to assist in incident response, root cause analysis (RCA), and blameless post-mortems
Actively participating in and driving elements of our continuous improvement processes
Partnering with development teams to improve services through rigorous testing and release procedures
Participating in system design consulting, platform management, and capacity planning
Building automations and practices to reduce toil and continually improve our team’s ability to operate effectively
Collaborating with other teams across Architecture & Technical Operations to prioritize critical product improvements
Balancing feature development speed and reliability with well-defined service level objectives (SLOs) and service level indicators (SLIs)
Collaborating with Software and Architecture teams to continually improve operational understanding of system interactions and application dependencies
Building application health dashboards and finding new ways to surface product issues quickly
Collaborating with Systems & DevOps engineers to improve our monitoring and alerting so that it delivers key insights without “noise”
As Site Reliability Engineer, your objectives would include:
Drive the continuous improvement of our visibility into holistic system health, shaping the KPIs used to measure and report progress
Create and maintain SRE dashboards for each Bullhorn product and critical service
Implement software and tools to manage platform infrastructure and applications
Improve reliability, quality, and release velocity of our suite of software solutions
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
Ensure we are aware of our capacity boundaries and are properly scaling systems to achieve consistent application performance
This job might be for you if you have:
7 years of experience in software engineering, software development, or system operations
Comfort working in a Unix/Linux shell, the ability to write shell scripts, and an understanding of Linux internals
Proven subject matter expertise in log analysis (ELK preferred), dashboarding, and APM tools
Experience designing, building, and operating large-scale production systems
Proven experience with Python scripting (preferred)
Experience working with a variety of open-source databases (MySQL, Postgres, Redis, Cassandra, etc.)
Experience both using and enhancing monitoring and observability tools (ELK, Splunk, Sumo Logic, or similar)
Experience automating infrastructure, testing, and deployments using tools like Ansible, Chef, Puppet or Terraform
Expertise in optimizing public cloud infrastructure, software, and tools (AWS & Azure)
Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
Bonus points for:
Bachelor’s degree in computer science or other highly technical, scientific discipline
3+ years of working experience in global, multi-cloud SaaS environments
Coding experience beyond simple scripts (the ability to program in one or more languages, such as Python, Java, Ruby, or JavaScript)
Experience with Site Reliability Engineering and/or DevOps engineering
Experience with containers and container tooling, such as Docker, Rancher, Helm, and Kubernetes
An understanding of the core concepts of Chaos Engineering, even if you haven't implemented it yourself yet
Bullhorn is committed to our core values and we are looking for people who exhibit these traits:
- Service - You go beneath the surface to solve problems.
- Energy - You build up your teammates and leave people positively charged.
- Ownership - You take action and own up to your mistakes.
- Speed & Agility - You go around obstacles and demonstrate urgency.
- Being Human - You consider other people's perspectives, laugh, and have fun.
Bullhorn is fully committed to equal opportunities. We aim to create a working environment free from discrimination. This means all job applicants and employees will receive equal treatment regardless of age, disability, gender reassignment, marriage or civil partnership, pregnancy, race, religion or belief, gender, or sexual orientation.