Senior Site Reliability Engineer at Community
Community powers direct relationships and one-on-one conversations between Leaders and their Members through text messaging at scale. Launched in 2019 and headquartered in Santa Monica, Calif., Community is breaking new ground in trusted marketing and communications channels by connecting Leaders--global pop culture stars, local community organizers, small business owners and brands--to their Members to drive conversations that convert into actions, sales, revenue and more.
Join us at www.community.com @incommunity
About the role and your impact
Community is looking for a high-performing, experienced Senior Site Reliability Engineer to join our growing engineering team. You will join the team that owns the overall performance and reliability of Community’s services, the robustness of the deployment pipeline, and timely, effective incident response and resolution. In this role, you will work with teams to improve observability, scale our CI/CD processes, and keep all user-facing services and other production systems running smoothly.
You will be working with a distributed, remote-first team that spans North America, Europe, and beyond. Last but not least, we are growth-oriented, both as a company and as individuals: we take learning seriously and invest in growing our skills and our teams’ capabilities.
What You’ll Do
- Proactively monitor, measure, and improve all areas of infrastructure and services.
- Scale solutions from proofs-of-concept to full production systems.
- Help drive development team ownership and best practices in observability (monitoring, tracing, alerting, logging, runbooks) and high-availability software engineering.
- Design and build the tools, frameworks, systems, and processes that our engineers use to build, integrate, deploy, scale, and manage their software.
- Automate tasks across the full CI/CD lifecycle to create an efficient developer experience and reduce manual toil.
- Contribute to capacity planning, demand forecasting, software performance analysis, and systems tuning.
- Participate in an on-call rotation to mitigate site disruption.
- Minimize the risk of reliability failures affecting durability, availability, performance, and correctness.
- Collaborate effectively with other engineers and promote knowledge sharing across our organization.
What You’ll Bring
- 2+ years in an SRE role, with a focus on tooling, automation, and distributed systems development.
- Strong software development skills in at least one programming language.
- A desire to stay on the cutting edge of infrastructure and automation technologies.
- Production experience with infrastructure frameworks such as Mesos, Terraform, and Kubernetes, as well as experience with RabbitMQ.
- Production experience with AWS environments.
- Experience with SQL and NoSQL databases.
- Experience with configuration management tools like Puppet, Chef, or Ansible.
- In-depth understanding of DevOps culture, SRE principles, and Agile methodologies.
- Ability to debug code and troubleshoot service failures.
- Superb communication skills, both written and verbal.
Community is proud to be an equal opportunity employer. We commit ourselves to inclusivity across race, gender identity, sexual orientation, religion, body size, disability, age, and class in everything we do.