Staff Site Reliability Engineer

Hiring Remotely in USA
Remote
Senior level
Social Impact
The Role
As a Staff Site Reliability Engineer, you will lead and mentor a team, ensuring the reliability, scalability, and security of the platform. Responsibilities include designing AWS infrastructure, collaborating with developers for performance optimization, automating tasks, and developing monitoring systems to handle incidents efficiently.
Overview:

Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need.

Our mission is at the intersection of empathy and innovation — we promote mental well-being for people wherever they are.

Our vision is an empathetic world where nobody feels alone.

Our core values are at the heart of all we do: connect with empathy, center equity, get it done together, and reflect and evolve.

 

Why you should join our team: 

  • Our work is transforming the way people in pain access support at their fingertips
  • Our work is innovative in the crisis response space
  • Our dynamic, fun, and diverse culture
  • Our meaningful cause, led by empathy and innovation
  • Our strong values at the center of all we do
  • Our commitment to diversity, equity and inclusion
  • Our commitment to engagement and belonging
  • Our commitment to our employees and their holistic wellbeing
  • Our value of work/life balance
  • Our growth mindset and prioritization of professional development
  • Our leaders who truly care

What you'll be doing:

As a Staff Site Reliability Engineer (SRE), reporting to the Senior Engineering Manager of SRE/Infrastructure, you will be a key technical leader ensuring the reliability, scalability, and security of our platform. In this role, you will play a strategic part in architecting, building, and maintaining the tooling that empowers our software engineering teams and managing the infrastructure that supports our staff and volunteers in delivering the Crisis Text Line service. You will collaborate closely with developers to drive performance optimization, implement best practices, and ensure a secure environment. With a significant focus on enhancing engineer productivity through automation and streamlined workflows, you’ll directly contribute to our mission of supporting texters, volunteers, and staff. This position requires extensive experience in infrastructure management, automation, and Site Reliability Engineering (SRE) practices.

Responsibilities:

  • Help lead and mentor a team of 5 SREs, fostering a collaborative and innovative work environment.
  • Work closely with the 3 staff in TechOps/Security to enforce security best practices across the infrastructure and development processes.
  • Design, implement, and maintain our highly available and scalable AWS infrastructure that powers our service.
  • Collaborate with developers to optimize application performance and reliability.
  • Develop and maintain monitoring, logging, and alerting systems to ensure system health and performance.
  • Automate repetitive tasks and processes to improve efficiency and reduce manual intervention.
  • Respond to and resolve incidents, minimizing downtime and ensuring quick recovery.
  • Support and encourage a diversity of backgrounds, voices, and perspectives on the engineering team
  • Proactively communicate expectations, progress, and issues to engineers, product managers, and other colleagues with clarity and kindness, delivering and receiving feedback respectfully 
  • Spread knowledge, provide mentorship, and promote technical best practices
  • Learn both independently and from your colleagues, stretch yourself, and grow as an engineer and teammate
  • Write and review high-quality, easy-to-read, and testable code that follows best practices
  • Manage time successfully by focusing on priorities, delivering on deadlines, and asking for help when stuck
  • Provide engineering input and estimate work during both refinement and architecture design.
  • Participate in retrospectives and post-mortems to improve our processes and operations
  • Conduct regular security audits and vulnerability assessments, addressing any identified issues.
  • Stay up-to-date with industry trends and emerging technologies, recommending and implementing improvements as needed.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field (Master's degree preferred). 
  • Proven experience as a Staff SRE or in a similar SRE role, with experience in observability and chaos engineering and a strong focus on infrastructure and DevOps in a software delivery capacity.
  • Experience maintaining the reliability of online SaaS/PaaS on a 24/7 schedule.
  • Proficiency in AWS and infrastructure as code (e.g., Terraform, CloudFormation).
  • Strong scripting and automation skills (e.g., Python) and in-depth knowledge of containerization and orchestration (e.g., Docker, Kubernetes).
  • Proven experience implementing CI/CD pipelines and tools (GitHub Actions) and observability tools (Datadog).
  • A commitment to ethical practices, data privacy, and security.
  • Solid understanding of network protocols, security principles, and best practices.
  • Excellent problem-solving skills and the ability to work under pressure.
  • Ability to learn quickly and manage your time successfully by focusing on priorities, delivering on deadlines, and asking for help when needed.
  • Strong communication skills, with the ability to collaborate effectively with cross-functional teams.
  • Demonstrated understanding of essential computer science principles and how to apply them to solve problems. This includes basic data structures, control structures, and functions.

Preferred Qualifications:

  • Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • Experience implementing Failure Injection / Chaos Engineering practices.
  • Cloud Solution Architect certifications or completed training (e.g., AWS Cloud Practitioner Essentials and/or AWS Certified Solutions Architect - Associate), or the GCP or Azure equivalents.
  • Strong experience with AWS Solution Architecture across Next.js, Go, PHP APIs, GraphQL, Databricks, and AI/ML workloads.
  • Knowledge of compliance and regulatory standards (e.g., GDPR, HIPAA, ISO 27001, SOC2, etc.).
  • Experience in a non-profit or mission-driven organization.

Reliable High-Speed Internet Required: Must have a stable high-speed internet connection to support seamless remote collaboration, virtual meetings, online job tasks, etc.

The full salary range for this position, across all United States geographies, is $135,520 - $178,060 per year. The upper portion of the salary range is typically reserved for existing employees who demonstrate strong performance over time. Starting salary will vary by location, qualifications, and prior experience; during the interview process, candidates will learn the starting salary range applicable for their location. We pay competitively in the tech-forward nonprofit space and offer a robust benefits package.

Only candidates in the following states will be eligible for employment: CA, CO, CT, FL, GA, HI, IL, IN, IA, MD, MA, MI, MO, NJ, NM, NY, NC, OH, PA, TN, TX, UT, VA, WA.


Benefits:

Crisis Text Line employee benefits are thoughtfully designed using an equity lens, acknowledging that we are all unique human beings with individual life circumstances that require flexibility and support.

Benefits include:

  • 20 paid holidays including:
    • Federal holidays like Juneteenth and Labor Day
    • Election day
    • Holiday break from Dec 24 through January 1
    • 2 renewal days
    • 2 floating holidays
  • Flexible paid time off, including:
    • 15 vacation days
    • 3 personal days
    • 7 sick days 
  • Medical, dental, and vision benefits for the staff member and family at no cost to the employee
  • 403B retirement plan (the nonprofit equivalent of a 401K): 3% contribution by Crisis Text Line to support building financial wellness, regardless of personal contribution
  • 12 weeks paid parental leave (after 6 months of employment)
  • Student loan repayment (after 2 years of continuous full-time service)
  • Family support through a virtual childcare platform
  • Stipends/Allowances
    • Mental health (Monthly)
    • Internet Service (Monthly) 
    • Professional Development (Annual)
    • Wellness (Annual)
    • Home office setup (One time/First year)

(Benefits are only for US-based employees. International benefits may differ.)

Crisis Text Line is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. We provide reasonable accommodation to individuals who have a disability and meet the skill, experience, education, and other job-related requirements of the role to allow the individual to perform the essential functions of the job.

Top Skills

AWS
The Company
New York, NY
2,355 Employees
On-site Workplace
Year Founded: 2013

What We Do

Millions of people are quietly suffering every day. They quietly struggle with depression, anxiety, eating disorders, bullying, suicidal thoughts, and more. Crisis Text Line provides free, 24/7 support for people in crisis in the United States via a medium people already use and trust: text. And we use insights from our work to develop and share innovations in prevention, treatment, and long-term care.

Crisis Text Line trains volunteers (like you!) to support people in crisis. With over 70 million crisis messages processed to date, the org is growing quickly, and so is the need.

Crisis Text Line is a not-for-profit tech start-up, born from the rib of DoSomething.org.
