Staff Site Reliability Engineer

Posted 10 Days Ago
Hiring Remotely in United States
Remote
148K-185K Annually
Senior level
Information Technology • Security • Cybersecurity
The Role
As a Staff Site Reliability Engineer, you will lead incident management, optimize monitoring solutions, refine SLOs/SLIs/SLAs, and mentor other engineers.
About Us

At SentinelOne, we’re redefining cybersecurity by pushing the limits of what’s possible—leveraging AI-powered, data-driven innovation to stay ahead of tomorrow’s threats.

From building industry-leading products to cultivating an exceptional company culture, our core values guide everything we do. We’re looking for passionate individuals who thrive in collaborative environments and are eager to drive impact. If you’re excited about solving complex challenges in bold, innovative ways, we’d love to connect with you.

What Are We Looking For?

Please note that under Federal & FedRAMP regulations, hiring for this role is limited to US citizens only. 

FedRAMP staff may be subject to customer or third-party background checks, up to and including secret clearance, if required by their role at SentinelOne.

We are looking for a Staff Site Reliability Engineer (SRE) to join the Site Reliability Engineering team at SentinelOne. This organization’s mission is to keep our uptime promise to our customers by ensuring we meet our SLOs/SLAs, help our engineering teams ship software to our customers fast and with quality, and ensure our customers are successful. We are looking to add a Staff SRE who has experience running incident post-mortems, automating repetitive operational tasks, improving alerting accuracy, and building and refining processes that reduce downtime. You will work closely with cross-functional teams to lead reliability initiatives and bring best practices to our team.

We value good written communication skills, data-driven decisions, and a keen eye for continuous improvement. You help simplify, bring a passion for new ideas, and know how to execute iteratively toward the final goal. We value candor and collaboration.

What Will You Do?
  • Lead and execute incident management for production issues, ensuring rapid recovery and root cause analysis
  • Improve and optimize the observability strategy…..
  • Collaborate with application engineering teams to design and implement monitoring solutions that enhance our alerting capabilities and reduce noise
  • Develop and refine SLOs, SLIs, and SLAs that align with business objectives and customer expectations
  • Conduct post-incident reviews, documenting findings and driving follow-up actions to prevent recurrence
  • Mentor and support other engineers in incident response, troubleshooting techniques, and reliability best practices
What skills and knowledge should you bring?
  • 8+ years of experience in Site Reliability Engineering, DevOps, or a related field in cloud native environments
  • Strong expertise in incident management processes and the ability to lead complex troubleshooting efforts under pressure
  • Experience with Kubernetes and container orchestration
  • Experience with industry-standard observability stacks (Prometheus, Grafana, ELK, OpenTelemetry, etc.)
  • Proficiency in Python and Bash scripting to improve operational workflows and incident response
  • Familiarity with modern CI/CD pipelines and DevOps practices
  • Excellent communication skills with demonstrated ability to lead and mentor engineers in reliability practices
Why us?

You will work on real-world problems and make an impact by protecting our customers from cyber threats. You will join a cutting-edge project and will be able to influence the architecture, design, and structure of our core platform. You will tackle extraordinary challenges and work with the very best in the industry.

  • Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
  • Unlimited PTO
  • Industry-leading gender-neutral parental leave
  • Paid company holidays
  • Paid sick time
  • Employee stock purchase program
  • Disability and life insurance
  • Employee assistance program
  • Gym membership reimbursement
  • Cell phone reimbursement
  • Numerous company-sponsored events including regular happy hours and team-building events

This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply.  If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.

Base Salary Range
$148,000-$185,000 USD

SentinelOne is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

SentinelOne participates in the E-Verify Program for all U.S.-based roles.

Top Skills

Bash
ELK
Grafana
Kubernetes
OpenTelemetry
Prometheus
Python
The Company
HQ: Mountain View, CA
2,830 Employees
Year Founded: 2013

What We Do

SentinelOne is a leading provider of autonomous security solutions for endpoint, cloud, and identity environments. Founded in 2013 by a team of cybersecurity and defense experts, SentinelOne revolutionized endpoint protection with a new, AI-powered approach. Our platform unifies prevention, detection, response, remediation, and forensics in a single, easy-to-use solution.
Our endpoint security product is designed to protect your organization's endpoints from known and unknown threats, including malware, ransomware, and APTs. It uses artificial intelligence to continuously learn and adapt to new threats, providing real-time protection and automated response capabilities.

SentinelOne's approach to security is designed to help organizations secure their assets with speed and simplicity. We give organizations the ability to detect malicious behavior across multiple vectors, rapidly eliminate threats with fully automated, integrated response, and adapt their defenses against the most advanced cyberattacks.

We are recognized by Gartner in the Endpoint Protection Magic Quadrant as a Leader and have enterprise customers worldwide. Our customers include some of the world's largest companies in various industries such as finance, healthcare, government, and more.

At SentinelOne, we understand that cybersecurity is a constantly evolving field and that the threats facing organizations are becoming increasingly sophisticated. That's why we are committed to staying at the forefront of technology and innovation and providing our customers with the best protection against cyber threats.

We offer our customers a wide range of services, including threat hunting, incident response, and incident management. Our team of experts is available to assist you 24/7 and can help you respond to and manage cyber incidents quickly and effectively.

To learn more about our products and services, please visit our website at www.sentinelone.com or contact us to schedule a demo.
