SRE Manager

Sorry, this job was removed at 11:17 a.m. (CST) on Tuesday, Jan 07, 2025
San Francisco, CA
Hybrid
Cloud • Greentech • Other • Energy
We're on a mission to eliminate flaring and emissions in the oil field.
The Role

Crusoe is building the world’s favorite AI-first cloud infrastructure company. We’re pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

Be part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role:

As the SRE Manager, you will lead the creation and operation of a 24/7 Site Reliability Engineering team. Your primary goal is to ensure continuous availability and optimal performance of our cloud infrastructure, providing customers with uninterrupted access to their GPUs. You will design and implement advanced alerting and monitoring systems, manage incident response, and drive system improvements. Collaborating with remote teams across time zones, you will prioritize projects and streamline workflows to achieve rapid results. This role offers the opportunity to significantly impact the reliability of our cutting-edge cloud services and drive the success of our team.

A Day in the Life:

As a Site Reliability Engineering Manager at Crusoe Energy Systems, your day is a blend of people management and operational oversight. Your morning starts with one-on-one meetings and team stand-ups, focusing on guidance, support, and aligning daily goals. You'll spend about 40% of your time on team development, strategic planning, and fostering a collaborative environment.

The remaining 60% is dedicated to operational tasks, such as reviewing performance metrics, overseeing incident response, and driving automation projects. You ensure services consistently meet their SLOs, as measured by their SLIs, while resolving technical issues and optimizing processes. By day's end, you review project progress and plan the next steps, maintaining a high-performing, customer-centric SRE organization.

You Will Thrive In This Role If:

  • You have at least 3 years of experience building and managing a 24/7 technical support team in a cloud operations environment.

  • You have a strong background in Linux, containerization technologies, and Kubernetes. You understand virtualization and cloud computing concepts.

  • You have worked with Prometheus, VictoriaMetrics, and exporters to monitor bare-metal endpoints.

  • You have some experience with infrastructure as it relates to data center operations.

  • You’re interested in playing a key role in talent acquisition and retention. This includes diligent performance management and coaching/developing your team according to their individual needs.

  • You’ve developed training programs for new hires and ongoing professional development opportunities for your team members.

  • You like the idea of serving as a technical escalation point and ensuring the highest quality of support. You have experience implementing quality assurance measures.

  • You have supported, monitored, and managed Service Level Agreements (SLAs) across a variety of service categories that serve end customers.

  • You have used technologies such as RabbitMQ, Kafka, Temporal, and NATS.

  • You can build solid solutions in Golang or Python.

  • You’re strategic about tracking and reporting KPIs, with a focus on team performance and customer satisfaction. You’ve played a big part in the strategic planning for a team’s growth and scalability.

  • You like the idea of working with other departments to align on technical escalations, live incidents, customer needs, and feedback.

  • Leadership & Communication: Demonstrated leadership ability and excellent communication skills.

  • Problem-Solving & Adaptability: Robust problem-solving skills and adaptability in a fast-paced environment.

  • Project Management: Experience with project management tools and methodologies.

  • You embody the company’s values.

Benefits: 

  • Hybrid work schedule

  • Competitive Paid Time Off

  • Industry competitive pay

  • Retirement benefits

  • Healthcare benefits including Medical, Dental, and Vision

  • Short and Long-Term Disability Insurance

  • Life Insurance

  • Paid Parental Leave

  • Subscription to the Calm app

Compensation Range

Compensation will be paid up to $210,000 base salary. Restricted Stock Units are included in all offers. Compensation is determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Similar Jobs

Capital One

Site Reliability Engineer

Fintech • Machine Learning • Payments • Software • Financial Services
Hybrid
4 Locations
55000 Employees
209K-286K Annually

Anduril

Supplier Quality Engineer, Intelligence Systems

Aerospace • Artificial Intelligence • Hardware • Robotics • Security • Software • Defense
In-Office
Santa Ana, CA, USA
6000 Employees
113K-149K Annually

Anduril

Senior Supplier Quality Engineer, Intelligence Systems

Aerospace • Artificial Intelligence • Hardware • Robotics • Security • Software • Defense
In-Office
Santa Ana, CA, USA
6000 Employees
129K-185K Annually

The Company
HQ: Denver, CO
667 Employees
Year Founded: 2018

What We Do

Crusoe is on a mission to eliminate routine flaring of natural gas and reduce the cost of cloud computing. We are passionate about our goals to help the oil industry operate more efficiently, achieve better relationships with communities and regulators, and improve environmental performance. Crusoe repurposes otherwise wasted energy to fuel the growing demand for computational power in the expanding digital economy.

Why Work With Us

Crusoe has five core values with each value grounded in a set of actionable practices. The combination of philosophical values and actionable practices creates a decision-making framework for each employee to achieve success at Crusoe.

Similar Companies Hiring

Amplify Platform
Fintech • Financial Services • Consulting • Cloud • Business Intelligence • Big Data Analytics
Scottsdale, AZ
62 Employees
Milestone Systems
Software • Security • Other • Big Data Analytics • Artificial Intelligence • Analytics
Lake Oswego, OR
1500 Employees
Fairly Even
Software • Sales • Robotics • Other • Hospitality • Hardware
New York, NY
