Site Reliability Engineer (SRE)

Posted 2 Days Ago
3 Locations
Mid level
Information Technology
The Role
The role involves enhancing observability, creating dashboards, developing alerts, managing on-call rotations, and refining the deployment process for reliability in a dynamic environment.
Summary Generated by Built In

About xAI

The xAI London team is a group of software engineers focused on large-scale, highly reliable distributed systems. We work at many levels of the stack, from build systems to production backend infrastructure and frontend development. For example, we built large parts of the Grok production stack. We focus on building high-quality software and aren’t afraid to delve into technically complex topics to solve problems the right way.

About the role

We’re looking for an experienced site reliability engineer (SRE) who can thrive in a dynamic start-up environment. The main responsibilities for this role are:

  1. Improving our observability by adding/adjusting metrics,
  2. Building easily parsable dashboards,
  3. Building reliable alerts,
  4. Designing and overseeing our on-call rotations,
  5. Improving our deployment process to increase reliability.

An ideal candidate meets at least the following requirements:

  1. Expert in at least one programming language that compiles to machine code such as Rust, C++, or Go. Rust or C++ experience is preferred,
  2. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty,
  3. Expert knowledge of deployment technologies such as Pulumi or Terraform,
  4. Expert knowledge of Kubernetes.

Location

The role is based in our London office close to Piccadilly Circus underground station. We usually work from the office 5 days a week but allow for work-from-home days when required. Candidates must be willing to attend late meetings at least twice a week to coordinate with the rest of our team, which is based in California. This role includes semi-regular business trips to California.

Interview process

After you submit your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

  1. Coding interview in Rust, C++, or Go,
  2. Monitoring & deployment design interview,
  3. Distributed systems design interview,
  4. Meet the wider team and give a 20-minute presentation about the most difficult technical problems you have solved.

Our goal is to finish the process within one week. We don’t rely on recruiters for assessments. Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

Benefits

  • Competitive cash-based compensation
  • xAI equity
  • Private health and dental insurance
  • Unlimited time off subject to prior approval

Top Skills

C++
Go
Rust
The Company
96 Employees
Remote Workplace

What We Do

Understand the Universe
