Site Reliability Engineer

Reposted 19 Days Ago
Hiring Remotely in United States
Remote
Mid level
Security • Software • Analytics
The Role
Design, operate, and automate scalable, secure infrastructure for Axiom Cloud. Define SLOs, plan disaster recovery and capacity, tune performance, improve deployment practices, build reliability tooling, respond to incidents, and promote monitoring and observability across teams.
Summary Generated by Built In
Site Reliability Engineer (SRE)

Global (UTC-3 preferred)

Axiom’s mission is to empower developers to get the best insights into their data, as fast as possible. We are a remote-first and globally distributed team building a cloud native, serverless data analytics platform. Axiom completely changes the way in which developers and organizations think about their data: they can now send unlimited data with cost-effective storage and lightning-fast querying.

As a Site Reliability Engineer at Axiom, you will be pivotal in upholding our promise of superior reliability and performance to our customers. Working with backend engineers and product teams, you will focus on building and operating scalable, reliable systems. SRE at Axiom centers on automating, measuring, and continuously improving the reliability and efficiency of our systems.

Your primary responsibilities:

  • Engineer and maintain a robust, secure, and scalable infrastructure for Axiom Cloud.

  • Collaborate with engineering teams to define and refine service level objectives.

  • Contribute to disaster recovery planning, capacity engineering, performance analysis, and system tuning.

  • Promote best practices for code deployments and help educate the broader development team.

  • Roll out tooling and solutions that improve system reliability and reduce manual toil.

  • Address and remediate service incidents and contribute to postmortems and root cause analyses.

  • Foster a culture of monitoring, alerting, and observability across the organization.

You are an ideal candidate if:

  • You have over two years of experience in a reliability-focused engineering environment.

  • You are passionate about system reliability, latency, performance, and efficiency.

  • You're familiar with AWS tools and technologies.

  • You have hands-on experience with Docker, Kubernetes, and Amazon EKS.

  • You understand infrastructure-as-code tools such as Terraform or Pulumi.

  • You possess strong networking knowledge and are adept with Linux systems.

  • You are familiar with CI platforms such as GitHub Actions, GitLab, CircleCI, or others.

  • You can efficiently use LLMs.

  • You have experience with monitoring, alerting, and observability tools.

Bonus skills and experiences:

  • Proven track record of maintaining production systems at scale.

  • A software engineering background with expertise in Golang.

We provide:

  • Flexibility to work from wherever suits you best. For this role, we are considering individuals based in the timezone range UTC-5 (EST) to UTC+2.

  • Budget to build your home office set-up.

  • Monthly budget to support mental and physical wellness.

  • A focus day each week with no meetings, Slack, or Zoom: uninterrupted time to focus on work.

  • Uncapped vacation to unplug and rejuvenate.

  • Generous and flexible family leave for everyone.

The Company
HQ: San Francisco, California
59 Employees
Year Founded: 2020

What We Do

Axiom captures 100% of your data for every possible need: o11y, security, analytics, and new insights.
