Site Reliability Engineer

Reposted 19 Days Ago
Hiring Remotely in USA
Remote
Senior level
Artificial Intelligence • Cloud • Software
The Role
The Senior SRE will design, build, and maintain resilient infrastructure systems, manage infrastructure as code, and write tooling in various languages.
Summary Generated by Built In

At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.

About the Role:

We're looking for a Senior SRE with a strong software engineering background to build and maintain highly scalable, secure, and resilient infrastructure. You’ll play a critical role in designing low-level systems, automating infrastructure with modern tooling, and ensuring platform reliability. This role is ideal for someone who’s comfortable working at the intersection of systems programming and DevOps: writing code in Go, JavaScript, Rust, C, or Zig while also managing infrastructure with NixOS, Kubernetes, and Terraform.

Responsibilities:
  • Design, build, and maintain infrastructure systems using Linux and NixOS.

  • Manage infrastructure-as-code with Terraform to provision and scale resources.

  • Architect and operate Kubernetes clusters with a focus on performance, security, and automation.

  • Write high-performance tooling and internal utilities in Go, JavaScript, or Rust.

  • Develop and maintain CI/CD pipelines for infrastructure and code deployments.

  • Monitor system performance, resolve issues, and improve reliability through observability tooling.

  • Collaborate closely with engineering teams to support deployment strategies and development workflows.

Essential Skills & Qualifications:
  • 5+ years in DevOps, Site Reliability, or Infrastructure Engineering roles.

  • Deep experience with Linux systems and configuration management (preferably NixOS).

  • Hands-on experience with Terraform, Kubernetes, and containerized environments.

  • Proficiency in one or more of the following languages: Rust, C, Zig, Go, or JavaScript.

  • Strong understanding of systems programming, performance tuning, and operating system internals.

  • Familiarity with CI/CD practices and infrastructure monitoring/alerting tools.

We’re looking for resilient, adaptable people to join our team: folks who enjoy collaborating and tackling tough challenges. We offer real opportunities for growth, letting you dive into complex problems and make a meaningful impact through creative solutions. If you're a driven contributor, we encourage you to explore opportunities at TensorWave. Join us as we redefine the possibilities of intelligent computing.

What We Bring:
  • Stock Options

  • 100% paid Medical, Dental, and Vision insurance

  • Life and Voluntary Supplemental Insurance

  • Short-Term Disability Insurance

  • Flexible Spending Account

  • 401(k)

  • Flexible PTO

  • Paid Holidays

  • Parental Leave

  • Mental Health Benefits through Spring Health

Top Skills

C
Go
JavaScript
Kubernetes
NixOS
Rust
Terraform
Zig

The Company
HQ: Las Vegas, Nevada
56 Employees

What We Do

TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.
Send us a message to try it for free.

Similar Jobs

Iodine Software

Site Reliability Engineer

Artificial Intelligence • Healthtech • Machine Learning • Natural Language Processing • Software
Remote or Hybrid
USA
250 Employees

Close

Site Reliability Engineer

Sales • Software • Automation
Remote
USA
100 Employees
140K-210K Annually

Zapier

Site Reliability Engineer

Artificial Intelligence • Productivity • Software • Automation
Remote
2 Locations
760 Employees

NBCUniversal

Staff Software Engineer

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development
Remote or Hybrid
New York, NY, USA
68,000 Employees
130K-180K Annually

Similar Companies Hiring

Standard Template Labs
Software • Information Technology • Artificial Intelligence
New York, NY
10 Employees
PRIMA
Travel • Software • Marketing Tech • Hospitality • eCommerce
US
15 Employees
Scotch
Software • Retail • Payments • Fintech • eCommerce • Artificial Intelligence • Analytics
US
25 Employees
