Infrastructure Operations Engineer

Las Vegas, NV, USA
In-Office
Senior level
Artificial Intelligence • Cloud • Software
The Role
As an Infrastructure Operations Engineer, you will manage enterprise hardware, automate infrastructure processes, oversee networking, and ensure system reliability across data centers.

Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.


About the role

We are seeking an Infrastructure Operations Engineer to join our growing infrastructure team.

This role is ideal for someone who thrives in hardware-centric environments, enjoys hands-on data center and system administration work, and can build reliable automation around large-scale infrastructure.

You will be responsible for managing enterprise hardware, monitoring systems, network operations, and infrastructure automation, and for supporting our compute clusters across multiple data centers.

This role touches every layer of modern infrastructure, from bare-metal provisioning to OS and Kubernetes management to hardware monitoring and troubleshooting.

If you are detail-oriented, resourceful, and comfortable working with both low-level hardware systems and higher-level DevOps tooling, we’d love to talk.

Responsibilities

  • Manage and maintain enterprise-grade server hardware including diagnostics and break/fix for CPUs, memory, disks, PSUs, and NICs

  • Operate out-of-band management systems for remote access and recovery - iLO, iDRAC, IPMI, Redfish

  • Design, build, and maintain infrastructure monitoring and alerting - Prometheus, Grafana, SNMP, or similar

  • Administer and troubleshoot Linux systems - OS install, boot issues, services, networking, filesystems, and access controls

  • Own bare-metal provisioning workflows - PXE/UEFI boot and automated node bring-up using MAAS, Foreman, or equivalents

  • Build and maintain infrastructure automation - shell scripting and CLI tooling to improve reliability and scale operations

  • Manage core networking - subnets, IP address management, VLANs, routing, NAT, and firewall configuration

  • Configure and support secure connectivity such as VPNs - IPsec, WireGuard, OpenVPN

  • Support Kubernetes clusters at the infrastructure layer - node lifecycle, access, basic troubleshooting, and scaling

  • Partner with internal teams to ensure compute clusters remain reliable, secure, and scalable across multiple data centers

Required Experience

  • Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience

  • Proven experience managing enterprise-grade hardware at scale

  • Expertise with automation languages such as Python, Go, PHP, or Perl

  • Strong understanding of out-of-band management systems - IPMI, BMC, Redfish

  • Hands-on expertise with monitoring systems - Prometheus, Grafana, SNMP, Nagios, CheckMK, or similar

  • Solid knowledge of network administration - firewalls, routing, VPNs, NAT, and managed switches

  • Linux system administration experience - installation, configuration, troubleshooting

  • Experience with filesystems - RAID, partitioning, and general storage management

  • Familiarity with certificate management, key-based auth, and cryptographic functions

  • Experience with bare-metal provisioning - MAAS, Foreman, or similar

  • Understanding of PXE/UEFI/HTTP boot systems

  • Ability to write functional, maintainable bash scripts for automation

Nice to Have

  • Experience with Kubernetes - operators, cluster scaling, CRDs

  • Experience with Helm chart customization

  • Exposure to high-availability or distributed compute environments

  • Knowledge of infrastructure security and hardening practices

What We Bring

  • Mission-driven company

  • Competitive Salary

  • Stock Options

  • 100% paid Medical, Dental, and Vision insurance

  • Flexible PTO

  • Paid Holidays

  • 401(k)

  • Parental Leave

  • Flexible Spending Account

  • Short Term Disability Insurance

  • Life and Voluntary Supplemental Insurance

  • Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and to literally build the future.

TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.

Top Skills

Bash
BMC
Checkmk
Foreman
Go
Grafana
IPMI
Kubernetes
Linux
MAAS
Nagios
Perl
PHP
Prometheus
Python
Redfish
SNMP

The Company
HQ: Las Vegas, Nevada
56 Employees

What We Do

TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more. Send us a message to try it for free.
