DevOps Systems Engineer

Posted 8 Days Ago
Las Vegas, NV
In-Office
Senior level
Artificial Intelligence • Cloud • Software
The Role
As a DevOps Systems Engineer, you will manage enterprise hardware, automate infrastructure processes, oversee networking, and ensure system reliability across data centers.
Summary Generated by Built In

At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.


About the Role

We are seeking a highly skilled DevOps & Infrastructure Management Engineer to join our growing infrastructure team. This role is ideal for someone who thrives in hardware-centric environments, enjoys hands-on datacenter and system administration work, and can build reliable automation around large-scale infrastructure. You will be responsible for managing enterprise hardware, monitoring systems, network operations, infrastructure automation, and supporting our compute clusters across multiple data centers.

This role touches every layer of modern infrastructure, from bare metal provisioning to OS and Kubernetes management to hardware monitoring and troubleshooting. If you are detail-oriented, resourceful, and comfortable working with both low-level hardware systems and higher-level DevOps tooling, we’d love to talk.

Key Responsibilities

Hardware & Infrastructure Management
  • Manage and maintain enterprise-grade server hardware and infrastructure components.

  • Utilize out-of-band management systems (iLO, iDRAC, IPMI, Redfish, etc.) for remote operations.

  • Use automated hardware management tools (BMC/Redfish-based) to streamline provisioning and maintenance.

  • Perform hardware diagnostics and troubleshooting (CPU, memory, disks, PSUs, NICs, etc.).

  • Handle vendor interactions, including RMAs, part replacements, and inventory tracking.

  • Oversee datacenter hardware operations, including racking, cabling, PDU installation, and physical layout.

Datacenter & DCIM
  • Use Data Center Infrastructure Management (DCIM) tools for inventory, capacity planning, and environmental tracking.

  • Manage power delivery and consumption across racks and nodes.

  • Configure and monitor managed PDU systems for power cycling, monitoring, and alerts.

  • Collaborate with colocation providers on connectivity, power, security, and maintenance tasks.

Monitoring & Observability
  • Build and maintain infrastructure monitoring and alerting using tools such as Prometheus/Grafana, SNMP, Nagios, CheckMK, or similar platforms.

  • Implement automated alerting for hardware health, network status, power issues, and service-level metrics.

  • Create dashboards to give internal teams visibility into system performance and reliability.

Network Operations
  • Manage and configure firewalls, routing, and network segmentation.

  • Configure and troubleshoot VPN technologies (IPsec, OpenVPN, WireGuard).

  • Oversee subnetting, IP address allocation, and network architecture planning.

  • Configure managed switches, VLANs, port settings, and trunking.

  • Manage NAT, port forwarding, and related gateway/edge network configurations.

System Administration (Linux)
  • Install, configure, and manage Linux servers (Ubuntu/Debian preferred).

  • Perform system-level troubleshooting (boot issues, login problems, service failures).

  • Manage networking configuration (static IPs, DHCP).

  • Configure and maintain filesystems: partitioning, MD RAID, ext4/XFS, LVM, resizing/growing volumes.

  • Implement secure access using public key authentication and proper SSH hardening.

  • Manage certificates for internal systems, including issuance, revocation, HTTPS installation, and rotation.

  • Handle basic BIOS configuration relevant to bare metal provisioning or system bring-up.

Bare Metal Provisioning
  • Deploy and manage hardware provisioning tools such as MAAS, Foreman, or similar systems.

  • Configure and troubleshoot network boot mechanisms (PXE, UEFI Boot, HTTP Boot).

  • Automate provisioning pipelines to rapidly bring new nodes online.

Containerization & Orchestration
  • Work with Kubernetes clusters at a foundational level (cluster access, basic resource troubleshooting).

  • Deploy workloads using Helm charts and maintain cluster application lifecycle.

  • Assist with cluster scaling, node replacements, and security hardening.

Automation & Scripting
  • Write shell scripts (bash) for automation of system tasks, monitoring, or provisioning.

  • Use CLI tooling such as jq, sed, awk, grep, and rsync.

  • Optionally automate workflows using languages like Python, Go, PHP, or Perl.

Required Qualifications
  • Proven experience managing enterprise-grade hardware at scale.

  • Strong understanding of out-of-band management systems (IPMI/BMC/Redfish).

  • Hands-on expertise with monitoring systems (Prometheus, Grafana, SNMP, Nagios, CheckMK, or similar).

  • Solid knowledge of network administration, including firewalls, routing, VPNs, NAT, and managed switches.

  • Linux system administration experience (installation, configuration, troubleshooting).

  • Experience with filesystems, RAID, partitioning, and general storage management.

  • Familiarity with certificate management, key-based auth, and basic cryptographic functions.

  • Experience with bare metal provisioning (MAAS, Foreman, or similar).

  • Understanding of PXE/UEFI/HTTP boot systems.

  • Ability to write functional, maintainable bash scripts for automation.

Nice to Have
  • Experience with Kubernetes beyond the basics (operators, cluster scaling, CRDs).

  • Experience with Helm chart customization.

  • Familiarity with automation languages such as Python, Go, PHP, or Perl.

  • Previous datacenter operations or colocation management experience.

  • Exposure to high-availability or distributed compute environments.

  • Knowledge of infrastructure security and hardening practices.

What We Bring
  • Stock Options

  • 100% paid Medical, Dental, and Vision insurance

  • Life and Voluntary Supplemental Insurance

  • Short Term Disability Insurance

  • Flexible Spending Account

  • 401(k)

  • Flexible PTO

  • Paid Holidays

  • Parental Leave

  • Mental Health Benefits through Spring Health

Top Skills

Bash
BMC
CheckMK
Foreman
Go
Grafana
IPMI
Kubernetes
Linux
MAAS
Nagios
Perl
PHP
Prometheus
Python
Redfish
SNMP

The Company
HQ: Las Vegas, Nevada
56 Employees

What We Do

TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.
Send us a message to try it for free.
