AI Infrastructure Engineer

Las Vegas, NV
In-Office
Senior level
Artificial Intelligence • Cloud • Software
The Role
The AI Infrastructure Engineer will design and maintain AI compute clusters, optimize performance, and ensure system reliability for TensorWave's cloud services, collaborating closely with IT and AI development teams.

At TensorWave, we're leading the charge in AI compute, building a versatile cloud platform that's driving the next generation of AI innovation. We're focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what's possible in the AI landscape.

About the Role:

We are looking for an AI Infrastructure Engineer with a passion for high-performance computing and distributed systems. The ideal candidate will support our vision by developing and managing the compute infrastructure that underpins our innovative AI cloud services. This role involves building and maintaining robust AI clusters, ensuring optimal performance and reliability for our clients' most demanding workloads.

Responsibilities:

  • Collaborate with a dynamic IT team to design, deploy, and maintain high-performance AI compute clusters supporting both AMD and NVIDIA GPU technologies.

  • Lead initiatives to optimize cluster performance, resource utilization, and job scheduling to maximize efficiency across diverse AI workloads.

  • Ensure system reliability, performance, and security for cloud services, implementing monitoring solutions and automated recovery systems.

  • Work closely with the AI development team to align infrastructure capabilities with the evolving needs of TensorWave's cloud platform.

  • Troubleshoot and resolve complex infrastructure issues across Linux systems, networking, and distributed computing environments, providing expert guidance to maintain high service levels.

  • Implement and maintain configuration management, deployment automation, and infrastructure-as-code practices.

Essential Skills & Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.

  • At least 5 years of relevant experience in infrastructure engineering, with a focus on supporting high-performance computing (HPC) and AI applications.

  • Expert-level Linux system administration skills across multiple distributions.

  • Strong experience with clustered computing environments (GPU, CPU, or hybrid clusters).

  • Solid understanding of networking fundamentals, including TCP/IP, routing protocols, and high-speed interconnects.

  • Experience with container technologies (Docker, Kubernetes), job schedulers (Slurm, PBS), and configuration management tools.

  • Familiarity with the AMD and NVIDIA GPU ecosystems (ROCm and CUDA) and their infrastructure requirements.

  • Exceptional debugging and problem-solving abilities with a methodical approach to complex system issues.

  • Demonstrated ability to learn new technologies quickly and adapt to rapidly evolving infrastructure needs.

We're looking for resilient, adaptable people to join our team—folks who enjoy collaborating and tackling tough challenges. We're all about offering real opportunities for growth, letting you dive into complex problems and make a meaningful impact through creative solutions. If you're a driven contributor, we encourage you to explore opportunities to make an impact at TensorWave. Join us as we redefine the possibilities of intelligent computing.

What We Bring:

In addition to a competitive salary, we offer a variety of benefits to support your needs, including:

  • Stock Options

  • 100% paid Medical, Dental, and Vision insurance

  • Life and Voluntary Supplemental Insurance

  • Short Term Disability Insurance

  • Flexible Spending Account

  • 401(k)

  • Flexible PTO

  • Paid Holidays

  • Parental Leave

  • Mental Health Benefits through Spring Health

Top Skills

AMD
CUDA
Docker
Kubernetes
Linux
NVIDIA
PBS
ROCm
Slurm

The Company
HQ: Las Vegas, Nevada
56 Employees

What We Do

TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.
Send us a message to try it for free.
