VAST Data

Senior Solutions Engineer, AI Infrastructure

Reposted 18 Days Ago

Hiring Remotely in United States

Remote or Hybrid

Senior level

Artificial Intelligence • Software

The Role

The Senior Solutions Engineer will design and implement infrastructure for AI and HPC workloads, engage with customers, and lead technical discovery and architecture design.

Summary Generated by Built In

Description

We're looking for a deeply technical Solutions Architect to help customers design, evaluate, and deploy infrastructure for large-scale AI, HPC, analytics, and data-intensive workloads.

This is a customer-facing technical role for someone who has lived inside production infrastructure. You may have been a platform engineer, infrastructure engineer, SRE, MLOps engineer, AI infrastructure engineer, storage engineer, cloud engineer, or HPC systems engineer. What matters most is that you have built, operated, or architected real systems, and can bring that credibility into customer conversations.

Our customers are building infrastructure at serious scale: GPU clusters, high-performance storage systems, Kubernetes platforms, distributed training environments, inference platforms, data pipelines, lakehouses, and large enterprise systems. You'll help them reason about architectures involving 10,000+ GPUs, 100PB+ of storage, high-performance networking, distributed filesystems, orchestration layers, and demanding production workloads.

You'll own technical discovery, architecture design, PoC planning, competitive positioning, and customer technical strategy. You'll work from the first whiteboard session through evaluation, deployment planning, and production success. You'll also partner closely with product and engineering teams to bring field feedback into the roadmap.

We're looking for someone who can go deep technically, communicate clearly, operate without a rigid playbook, and translate complex infrastructure into customer outcomes.

Responsibilities

Lead technical discovery with customers across infrastructure, platform, ML, data, and executive stakeholders.
Design architectures for large-scale AI, HPC, analytics, and enterprise data workloads.
Help customers evaluate infrastructure involving GPUs, storage, networking, orchestration, and data movement.
Translate complex technical requirements into clear solution designs, reference architectures, and deployment guidance.
Debug customer issues across Linux, storage, networking, Kubernetes, schedulers, GPUs, and application workloads.
Build technical assets, runbooks, and field guidance for repeatable customer engagements.
Partner with product and engineering to communicate customer requirements, gaps, and roadmap opportunities.
Help customers move from architecture design to production deployment.

Requirements

8 to 12+ years of technical experience, with significant hands-on infrastructure experience.
Experience building, operating, or architecting production platform infrastructure.
Strong understanding of Linux kernel implementation details, distributed systems (including consensus protocols such as Paxos and Raft), storage implementation details such as NAND behavior and write amplification, store-and-forward networking, load-balancing design, and production operations.
Experience with one or more of: GPU infrastructure, large-scale HPC systems, Kubernetes platforms built from scratch, MLOps, storage systems, cloud infrastructure, data platforms, or large-scale enterprise infrastructure.
Ability to communicate credibly with engineers, architects, technical executives, and business stakeholders.
Strong discovery, problem-solving, and systems debugging skills.
Comfort operating in ambiguous, fast-moving environments.
Interest in customer-facing technical work, solution design, and business outcomes.

Preferred Experience

Experience with large-scale GPU clusters, distributed training, inference infrastructure, or AI platforms.
Experience with petabyte-scale storage or high-performance data systems.
Experience with Kubernetes, Slurm, Ray, Spark, or other orchestration / scheduling systems.
Domain expertise with one or more of: Lustre, Ceph, Weka, BeeGFS, GPFS, VAST, object storage, or distributed filesystems.
Experience with large-scale InfiniBand, RoCE, RDMA, high-performance Ethernet, or NVIDIA/Mellanox networking.
Direct experience with CUDA, NCCL, DCGM, GPUDirect, checkpointing, dataset staging, or model-serving infrastructure.
Experience across multiple industries or customer environments.

Skills Required

8 to 12+ years of technical experience
Significant hands-on infrastructure experience
Experience with GPU infrastructure or large-scale HPC systems
Ability to communicate with engineers and business stakeholders
Strong discovery and systems debugging skills

VAST Data Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about VAST Data and has not been reviewed or approved by VAST Data.

Healthcare Strength — Medical, dental, vision and life insurance are included, with several elements identified as employer-provided. Coverage aligns with what is commonly offered by high‑growth tech companies.
Leave & Time Off Breadth — Time off includes generous or unlimited PTO alongside paid sick days and paid holidays, coupled with remote or work‑from‑home flexibility. This breadth supports taking time away when needed.
Equity Value & Accessibility — Company equity is a standard part of offers with a typical four‑year vest, and professional development support is available. Equity is positioned as a meaningful component of total compensation at a growth‑stage company.

The Company

HQ: New York, NY

848 Employees

Year Founded: 2016

What We Do

Meet the data platform company for the AI era. Accelerating time-to-insight for workload-intensive applications, the VAST Data Platform delivers scalable performance, radically simple data management and enhanced productivity for the AI-powered world. Launched in 2019, VAST is the fastest-selling data infrastructure startup in history.