AI Inference Infrastructure Software Engineer (Kubernetes / Cloud)

Posted 4 Days Ago
Seattle, WA, USA
Hybrid
Mid level
Artificial Intelligence • Hardware • Machine Learning • Generative AI
The Role
Design, operate, and evolve ElastixAI's Kubernetes and multi-cloud inference infrastructure. Run accelerated ML workloads at scale, build deployment and automation tooling, harden AWS/GCP/on-prem systems, partner with ML/runtime teams to productionize models, optimize costs and reliability, and participate in on-call rotation.
Summary Generated by Built In

Location: Seattle, WA (Hybrid - 3 days/week in office)

About ElastixAI:

ElastixAI is an early-stage software startup on a mission to reinvent AI inference infrastructure from the ground up. We're building a next-generation inference platform that delivers unprecedented efficiency by tightly integrating machine learning, the software stack, and custom hardware. Our philosophy is simple: the best performance comes from holistic co-design, where every layer, from model architecture to kernels to silicon, works in harmony.

If you're excited about pushing AI performance to physical limits and shaping the future of large-scale inference, we'd love to meet you.

Role Summary:

We're looking for an Inference Infrastructure Software Engineer to own and evolve the cloud and Kubernetes backbone behind our Token-as-a-Service platform. You'll be the connective tissue between our inference engine and the production environments where customers actually consume tokens — making sure our accelerated workloads run reliably, scale predictably, and deploy seamlessly across managed and self-hosted clusters.

This is a hands-on role with broad surface area. You'll touch everything from cluster bring-up, release automation, and AI accelerator scheduling to service reliability and cost optimization, working closely with our ML, runtime, and hardware teams to expose the full performance of our co-designed stack to end users.

Key Responsibilities:

  • Build, operate, and evolve ElastixAI's Kubernetes infrastructure powering our Token-as-a-Service capability.

  • Run accelerated inference workloads in production at scale, with strong SLAs around latency, throughput, and availability.

  • Manage and harden our AWS, GCP, and on-prem infrastructure, including networking, storage, IAM, and observability layers tied to our services.

  • Develop tooling and automation in Python, Bash, Rust, and Go to streamline deployments, rollouts, autoscaling, and incident response.

  • Partner with the ML and runtime teams to productionize new inference capabilities, model deployments, and routing strategies.

  • Contribute to capacity planning, cost optimization, and reliability engineering across multi-cloud and self-hosted environments.

  • Help define the platform roadmap as we scale from early customers to broad production deployments.

  • Be a member of the Elastix On-Call rotation.

Required Qualifications:

  • Minimum BS in Computer Science, Software Engineering, or a related field.

  • 3–5 years of hands-on Kubernetes experience, including EKS, GKE, and/or self-hosted clusters.

  • 2–3 years of production experience operating workloads on AWS or GCP.

  • Proven track record running ML or inference services at scale on Kubernetes in production.

  • Strong experience running accelerated workloads in Kubernetes, including scheduling, drivers, device plugins, MIG (Multi-Instance GPU), networking, and storage considerations.

  • Solid coding skills in Python and Bash, and proficiency in Go.

  • Proficient in configuring and leveraging Linux in production.

  • Experience with infrastructure-as-code (Terraform, Pulumi), OS configuration management tooling (Ansible, Puppet, Salt), and GitOps workflows (Argo CD, Flux).

  • Familiarity with AI inference and/or training workflows and the operational patterns around them.

  • Pragmatic, ownership-oriented mindset; comfortable operating in early-stage ambiguity and shipping iteratively.

Preferred/Bonus Qualifications:

  • MS/PhD in Computer Science, Software Engineering, or a related field.

  • Experience with inference servers and runtimes (e.g., Triton, vLLM, TGI) and model-serving patterns (batching, streaming, KV-cache-aware routing).

  • Exposure to heterogeneous accelerators beyond GPUs (FPGAs, custom ASICs).

  • Background in observability, SRE, or performance engineering for latency-sensitive services.

  • Experience building customer-facing API platforms, including onboarding, API key/auth management, and usage metering.

What We Offer:

  • A chance to be a foundational engineer in an innovative AI startup.

  • A dynamic and collaborative work environment and the chance to have a significant impact on new technology.

  • The opportunity to work on challenging problems at the intersection of ML, software, and systems.

  • Competitive compensation and startup equity package

  • Comprehensive medical, dental, and vision coverage (premiums 100% paid by employer)

  • Flexible Time Off (FTO)

  • Paid parental leave

  • Gym or fitness benefit

  • Commuter benefit

  • Investment in employee learning & development


The Company
0 Employees
Year Founded: 2007

What We Do

ElastixAI delivers elastic, cost-efficient AI inference by co-designing machine learning models, system software, and reconfigurable hardware as a unified architecture. Their mission is to enable adaptable and cost-efficient GenAI inference infrastructure, driving breakthroughs and making Artificial Super Intelligence accessible to everyone. By removing inefficiencies at every layer of the stack, they achieve lower total cost of ownership and reduced power consumption per token.
