Senior Software Engineer, LLM Performance

Reposted 12 Days Ago
7 Locations
In-Office or Remote
Senior level
Artificial Intelligence • Cloud • Hardware • Information Technology • Software
The Role
Optimize and integrate LLMs across the stack from GPU kernels to Kubernetes deployments. Improve inference performance via kernel development, algorithmic techniques (quantization, speculative decoding), and contributions to open-source LLM engines like vLLM. Drive hardware utilization, profiling, and enterprise-grade scalable implementations.
Summary Generated by Built In

Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs, optimizing for cost, performance, and flexibility. Our mission is to empower AI developers with a fast, cost-efficient, and scalable cloud experience—free from vendor lock-in and designed for the next generation of AI workloads.

Job Description:

The Senior Software Engineer, LLM Performance plays a crucial role in delivering a competitive platform by focusing on efficiently scheduling, executing, and managing AI workloads on distributed compute systems. This role is deeply technical, spanning from low-level GPU kernels to distributed AI orchestration and Kubernetes (K8s) deployments. It is about more than optimization; it’s about pioneering efficient infrastructure that supports AI’s transformative role in reshaping productivity, revolutionizing industries, and addressing some of the world’s most challenging problems. You’ll ensure that generative AI — including large language models (LLMs), multi-modal models, and diffusion models — operates efficiently at enterprise scale while driving continuous improvements in cost, performance, and sustainability.

Responsibilities:

  • Add support for new LLMs, working across the stack from low-level GPU kernels to Kubernetes-based deployments.

  • Contribute to cutting-edge open-source LLM engines such as vLLM or SGLang to extend their capabilities and performance (e.g. use Python technologies to improve API servers or request schedulers).

  • Operate closer to the hardware, focusing on building and integrating solutions to boost performance and hardware utilization. For example, improve attention backends like FlashAttention or FlashInfer by contributing to their development and optimization, or by integrating their solutions into vLLM.

  • Improve LLM performance using advanced algorithmic solutions such as speculative decoding, quantization, or other state-of-the-art techniques. Understand the impact of such techniques on model quality.

Qualifications:

  • Expertise in GPU computing, including platforms and frameworks such as CUDA, ROCm, XLA, PyTorch, and JAX.

  • Background in performance analysis and optimization of AI/HPC workloads (e.g. profiling or theoretical analysis of FLOPs and bandwidth).

  • Experience in writing GPU kernels using technologies like CUDA, CUTLASS, Triton.

  • Strength in Python and C++.

  • Demonstrated contributions to open-source projects. Contributions to inference engines such as vLLM are a strong plus.

  • A production-oriented mindset emphasizing robust, scalable code suitable for enterprise-grade applications.

  • A relentless curiosity about cutting-edge AI technologies combined with a passion for solving complex problems.

What You Bring to the Table: We are looking for people who are eager to learn and master the lower-level compute concepts that are critical for the AI revolution. With us, your work will go beyond writing code and have a significant impact on the scalability and efficiency of AI applications at large. If you're ready for the challenge of optimizing AI performance and eager to push our technology to new heights, we're excited to welcome you aboard.

Skills Required

  • Expertise in GPU computing (CUDA, ROCm, XLA, PyTorch, JAX)
  • Performance analysis and optimization of AI/HPC workloads (profiling, FLOPs and bandwidth analysis)
  • Experience writing GPU kernels using CUDA, CUTLASS, Triton
  • Strong programming skills in Python and C++
  • Experience with Kubernetes and distributed AI orchestration
  • Demonstrated contributions to open-source projects (inference engines like vLLM a strong plus)
  • Production-oriented mindset; build robust, scalable enterprise-grade code
  • Familiarity with attention backends and inference optimizations (e.g., FlashAttention, FlashInfer) and quantization/speculative decoding techniques
  • Curiosity about cutting-edge AI technologies and solving complex performance problems

The Company
HQ: San Mateo, California
23 Employees
Year Founded: 2023

What We Do

Parasail is the first AI Deployment Network built for the new era of open and scalable AI. We connect teams to the world’s largest pool of on-demand GPU compute—giving AI builders fast, flexible, and cost-efficient infrastructure to deploy and scale models without contracts, quotas, or cloud complexity. From real-time inference to massive batch jobs, Parasail intelligently matches workloads across a global GPU network, optimizing for performance, price, and geography. No DevOps burden, no vendor lock-in—just plug-and-play access to high-performance infrastructure that works with the latest open-source models and evolving AI stacks. Companies like Weights & Biases, Elicit, Rasa, Everpilot, and Oumi are already building faster and saving up to 30x on costs with Parasail. The future of AI deployment isn’t a single cloud. It’s a global compute network. Parasail is making that future a reality. 🔗 www.parasail.io
