Senior AI Systems Engineer (LLM Inference & Infra Optimization)

Posted 10 Days Ago
Hiring Remotely in US
Remote
Senior level
Artificial Intelligence • Healthtech
About Us

At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure — optimizing inference pipelines to support next-generation multimodal AI agents. We're looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.

The Role

We’re looking for a senior-level engineer to lead efforts in deploying and optimizing large language models on high-end GPU hardware and building the infrastructure that supports them. You'll work across the stack — from C++ and CUDA kernels to Python APIs — while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.

What You’ll Do
  • LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving (a minimal serving sketch follows this list).

  • Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to extract maximum performance from GPUs and high-throughput interconnects.

  • DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform. Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.

  • Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.

  • Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.
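
For a concrete sense of the inference-optimization work described above, here is a minimal serving sketch using vLLM's offline LLM API with a weight-quantized checkpoint. The model name, sampling settings, and prompts are illustrative assumptions, not a description of Sully.ai's actual pipeline.

```python
# Minimal sketch: batched generation with vLLM and an AWQ-quantized model.
# Assumptions: vLLM is installed, one GPU is available, and the checkpoint
# name is illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",   # illustrative AWQ checkpoint
    quantization="awq",                # weight-only quantization to cut VRAM use
    gpu_memory_utilization=0.90,       # leave headroom for the KV cache
    max_model_len=4096,
)

sampling = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=256)

prompts = [
    "Summarize the patient's chief complaint in one sentence:",
    "List three differential diagnoses for acute chest pain:",
]

# vLLM batches these requests and manages the KV cache with PagedAttention.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text.strip())
```

Speculative decoding and prefix caching would layer on top of a setup like this; the exact configuration depends on the serving framework and model.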

What We’re Looking For
  • Proficiency in C++, CUDA, and Python with experience in systems or ML infrastructure engineering.

  • Deep understanding of GPU architectures, inference optimization, and large model serving techniques.

  • Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar (see the provisioning sketch after this list).

  • Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).

  • Comfortable with DevOps workflows, containerization (Docker), CI/CD, and distributed system debugging.

  • (Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.

  • (Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.
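
On the infrastructure-as-code side, the sketch below shows roughly what provisioning a single GPU node on GCP with Pulumi (Python) can look like. The resource names, zone, machine type, accelerator, and image family are assumptions for illustration, not Sully.ai's actual environments; the equivalent Terraform would be a direct translation.

```python
# Sketch: provision one GPU VM on GCP with Pulumi (Python).
# All names, the zone, machine type, image family, and accelerator are illustrative.
import pulumi
import pulumi_gcp as gcp

gpu_node = gcp.compute.Instance(
    "inference-gpu-node",
    zone="us-central1-a",
    machine_type="n1-standard-8",
    boot_disk=gcp.compute.InstanceBootDiskArgs(
        initialize_params=gcp.compute.InstanceBootDiskInitializeParamsArgs(
            image="projects/ml-images/global/images/family/common-gpu",  # assumed image family
            size=200,
        ),
    ),
    guest_accelerators=[
        gcp.compute.InstanceGuestAcceleratorArgs(type="nvidia-tesla-t4", count=1),
    ],
    # GPU VMs cannot live-migrate, so host maintenance must terminate the instance.
    scheduling=gcp.compute.InstanceSchedulingArgs(on_host_maintenance="TERMINATE"),
    network_interfaces=[
        gcp.compute.InstanceNetworkInterfaceArgs(
            network="default",
            access_configs=[gcp.compute.InstanceNetworkInterfaceAccessConfigArgs()],
        ),
    ],
)

pulumi.export("gpu_node_name", gpu_node.name)
```

In a real fleet this would extend to managed instance groups or GKE node pools, driven by the same `pulumi up` workflow in CI.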

Why Join Us
  • Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.

  • Work with bleeding-edge GPU infrastructure and build systems that push what's possible.

  • Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.

  • Help accelerate a meaningful product that improves how clinicians work and patients are cared for.

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment. 

Top Skills

C++
CUDA
DeepSpeed
Docker
Hugging Face Transformers
Pulumi
Python
TensorRT
Terraform
vLLM

The Company
HQ: Mountain View, California
60 Employees
Year Founded: 2023

What We Do

Superhuman medical staff that’s 10x better, 20x cheaper, and 100x faster.

Similar Jobs

FreeWheel Logo FreeWheel

Comcast Site Reliability Engineering Intern

AdTech • Digital Media • Marketing Tech
Remote or Hybrid
Pennsylvania, USA
32-32

NBCUniversal Logo NBCUniversal

Freelance Broadcast Facilities Supervisor, Milan Cortina Olympics - NBC Sports

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development
Remote or Hybrid
New York, NY, USA
550-700

NBCUniversal Logo NBCUniversal

Staff Data Engineer

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development
Remote or Hybrid
New York, NY, USA
150K-185K Annually

NBCUniversal Logo NBCUniversal

Freelance BOC Maintenance Core Manager, Milan Cortina Olympics - NBC Sports

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development
Remote or Hybrid
New York, NY, USA
650-750

Similar Companies Hiring

Camber Thumbnail
Social Impact • Healthtech • Fintech
New York, NY
53 Employees
Sailor Health Thumbnail
Telehealth • Software • Social Impact • Healthtech
New York City, NY
20 Employees
Standard Template Labs Thumbnail
Software • Information Technology • Artificial Intelligence
New York, NY
10 Employees

Sign up now Access later

Create Free Account

Please log in or sign up to report this job.

Create Free Account