Senior Software Engineer, Inference

Posted 2 Days Ago
Palo Alto, CA, USA
In-Office
185K-250K Annually
Senior level
Information Technology
The Role
Lead inference acceleration and GPU-parallelism optimizations (TP, SP, PP); implement high-performance CUDA/NCCL kernels; deploy videogen and LLM models to production; collaborate with researchers, conduct code reviews, mentor engineers, and optionally improve training efficiency and resource utilization.
Summary Generated by Built In
About the Role

We are seeking a Senior Inference Engineer to accelerate the performance of Pika’s AI-driven products. In this highly technical role, you will operate at the intersection of cutting-edge inference acceleration, GPU parallelism, advanced model deployment, and video generation technologies. Your expertise will drive significant improvements to model speed and efficiency, ensuring our creative AI systems deliver industry-leading user experiences at scale.

You will design and optimize inference pipelines, implement state-of-the-art acceleration techniques, and work closely with researchers and engineers across the team to push the boundaries of what’s possible in real-time AI deployment. Your efforts will play a foundational role in powering the next generation of Pika’s video and language models.

 
What You’ll Do
  • Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.

  • Maximize GPU Parallelism: Engineer and optimize multi-GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.

  • Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.

  • Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art videogen and large language models into production.

  • Improve Training Efficiency: (Bonus) Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle.

  • Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.

 
What We’re Looking For
  • Experience: 5+ years of engineering experience, with a strong track record in inference acceleration and model deployment at scale.

  • Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.

  • GPU & Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.

  • AI Domain Knowledge: Familiarity with video generation (videogen) models and large language models (LLMs).

  • Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.

  • Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.

  • Bonus: Experience in enhancing training efficiency, stability, or resource optimization for large models.

 
Nice to Have
  • Experience with high-throughput video or real-time streaming model deployment

  • Familiarity with distributed training and optimization toolkits

  • Contributions to open source projects in AI infrastructure or deep learning compilers

  • Startup or rapid prototyping experience

 
What We Offer
  • Salary competitive within the AI industry

  • Equity in a fast-growing startup shaping the future of AI

  • Comprehensive health benefits, monthly stipends, and company retreats

  • A supportive and collaborative office culture—we’re all building and launching together

 
About Pika

At Pika, we’re crafting a future where video creation is seamless, intuitive, and universally accessible. Our mission is to empower creativity by breaking down technical barriers using the transformative power of AI. We’re a tight-knit, energetic team based in Palo Alto, CA, that values efficiency, curiosity, and the ambition to make a meaningful impact on the world.

We work from our Palo Alto office 3–5 days a week and welcome applicants who are eager to contribute onsite.

Skills Required

  • 5+ years of engineering experience
  • Proven expertise in inference optimization including quantization and attention acceleration
  • Experience with deep learning compiler stacks
  • Expert GPU programming with CUDA and NCCL
  • Experience with tensor, sequence, and pipeline parallelism (TP, SP, PP)
  • Familiarity with video generation models and large language models
  • Strong cross-discipline communication skills
  • Self-driven, solutions-oriented with ownership mindset
  • Experience improving model training efficiency, stability, or resource optimization
  • Experience with high-throughput video or real-time streaming model deployment
  • Familiarity with distributed training and optimization toolkits
  • Contributions to open source AI infrastructure or deep learning compilers
  • Startup or rapid prototyping experience

The Company
29 Employees
Year Founded: 2023

What We Do

An idea-to-video platform that brings your creativity to motion
