Multimodal LLM Researcher (MLLM)

Palo Alto, CA, USA
In-Office
185K-400K Annually
Senior level
Information Technology
The Role
Lead research on multimodal generative models, focusing on real-time synthesis across text, image, video, and audio, and collaborate with engineering and product teams to develop scalable technologies.
Summary Generated by Built In
About the Role

At Pika, we are pioneering next-generation creative infrastructure built around real-time, multimodal generation and intelligent, agentic platforms. We are seeking accomplished Multimodal LLM Researchers (LLM, VLM, and Audio LM) to drive forward our mission to make agentic real-time generative technology accessible, dynamic, and transformative for millions of creators.

As a core member of our research team, you will be integral to designing and building foundational technologies, developing novel approaches for large multimodal language models (LLMs/VLMs/Audio LMs), and orchestrating intelligent agentic systems that power scalable, interactive multimedia experiences. You will collaborate closely with engineering and product teams, shaping the future of real-time creative platforms.

What You’ll Do
  • Lead and contribute to research efforts focused on real-time, multimodal generation—including text, image, video, and audio synthesis—as well as orchestration of agentic platform infrastructure

  • Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interactive experiences

  • Focus on real-time aspects of model inference and synthesis across modalities

  • Work on diffusion model distillation and/or develop diffusion-based world models for multimodal applications

  • Train and fine-tune autoregressive and diffusion models in LLM, VLM, or Audio LM contexts, with a focus on real-time performance

  • Curate task-specific datasets, especially video, audio, cross-modal, and other sensory-rich data

  • Collaborate with cross-functional teams to bring research advancements into production-ready technologies

  • Publish work in top-tier conferences and journals; communicate research results internally and externally

  • Stay at the cutting edge of real-time multimodal generative AI and agentic orchestration

What We’re Looking For
  • 5+ years of relevant experience, including research during graduate studies, in large language models, vision-language models, audio language models, deep learning, or related fields

  • Demonstrated impact as first author on major publications in top conferences or journals (e.g., NeurIPS, ICML, ICLR), reflecting a frontier research background

  • Deep expertise in at least one area: language modeling (LLM), vision-language modeling (VLM), or audio language modeling (Audio LM)

  • Strong experience with generative models, including autoregressive and diffusion models, and their real-time deployment

  • Hands-on experience curating, constructing, or augmenting large, high-quality multimodal datasets

  • Experience developing and deploying real-time systems and/or agentic orchestration infrastructure

  • Strong programming and prototyping skills (Python, PyTorch, TensorFlow, etc.)

  • Passion for building creative tools and platforms that empower users

  • Excellent communication and collaboration skills

What We Offer
  • Competitive salary and substantial equity in a high-growth startup

  • Full health benefits, 401(k) matching, and more

  • Collaborative, mission-driven team environment with major growth opportunities

  • Flexible on-site/remote hybrid (HQ in Palo Alto, CA)

About Pika

Pika empowers creators by building state-of-the-art agentic and multimedia platforms. Our vision is to break down technical barriers to creativity by making real-time generation and intelligent orchestration accessible to all. Join us and shape the next evolution of creative technology!

If you are a leading researcher excited by real-time multimodal AI and agentic platforms, we want to hear from you.

Skills Required

  • 5+ years of relevant experience in large language models, vision-language models, deep learning, or related fields
  • First author on major publications in top conferences or journals
  • Deep expertise in language modeling, vision-language modeling, or audio language modeling
  • Experience with generative models and real-time deployment
  • Hands-on experience curating multimodal datasets
  • Experience developing real-time systems and agentic orchestration infrastructure
  • Strong programming skills in Python, PyTorch, TensorFlow
  • Excellent communication and collaboration skills

The Company
29 Employees
Year Founded: 2023

What We Do

An idea-to-video platform that brings your creativity to motion
