Helix AI Engineer, Video Pretraining

San Jose, CA, USA
In-Office
Senior level
Artificial Intelligence • Robotics • Automation • Manufacturing
The Role
Lead the development of large-scale video foundation models for humanoid autonomy, focusing on training strategies and model evaluation for real-world applications.

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. Our goal is to build embodied AI systems that can perceive, reason, and act in the real world. Figure is headquartered in San Jose, CA, and this role requires 5 days/week in-office collaboration.

Our Helix team is responsible for developing the core AI systems that power humanoid autonomy. We are looking for a Helix AI Engineer, Video Pretraining, to lead the development of large-scale video foundation models trained on diverse real-world and robot-collected data.

This role focuses on pretraining models that learn from raw video—capturing motion, interaction, and temporal structure—to enable downstream capabilities in perception, prediction, and embodied reasoning.

Responsibilities
  • Design and train large-scale video foundation models on diverse datasets spanning internet-scale video and robot-collected data
  • Develop pretraining strategies that capture temporal dynamics, motion, and object interaction from raw video sequences
  • Build models that learn transferable representations for downstream tasks such as perception, tracking, prediction, and control
  • Explore architectures for video understanding and generation, including transformer-based and diffusion-based approaches
  • Implement efficient data pipelines and training strategies for high-throughput video ingestion and large-scale distributed training
  • Optimize model performance across compute, memory, and training efficiency constraints
  • Collaborate closely with generative modeling, agent, and robot learning teams to integrate pretrained models into the autonomy stack
  • Design evaluation frameworks and benchmarks to measure temporal understanding, prediction quality, and generalization
Requirements
  • Experience training large-scale models on video data or other high-dimensional sequential modalities
  • Strong understanding of modern deep learning architectures for video, vision, or multimodal systems
  • Experience with large-scale pretraining, including dataset curation, training dynamics, and scaling laws
  • Proficiency in Python and deep learning frameworks such as PyTorch
  • Experience working with distributed training systems and large GPU clusters
  • Strong experimental rigor and ability to iterate quickly on model design and training strategies
  • Solid software engineering skills and ability to build scalable, reliable systems
  • Ability to operate independently and drive ambiguous, high-impact research directions
Bonus Qualifications
  • Experience working on frontier video models or multimodal foundation models
  • Background in video diffusion, autoregressive video modeling, or world models
  • Experience at leading AI labs such as OpenAI, Google DeepMind, Google, ByteDance, Midjourney, or Adobe
  • Experience with large-scale dataset construction and filtering for video pretraining
  • Familiarity with robotics, embodied AI, or learning from egocentric / first-person video
  • Publication record in machine learning, computer vision, or multimodal AI

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended. 

The Company
HQ: Sunnyvale, California
86 Employees
Year Founded: 2022

What We Do

Figure is an AI Robotics company building the world's first commercially viable autonomous humanoid robot. We are based in Sunnyvale, CA.
