AI Engineer (VLM & VLA)

Posted 9 Days Ago
Munich, Bayern, DEU
In-Office
$150K-$300K Annually
Expert/Leader
Information Technology • Robotics • Software
The Role
Develop and optimize vision-language-action models, build multimodal representations, integrate LLM reasoning with action control, deploy models, and ensure safety in robotics applications.

About Us


Foundation is developing the future of general-purpose robotics with the goal of addressing the labor shortage.

Our mission is to create advanced robots that can operate in complex environments, reducing human risk in conflict zones and enhancing efficiency in labor-intensive industries.


We are on the lookout for extraordinary engineers and scientists to join our team. Your previous experience in robotics isn't a prerequisite — it's your talent and determination that truly count.


We expect that many of our team members will bring diverse perspectives from various industries and fields. We are looking for individuals with a proven record of exceptional ability and a history of creating things that work.


Our Culture


We like to be frank and honest about who we are so that people can decide for themselves whether this is a culture they resonate with. You can read more about our culture at https://foundation.bot/culture.


Who should join:


  • You deeply believe that this is the most important mission for humanity and needs to happen yesterday.
  • You are highly technical, regardless of the role you are in. We are building technology; you need to understand technology well.
  • You care deeply about aesthetics and design. If it's not the best product ever, it bothers you, and you need to “fix” it.
  • You don't need someone to motivate you; you get things done.

Why Are We Hiring for This Role

  • Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders.
  • Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning.
  • Integrate LLM-based reasoning with action planning and control policies.
  • Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data.
  • Interface VLA model outputs with real-time robot control stacks (navigation, manipulation, locomotion).
  • Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans.
  • Deploy models on on-board or edge compute platforms, optimizing for latency, safety, and reliability.
  • Build scalable pipelines for ingesting, labeling, and generating multimodal training data.
  • Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data.
  • Optimize training pipelines, model parallelism, and evaluation frameworks.
  • Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable.
  • Collaborate with product teams to define robot capabilities and user-facing behaviors.
  • Participate in user and field testing to iterate on real-world performance.

What Kind of Person Are We Looking For

  • Strong experience training multimodal models, including VLAs, VLMs, vision transformers, and LLMs.
  • Ability to build and iterate on large-scale training pipelines.
  • Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration.
  • Strong software engineering skills in Python and modern ML tooling.
  • Experience with (synthetic) dataset creation and curation.
  • Understanding of real-time deployment constraints on embedded hardware.
  • Preferably, familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar).
  • Ideally, hands-on experience with robotics, embodied AI, or reinforcement/imitation learning.
  • MSc or PhD in Computer Science, Robotics, Machine Learning, or a related field, or equivalent industry experience.

Benefits

We provide market-standard benefits (health, vision, dental, 401k, etc.). Join us for the culture and the mission, not for the benefits.

Salary

Annual compensation is expected to be between $150,000 and $300,000. Exact compensation may vary based on skills, experience, and location.


Top Skills

JAX
Python
PyTorch

The Company
HQ: San Francisco, California
58 Employees

