Research Scientist, Video Understanding & World Models

Posted Yesterday
New York, NY, USA
In-Office
200K-250K Annually
Senior level
Artificial Intelligence • Computer Vision • Machine Learning • Robotics
The Role
Lead large-scale video representation and video-language model research and training on egocentric/stereo robotics data, run self-supervised and multimodal pretraining, produce reproducible checkpoints and embeddings for production, and build rigorous training/eval workflows.
Summary Generated by Built In
About Mecka AI

Mecka AI is building the data infrastructure layer for robotics and embodied AI.

We partner with leading AI labs and robotics companies to deliver high-quality, real-world datasets used to train, evaluate, and deploy robotic systems. Our work sits directly between research, data, and real-world execution — where model performance is dictated by data quality.

Our Mission

Robotics will become the largest industry in human history — larger than anything that has come before it. As intelligent machines move into the physical world, they will dramatically expand global GDP, raise the material standard of living for everyone, and ultimately help make humanity a multiplanetary civilization. None of that happens without one thing: enormous amounts of high-quality, real-world data.

Mecka AI builds that foundation. We are the data infrastructure layer for robotics and embodied AI — the substrate that teaches machines to perceive, reason, and act in reality. Get this right, and we accelerate the most important technological transition of our time.

Our Culture
  • Excellence as the baseline. We hold an extremely high bar and expect the best work of your career. Mediocrity isn't interesting to us.

  • Highly technical. We reason from first principles, not by analogy. The best argument wins — regardless of title or tenure.

  • Truth-seeking. We are relentlessly honest with ourselves and each other. We chase reality — measured, not assumed — and kill our own bad ideas fast.

  • Maniacal urgency. The work matters and the clock is real. We move fast, ship, measure, and iterate.

  • Extreme ownership. You own outcomes end-to-end — no hand-offs, no excuses, no waiting for permission.

  • Hardcore. This is a high-intensity environment for people who want to do the defining work of their lives.

The Role

We are looking for a Research Scientist, Video Understanding, to own Mecka’s video understanding agenda end-to-end: train large-scale video representation and video-language models on our egocentric + stereo corpus, and turn the resulting checkpoints into production signals the rest of the stack ships on.

This role is focused on large model training, video encoders, video-language models, VLMs/VLAs, and temporal representation learning on real-world robotics data.

What You’ll Work On

Large-Scale Training & Architecture
  • Own model architecture and training strategy across Mecka’s task families (manipulation, locomotion, daily activity, long-horizon behavior).

  • Run self-supervised and multimodal pretraining (VideoMAE / V-JEPA / VideoPrism / InternVideo-class) with rigorous evals and clean ablations.

Video-Language & Multimodal Modeling
  • Train and fine-tune video encoders and video-language models (temporal transformers, joint-embedding models, contrastive objectives, masked modeling, instruction/video alignment).

  • Incorporate useful priors (pose, depth, camera motion, optical flow) when they improve representation quality.

Research → Production Signals
  • Turn checkpoints into usable artifacts: embeddings and model outputs that downstream systems can reliably consume (retrieval, labeling, QA, analytics).

  • Build a disciplined training + eval workflow with regression tracking and reproducible runs.

Who You Are

Required Background
  • Deep experience training large models in PyTorch (or equivalent), including multi-GPU or distributed training.

  • Strong understanding of modern video representation learning and/or multimodal modeling.

  • Ability to run rigorous experiments and communicate results clearly.

  • Warning: Research Scientist positions require hyper-specific expertise. Please limit your applications to one research role. Applying to multiple Research Scientist positions suggests a lack of focus and may result in the rejection of all submissions. You may, however, apply to other non-research roles alongside your research application.

Strong Signals:

  • Experience with video VLMs / VLA-adjacent systems (VideoCLIP, InstructBLIP-Video, LLaVA-Video-class).

  • Experience with egocentric / embodied datasets (Ego4D, Ego-Exo4D, EPIC-Kitchens, Something-Something).

  • Strong software engineering discipline: you write research code that can be shipped.

Why This Role
  • Work on a domain — egocentric embodied video — where data is scarce everywhere except here.

  • Own a research agenda that directly feeds production systems and product outcomes.

Skills Required

  • Deep experience training large models in PyTorch, including multi-GPU or distributed training
  • Strong understanding of modern video representation learning and/or multimodal modeling
  • Ability to design and run rigorous experiments and communicate results clearly
  • Strong software engineering discipline; research code that can be productionized
  • Experience with video VLMs / VLA systems (VideoCLIP, InstructBLIP-Video, LLaVA-Video-class)
  • Experience with egocentric / embodied datasets (Ego4D, Ego-Exo4D, EPIC-Kitchens, Something-Something)

The Company
58 Employees
Year Founded: 2024

What We Do

Mecka AI is a data and infrastructure company that provides high-quality human movement data to accelerate the development of autonomous systems for humanoid robotics. It serves as the data and deployment layer for physical AI, capturing, structuring, and evaluating real-world activity to create labeled datasets that enable robots to learn and deploy reliably in commercial settings.
