TwelveLabs

Staff ML Research Engineer, Marengo



Seoul, KOR

Hybrid

110K Annually

Mid level

Software

The Role

As a Research Scientist/Engineer, you will push the boundaries of video search through research, user behavior analysis, and system implementation, bridging research and production. You'll work on multimodal representations and retrieval systems at TwelveLabs.

Summary Generated by Built In

Who we are

At TwelveLabs, we are pioneering multimodal foundation models that can understand video much as humans do. Our models have redefined the standards in video-language modeling, enabling more intuitive and far-reaching capabilities and fundamentally transforming the way we interact with and analyze media.

With more than $110 million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and by prominent AI visionaries and founders including Fei-Fei Li, Silvio Savarese, and Alexandr Wang. We are headquartered in San Francisco with an influential APAC presence in Seoul, and our global footprint underscores our commitment to driving worldwide innovation.

Our partnership with NVIDIA and AWS gives us access to the most advanced chips, including B300s, enabling us to push the boundaries of what's possible in video AI.

We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the Team

This team owns the research and development of Marengo, TwelveLabs’ multimodal embedding model. We develop foundation models that bring video, audio, and text into a shared embedding space, powering state-of-the-art multimodal understanding and retrieval.

End-to-end model development: We work across a broad range of research areas, including contrastive learning, temporal video understanding, and multimodal representation learning. The team owns the entire model development lifecycle, from building large-scale training datasets and designing model architectures to optimizing distributed training and developing robust evaluation frameworks.

Research at scale: With access to world-class compute infrastructure, including NVIDIA B300 GPUs, we rapidly iterate on large-scale experiments, enabling fast progress on ambitious research problems.

Research with real-world impact: The path from research to production is exceptionally short. We work closely with the Search, Product, and Infrastructure teams to continuously improve the models that power multimodal search and understanding for thousands of customers worldwide.

About the Role

As a Staff ML Research Engineer on the Marengo team, you will set the technical direction for TwelveLabs' next-generation multimodal embedding models and own the end-to-end model development process, from research strategy and data architecture to training infrastructure and evaluation frameworks.

This is a high-autonomy role at the intersection of multimodal representation learning, large-scale systems design, and cross-team technical leadership. We're looking for someone who thrives in ambiguity: someone who can identify the highest-impact research problems, define the technical approach, and drive cross-team execution to deliver models that serve customers worldwide.

In this role, you will

- Set the technical direction for next-generation multimodal embedding model architecture, training methodology, and data strategy
- Own end-to-end model development from research planning through large-scale distributed training to production evaluation
- Architect and optimize training infrastructure: distributed training pipelines, data processing systems, experiment workflows, and GPU utilization across the team's compute fleet
- Drive data strategy: design large-scale data curation, filtering, and quality frameworks that systematically improve model performance
- Define evaluation methodology and quality standards for embedding models, ensuring rigorous benchmarking that captures what matters
- Co-design embedding architectures with the search team, optimizing for end-to-end retrieval quality rather than isolated benchmarks
- Drive cross-functional alignment with search, product, and infrastructure teams on model integration and performance requirements
- Raise the research engineering bar through design review, experiment review, and technical mentorship

Even if you don't check every box, we encourage you to apply.

If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.

You may be a good fit if you have

- 7+ years of industry experience in computer vision, NLP, or multimodal learning, with a track record of owning and shipping ML systems end-to-end
- Demonstrated ability to take ambiguous, loosely defined research problems and drive them to concrete, impactful solutions, from problem identification through delivery
- Deep expertise in large-scale distributed model training (kernel optimization, FSDP, or similar)
- Strong experience in contrastive learning, representation learning, or foundation model training
- Proven end-to-end ownership: not just running experiments, but defining what to build, building it, deploying it, and iterating on it in production
- Strong proficiency in Python and PyTorch
- Evidence of both research depth and engineering impact: publications paired with shipped products, not one or the other

We evaluate based on relevant technical skills and sustained industry impact. This role is typically a strong fit for engineers with an MS and deep industry experience who have evolved from individual contributor to technical leader in production ML environments.

Preferred Qualifications

- Experience training models at billion-parameter scale
- Experience with training operations: pipeline reliability, monitoring, fault tolerance, cost optimization
- Experience with large-scale data curation and data quality systems
- Experience with temporal video understanding or multimodal video modeling
- Deep experience with training infrastructure optimization (GPU utilization, mixed precision, communication optimization)
- Track record of technical leadership: driving architectural decisions that shaped team or product direction

What makes this role unique

The gap between research and production is remarkably short here. Models you build will be used by thousands of companies worldwide within months. We work as a unified team toward the broader goal of video understanding, rather than solving isolated problems. Our research philosophy balances rigorous experimentation with real-world application: we aim to build multimodal systems that are powerful, trustworthy, and genuinely useful.

Others

Work Location: Seoul Itaewon office + Pangyo satellite office

Hiring Process

Application Review → Recruiter Interview (remote, 30 min) → Loop Interview [Hiring Manager Interview & Live Coding Test Interview] (in person, about 90 min) → System Design Interview (in person, about 60 min) → Final Round Interview (remote, about 30 min) → Reference Check → Offer

Benefits and Perks

Growth & Tools
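- A Global Team that grows alongside global B2B customers
- Hybrid work that combines autonomy and collaboration
- Latest MacBook plus remote-work equipment support worth KRW 700,000, with equipment refreshed every 3 years
- Tokens never sleep: unlimited LLM tokens for technical roles
- Annual self-development allowance worth KRW 1.4 million for courses, conferences, memberships, and more
- English education program and global buddy program
- Taxi fare for late-night and weekend commutes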
Meal & Snack
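- Corporate card worth KRW 7.2 million per year, usable freely for meals, transportation, and more
- In-office snack bar (snacks, coffee, seasonal fruit, and more)
- Dinner provided after 7 p.m. when working in the office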
Wellness & Family
- Annual health checkup for you and one family member
- Group insurance (choose one: accident, dental, or family accident insurance)
- Flu vaccination costs covered
- Two-week paid year-end Holiday Break