VLM Research Engineer (m/f/d)

Berlin, DEU
In-Office
Internship
Artificial Intelligence • Computer Vision • Productivity • Analytics
Unlocking a new level of productivity in manual assembly with the power of computer vision and AI-based analytics
The Role
The Research Engineer will develop vision-language models for video understanding, create training pipelines, and enhance model performance in collaboration with engineering teams.

We’re looking for a Research Engineer to push the limits of vision-language models for real-world video understanding. You’ll work on applied, state-of-the-art multimodal models and turn them into production pipelines used by customers.

Your role
  • Design and adapt vision-language and video models for scene understanding, temporal reasoning and activity / action recognition

  • Build and maintain large-scale training and evaluation pipelines on GPU clusters

  • Curate and augment video-text and action datasets, including synthetic labels and retrieval-based augmentation

  • Develop robust benchmarks for video QA, instruction following and temporal understanding, and use them to drive iterative model improvements

  • Streamline and refactor model architectures for efficiency and deployability (compression, pruning, distillation)

  • Deliver production-ready inference pipelines to product and customer teams, working closely with CV, platform and robotics engineers

You bring
  • Completed PhD (or equivalent research track record) in computer vision, machine learning, robotics or a related field

  • Strong background in video-centric deep learning: scene understanding, temporal / activity / action recognition, or video generation

  • Experience training and adapting large vision models or VLMs (e.g. InternVL, Qwen-VL, DeepSeek-VL, similar stacks)

  • Proven work with multi-GPU training (PyTorch, distributed, mixed precision) and large-scale datasets

  • Solid engineering habits: clean Python, reproducible experiments, reliable data and training pipelines

  • Track record of moving research into usable systems (demos, internal tools, or productised features) in fast-moving teams

Nice to have
  • Publications at top-tier venues (CVPR, ICCV, ECCV, NeurIPS, ICLR, etc.) on video, multimodal learning or scene understanding

  • Experience with 3D/4D scene representations, action generation or embodied / sense-plan-act style projects

  • Inference optimisation: quantisation, TensorRT, model distillation, or deployment on constrained hardware

  • Prior experience in a startup or applied research lab environment

What we offer

Employee Share Options Program for all permanent employees*

A growing list of benefits, currently including Urban Sports Club membership and quarterly team retreats

Be at the forefront of defining what artificial intelligence means in manufacturing

Gain hands-on experience working at an AI-first software company

Supportive and inclusive culture that values diversity and promotes the advancement of underrepresented groups within the company

Collaborate with a diverse (currently more than 10 nationalities) and talented team, working on cutting-edge projects with real-world impact

Network with professionals and leaders in the field, opening doors to potential future career opportunities

We have a very flat hierarchy, open 360° feedback, and flexible working hours

Ethics⚖: We are committed to developing ethical AI software
Don't meet all the requirements?

Deltia is committed to creating a workplace that is diverse, fair, and inclusive. We encourage candidates from all backgrounds, even if they do not meet every qualification, to submit their application. We firmly believe that having a team with diverse perspectives only strengthens our company and drives innovation. Our commitment also extends to providing an accessible environment for everyone, including those with disabilities. Please let us know if you require any accommodations during the application process or while working with us, and we will do our best to support you.

*Only full-time, permanent roles are eligible for stock options. Part-time, contract, working-student, internship and freelance roles are not eligible.

Skills Required

  • Completed PhD or equivalent research track record
  • Strong background in video-centric deep learning
  • Experience training and adapting large vision models or VLMs
  • Proven work with multi-GPU training and large-scale datasets
  • Solid engineering habits and reproducible experiments
  • Track record of moving research into usable systems

The Company
HQ: Berlin
39 Employees
Year Founded: 2022

What We Do

AI-based process analytics platform to increase productivity and quality in manual shop-floor processes. Processes are captured using computer vision and automatically analysed with a highly flexible AI to identify improvement potential.

1. Real-time capture with cameras: Cameras are installed at individual assembly stations capturing live video streams of assembly or packaging tasks.
2. AI tracks material & work steps: Video streams are continuously analyzed to detect workpiece movements, cycle times, and work step sequencing.
3. Aggregation for data analysis: Process data is aggregated per article and production to provide insights on process performance. Video snippets allow for a comprehensive root-cause analysis.
4. Data-driven improvements: Your factory managers and process engineers define, implement and measure process improvements to increase productivity and quality in your assembly line.
