About the role
We're seeking a talented AI Research Scientist to drive our research efforts in video embedding and retrieval, multimodal language models, and intelligent agents. As a key member of our team, you will collaborate closely with the team lead to select and execute AI research projects responsibly and proactively. Your contributions will be instrumental in shaping the future of our products and ensuring their success in the market.
In this role, you will
- Conduct AI research projects essential for advancing Twelve Labs' products (e.g., video embedding & retrieval, multimodal language models, agents)
- Collaborate with the team lead to select AI research projects and execute them responsibly and proactively
- Discuss and determine data collection and labeling methods for project execution
- Communicate proactively and provide regular feedback on ongoing projects to the team lead and other ML team members
- Help shape a culture and environment that enables the team to work at its best
You may be a good fit if you have
- Research experience and interest in fields such as Video Understanding, Multimodal Language Modeling, Vision + X, Open-domain QA, Generative Models, Video Representation Learning, Action Recognition, Multimodal AI, or similar areas
- PhD in a related field, or a Master's degree with 3+ years of industry experience
- Ability to independently lead research projects
- Passion and a strong sense of responsibility in performing job duties
- Experience developing large-scale commercial ML products
- Experience in large-scale model training and acceleration
- Published research in top-tier AI conferences (e.g., NeurIPS, ICML, ICLR, AAAI, NAACL, ACL, EMNLP, CVPR, ICCV, ECCV, KDD, SIGGRAPH) related to video, Vision + NLP, or multimodal studies
- Top placements in international conference challenges, Kaggle competitions, or national AI competitions
- Excellent English writing and communication skills
The Company
What We Do
We help developers build programs that can see, hear, and understand the world as we do by giving them the world's most powerful video-understanding infrastructure.