Staff Machine Learning Engineer - Computer Vision & Multi-Modal AI

Unity

Posted 19 Hours Ago

San Francisco, CA, USA

Hybrid

Senior level

AdTech • Artificial Intelligence • Gaming • Machine Learning • Software • Virtual Reality • Metaverse

Unity is the leading platform to create and grow games and interactive experiences.

The Role

Lead design and productionization of computer-vision and multi-modal AI models (transformers, diffusion, VLMs, JEPA). Translate research to scalable training and low-latency deployments across cloud and on-device targets, define KPIs, mentor engineers, and align ML architecture with product and runtime constraints.

Summary Generated by Built In

The opportunity
We are building the next generation of AI-driven game experiences — generative world models, neural rendering, and multi-modal understanding that turn images, text, and 3D primitives into interactive worlds. As our Staff Machine Learning Engineer, you will be a core technical leader bringing state-of-the-art computer vision and multi-modal models — transformers, diffusion networks, vision-language models (VLMs), and JEPA-style architectures — from research into robust, production-grade systems.

This is a deeply hands-on, high-impact role. You will help define the modeling and deployment strategy, drive architectural decisions across the ML stack, and mentor a team of senior and mid-level engineers. Your work will directly shape the quality, capability, and performance of AI features experienced by billions of players — across cloud, server, and on-device targets.

What you'll be doing

Technical Leadership

Help set the technical vision and roadmap for computer vision and multi-modal AI models, spanning transformers, diffusion models, vision-language models, and JEPA-style generative architectures.
Drive design and implementation of models for image and video understanding, generation, segmentation, detection, and dense prediction, as well as multi-modal reasoning over images, text, and 3D inputs.
Make sound decisions on model architecture, training strategy, data pipelines, and evaluation — balancing quality, capability, latency, and cost across deployment targets.
Own the path from research prototype to production: training, fine-tuning, distillation, export, and serving, with deployment spanning cloud GPUs through to efficient on-device inference where the product requires it.

Architecture & Research Translation

Collaborate directly with research scientists to translate novel CV and multi-modal model architectures into deployable, well-engineered implementations.
Design scalable systems for multi-modal inference that process diverse inputs (images, video, text, primitives, and metadata) and produce rich outputs, from semantic predictions to pixel-level generation.
Track and rapidly adopt breakthroughs across the field: vision-language pretraining and alignment, efficient diffusion (e.g., consistency models, flow matching), efficient attention (e.g., FlashAttention, linear-attention variants), and tokenization/representation learning for vision.
Where latency or device constraints demand it, apply compression, quantization, pruning, and knowledge distillation, and work with appropriate runtimes (e.g., TensorRT, ONNX Runtime, CoreML, TFLite) to meet performance budgets.

Team & Cross-Functional Leadership

Lead and mentor a team of ML engineers; define engineering best practices, code review standards, and rigorous benchmarking and evaluation methodology.
Partner with research, platform engineers, product managers, and runtime teams to align ML capabilities with product roadmaps and target-platform constraints.
Champion a culture of measurement: define KPIs for model quality, accuracy, latency, memory, and cost, and ensure the team tracks them rigorously.

What we're looking for

6+ years in ML engineering, with significant depth in computer vision and/or multi-modal modeling.
Proven production experience with transformer-based and diffusion-based vision models (e.g., ViT, CLIP/SigLIP-style encoders, Stable Diffusion, DETR/SAM-style architectures).
Strong command of the full model lifecycle: data curation, training and fine-tuning, evaluation, and serving at scale.
Familiarity with efficient attention, diffusion samplers, multi-modal fusion, and vision-language alignment techniques.
Strong Python skills and modern deep-learning tooling (PyTorch); solid software engineering fundamentals.
Track record of technical leadership: setting direction, influencing cross-functional partners, and growing engineers.

You might also have

Experience with world-model, video-generation, or neural rendering pipelines (NeRF, 3DGS, or similar).
Experience deploying models to constrained or on-device targets, including quantization (INT8/INT4/FP16), pruning, distillation, and runtimes such as CoreML, TFLite, and ONNX Runtime.
Familiarity with mobile SoC accelerators (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) or compiler stacks such as MLIR, TVM, or XLA.
Contributions to open-source ML frameworks or peer-reviewed CV/ML research publications.
Background in real-time graphics or game engine pipelines (Metal, Vulkan, OpenGL ES).

Additional information

Relocation support is not available for this position.
Work visa/immigration sponsorship is not available for this position.

Benefits
At Unity, we want our team members to thrive. We offer a wide range of benefits designed to support well-being and work-life balance.

Please note: Benefits eligibility, specific offerings, and coverage vary based on the country and employment status.

While specific benefits vary, here are some of the ways we strive to take care of our eligible team members globally: Comprehensive health, life, and disability insurance | Commute subsidy | Employee stock ownership | Competitive retirement/pension plans | Generous vacation and personal days | Support for new parents through leave and family-care programs | Office food and snacks | Mental Health and Wellbeing programs and support | Employee Resource Groups | Global Employee Assistance Program | Training and development programs | Volunteering and donation matching program

Life at Unity
Unity [NYSE: U] is the world’s leading game engine, powering play for more than 3 billion consumers each month. The top mobile games in the world, the most played PC indie titles, the most innovative console games, and virtually all of the top XR and Web Games are developed, deployed, and grown in Unity. Unity also enables teams across industries like automotive, manufacturing, and healthcare to design, simulate, and collaborate in 3D — closing the gap between ideas and reality. For more information, please visit www.unity.com.

Unity is a proud equal opportunity employer. We are committed to fostering an inclusive, innovative environment and celebrate our employees across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, or any other protected status in accordance with applicable law. Our differences are strengths that enable us to support the growing and evolving needs of our customers, partners, and collaborators. If you have a disability that means there are preparations or accommodations we can make to help ensure you have a comfortable and positive interview experience, please fill out this form to let us know.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

This position requires the incumbent to have a sufficient knowledge of English to have professional verbal and written exchanges in this language since the performance of the duties related to this position requires frequent and regular communication with colleagues and partners located worldwide and whose common language is English.

Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Unity does not accept unsolicited headhunter and agency resumes. Unity will not pay fees to any third-party agency or company that does not have a signed agreement with Unity.

Your privacy is important to us. Please take a moment to review our Prospect Privacy Policy and Applicant Privacy Policy. Should you have any concerns about your privacy, please contact us at [email protected].

Unity Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Unity and has not been reviewed or approved by Unity.

Healthcare Strength — Core medical, dental, vision, life/disability, and mental‑health/EAP offerings are positioned as comprehensive across eligible locations. This breadth aligns with large‑tech standards and is highlighted in official materials.
Retirement Support — A 401(k) plan with employer matching is part of the U.S. package. Retirement benefits are characterized as competitive and a stable element of total rewards.
Parental & Family Support — Paid parental leave and family‑care support are emphasized, with indications of generous time off for new parents. These programs are presented as global in scope, with specifics verified by location.

The Company

HQ: New York, NY

4,500 Employees

Year Founded: 2004

Why Work With Us

We believe the world is a better place with more creators in it. This is at the core of our business because we believe our technology can change the world. Our products give content creators the tools to not just entertain but to create innovative RT3D experiences and deliver better processes for almost every industry.