Senior AI/ML Research Engineer (Computer Vision)

Posted Yesterday
Sunnyvale, CA, USA
In-Office
Senior level
Healthtech • Robotics
The future of Intuitive is bright—and it will take curious, driven, and diverse team members to get us there.
The Role
Design, train, and evaluate computer-vision perception models (anatomy, instruments, actions) for surgical video; develop temporal/video models and benchmark them against the state of the art; define perception I/O and move models from offline experiments to robust, real-time OR performance; establish continuous-improvement loops and annotation pipelines; and partner cross-functionally to enable prototype-to-product deployment.
Summary Generated by Built In
Company Description

It started with a simple idea: what if surgery could be less invasive and recovery less painful? Nearly 30 years later, that question still fuels everything we do at Intuitive. As a global leader in robotic-assisted surgery and minimally invasive care, our technologies—like the da Vinci surgical system and Ion—have transformed how care is delivered for millions of patients worldwide.

We’re a team of engineers, clinicians, and innovators united by one purpose: to make surgery smarter, safer, and more human. Every day, our work helps care teams perform with greater precision and patients recover faster, improving outcomes around the world.

The problems we solve demand creativity, rigor, and collaboration. The work is challenging, but deeply meaningful—because every improvement we make has the potential to change a life.

The Future Forward organization is Intuitive’s advanced concepts group. We explore emerging technologies, prototype next-generation solutions, and build software experiences that shape the future of robotic-assisted surgery.

If you’re ready to contribute to something bigger than yourself and help transform the future of healthcare, you’ll find your purpose here.

Job Description

Primary Function of Position

We are building advanced augmented dexterity capabilities for next-generation robotic platforms. As a Senior AI/ML Research Engineer (Computer Vision), you will develop the perception models that let our Embodied-AI system understand the surgical scene. Working within a hierarchical, multimodal stack—where a high-level model interprets sensory observations into structured intent and a low-level policy turns that intent into precise, safe, real-time control—you will focus on the vision layer: designing, training, and evaluating models that extract anatomy, instruments, actions, and surgical context from intraoperative video. You will partner with the broader AI/ML team to define how perception feeds reasoning and control, and you will drive the research-to-deployment path for your models, taking them from offline experimentation to robust, real-time performance in the OR.

Working within Intuitive's Future Forward research organization, you will identify, build, and fine-tune the AI/ML models and algorithms that enable us to deliver safe and performant embodied AI systems. This role calls for someone who is equally comfortable getting hands-on with models and data and designing systems that scale.

Roles and Responsibilities

  • Develop temporal models for activity and workflow understanding: event/state recognition and fine-grained temporal action segmentation.
  • Benchmark in-house models against the state of the art and recommend the target perception architecture.
  • Define the perception input/output specification and demonstrate offline feasibility on recorded data.
  • Stand up a continuous-improvement loop (discrepancy flagging, active learning, human-in-the-loop relabeling) and the tooling/UI needed for offline evaluation and the path to real-time use.
  • Partner with annotation and data teams to shape label taxonomies, QC, and the data pipeline that feeds the AI/ML models.
  • Establish the path from offline evaluation on recorded data to real-time integration, including the continuous-improvement (human-in-the-loop) data loop.
  • Partner with AI/ML researchers, robotics, data engineers, and other stakeholders to deliver a perception layer that enables rapid prototyping and learning while working toward a product solution.

Qualifications

Minimum Qualifications

  • MS or PhD in CS, EE, Robotics, or a related field, with 5+ years of applied computer-vision research experience.
  • Strong grasp of modern CV and deep-learning fundamentals: CNNs and vision transformers, segmentation, detection, tracking, and representation/self-supervised learning.
  • Demonstrated work in video understanding, including temporal action segmentation, action/phase recognition, and video segmentation.
  • Hands-on experience with modern video architectures, including video transformers and self-supervised video pretraining.
  • Exposure to vision-action (VA) / vision-language-action (VLA) models and world-model / self-supervised predictive architectures (e.g., JEPA-style models, MAE, DINO) for learning visual representations and dynamics.
  • Experience working with large, messy, real-world video datasets at scale.
  • Strong software and experimentation skills in Python and C++, with proficiency in one or more of PyTorch/TensorFlow/JAX, and the ability to stand up clean, reproducible experiments and run the full loop (data curation, augmentation, loss design, metrics, error analysis).
  • A research-and-prototyping mindset: comfortable working in ambiguity, framing open-ended problems, running rapid experiments, and reading and reproducing recent papers to pull promising techniques into practice.
  • Sound judgment about the path from prototype to product: writing code others can build on, knowing when to optimize versus when to move fast, and thinking ahead about data quality, evaluation, and robustness even at the research stage.
  • Solid foundations in linear algebra, probability, and optimization, enough to reason about and debug model behavior from first principles.
  • Comfort collaborating across a multidisciplinary team (ML, robotics, software, and clinical/domain experts) and communicating tradeoffs and findings clearly.

Preferred Qualifications

  • Background in healthcare, medical devices, surgical robotics, or other regulated technical domains.
  • Sim-to-real workflows and experience with robotics simulators (e.g., NVIDIA Isaac).
  • Experience with structured, ontology- or taxonomy-based labeling frameworks for fine-grained activity.
  • Multimodal fusion of video with sensor, telemetry, and system-log streams.
  • Designing annotation pipelines, QC processes, and active-learning loops.
  • Real-time / edge inference optimization (e.g., TensorRT, NVIDIA Jetson).
  • Fine-grained interaction and object-relationship modeling.
  • Relevant peer-reviewed publications (CVPR, ICCV, ECCV, NeurIPS, etc.).

Additional Information

Due to the nature of our business and the role, please note that Intuitive and/or your customer(s) may require that you show current proof of vaccination against certain diseases including COVID-19. Details can vary by role.

Intuitive is an Equal Opportunity Employer. We provide equal employment opportunities to all qualified applicants and employees, and prohibit discrimination and harassment of any type, without regard to race, sex, pregnancy, sexual orientation, gender identity, national origin, color, age, religion, protected veteran or disability status, genetic information or any other status protected under federal, state, or local applicable laws.

Mandatory Notices

U.S. Export Controls Disclaimer: In accordance with the U.S. Export Administration Regulations (15 CFR §743.13(b)), some roles at Intuitive Surgical may be subject to U.S. export controls for prospective employees who are nationals from countries currently on embargo or sanctions status.

Certain information you provide as part of the application will be used for purposes of determining whether Intuitive Surgical will need to (i) obtain an export license from the U.S. Government on your behalf (note: the government’s licensing process can take 3 to 6+ months) or (ii) implement a Technology Control Plan (“TCP”) (note: typically adds 2 weeks to the hiring process).  

For any Intuitive role subject to export controls, final offers are contingent upon obtaining an approved export license and/or an executed TCP prior to the prospective employee’s start date, which may or may not be flexible, and within a timeframe that does not unreasonably impede the hiring need. If applicable, candidates will be notified and instructed on any requirements for these purposes.

We will consider for employment qualified applicants with arrest and conviction records in accordance with fair chance laws.

Preference will be given to qualified candidates who do not reside, or plan to reside, in Alabama, Arkansas, Delaware, Florida, Indiana, Iowa, Louisiana, Maryland, Mississippi, Missouri, Oklahoma, Pennsylvania, South Carolina, or Tennessee.

This position may be filled at a different job level than listed here depending on business need and/or on the selected candidate’s experience, knowledge and skills. Compensation will be based primarily on the job level at which the role is filled and the candidate’s qualifications, consistent with applicable law.

We provide market-competitive compensation packages, inclusive of base pay, incentives, benefits, and equity. It would not be typical for someone to be hired at the top end of the range for the role, as actual pay will be determined based on several factors, including experience, skills, and qualifications. The target compensation ranges are listed.

Intuitive Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Intuitive and has not been reviewed or approved by Intuitive.

  • Healthcare Strength: Healthcare coverage appears broad and modern, including medical/dental/vision, telehealth, second-opinion services, fertility support, and condition-specific programs. Mental health support is positioned as strong, including access to free counseling sessions and a dedicated counseling service.
  • Wellbeing & Lifestyle Benefits: Wellbeing and lifestyle offerings extend beyond core insurance, with initiatives such as vaccination clinics, fitness memberships, stress-reduction programs, and employee assistance programs. Additional lifestyle perks include curated discounts, pet insurance, identity theft prevention, and paid volunteer time.
  • Flexible Benefits: Flexibility is supported through flexible work schedules and telecommuting options that can help with work-life integration. Benefit availability is described as variable by country, campus, and role, implying a menu that changes by eligibility and location.

The Company
HQ: Sunnyvale, CA
12,000 Employees
Year Founded: 1995

What We Do

Intuitive (Nasdaq: ISRG), headquartered in Sunnyvale, Calif., is a global technology leader in minimally invasive care and the pioneer of robotic-assisted surgery. At Intuitive, we believe that minimally invasive care is life-enhancing care. Through ingenuity and intelligent technology, we expand the potential of physicians to heal without constraints. Intuitive brings more than two decades of leadership in robotic-assisted surgical technology and solutions to its offerings, and develops, manufactures, and markets the da Vinci surgical system and the Ion endoluminal system.

Why Work With Us

We bring together the thinkers and doers; those who pursue excellence and are energized by discovering ways to do what can’t yet be done. We question, we test, we challenge each other and the status quo until we see the impact we’ve made, until we’ve set a new standard for minimally invasive care. We revel momentarily in our achievements before sta
