What we’re doing isn’t easy, but nothing worth doing ever is.
We envision a future powered by robots that work seamlessly with human teams. We build artificial intelligence that enables service robots to collaborate with people and adapt to dynamic human environments. Join our mission-driven team as we build out current and future generations of robots.
As an ML Engineer II (World Models), you will develop and deploy predictive perception (“world model”) systems that fuse multi-sensor robot data into a unified representation of the near future. You’ll build training and evaluation workflows that convert fleet data into reliable model improvements, and partner with robotics engineers to ship the model on edge hardware. Your work will directly impact Moxi’s ability to move confidently and safely in crowded hospital environments.
Responsibilities
- Develop multimodal world-model architectures that ingest and fuse camera, LiDAR/depth, and robot state and produce short-horizon predictions.
- Build and maintain training pipelines: dataset construction, tokenization/backbones, distributed training, and ablation frameworks.
- Define model evaluation metrics and regression suites that reflect real robot outcomes.
- Create visualization/debug tooling for temporal predictions (rollouts, replays, overlays, failure case inspection).
- Optimize and distill models for edge deployment; benchmark latency, memory, and stability on target hardware.
- Collaborate with the AI Platform team to integrate the world model into autonomy stacks and validate behavior.
- Work with Operations to identify failure modes in the field and drive data curation and model iteration.
Qualifications
- Bachelor’s or Master’s degree in Robotics, Computer Science, Electrical Engineering, or related field (PhD a plus).
- 3+ years of experience building and training deep learning models in robotics, autonomy, or perception.
- Strong proficiency with PyTorch and modern training workflows (distributed training, mixed precision, profiling).
- Experience working with multimodal sensor data (cameras + LiDAR/depth) and temporal models.
- Experience with predictive perception / world models / video prediction.
- Experience deploying ML to edge devices (TensorRT/ONNX, quantization/INT8, runtime profiling).
- Familiarity with ROS pipelines, sensor calibration, and autonomy stack integration.
- Experience with simulation-based evaluation (Isaac Sim/Mujoco or similar) and offline replay testing.
What We Do
We are a human-centered robotics company. Our mission is to make technical advances towards robots and humans working together side by side, with an emphasis on human-centric design. Diligent Robotics is developing a suite of artificial intelligence that enables robots to collaborate with and adapt to humans in everyday environments. Our service robots are designed to participate and work together with teams of humans. Our first product is a hospital service robot that can assist clinical staff with logistical tasks, allowing them to spend more of their time on direct patient care, improving patient satisfaction, quality of service, and safety.









