Vinci4D.ai

Simulation Runtime Software Engineer (Senior)

Reposted 21 Days Ago

Palo Alto, CA, USA

Hybrid

220K-265K Annually

Senior level

Hardware • Information Technology • Design

Powering the future of hardware design

The Role

Design and implement low-latency, scalable runtime solutions to decompose and distribute production physics simulations across multi-GPU and multi-node environments. Focus on parallelism, correctness, performance optimization, validation, CI, and moving prototypes into production while working closely with ML researchers and physicists.

Summary Generated by Built In

The Mission

At Vinci, we are building the operator intelligence infrastructure that modern hardware programs rely on daily. We have already proven that a single foundation model works out of the box across physics on realistic production workloads.

Trained on petabytes of structured physics data
Running billion-voxel inference in production
Tier-1 semiconductor and hardware customers
Operating across multiple physical scales and operator regimes

We are scaling deployment at industrial magnitude:

Increase simulation throughput by two orders of magnitude
Expand simulation capabilities to maximize utility and domain coverage
Support global, multi-entity deployment across Tier-1 ecosystems

Our ambition is to become the default operator intelligence layer that hardware companies run on.

Simulation Runtime

Mapping a computational problem to a runtime environment is the engine of our product. Making our simulations run fast across heterogeneous compute platforms while retaining accuracy is at the center of our value proposition to customers. Delivering on this involves overcoming many challenges, such as sharing data efficiently and customizing inference and math for bespoke runtime hardware.

The value: making Vinci simulations run effectively regardless of the hardware, from a single desktop to multi-GPU clusters that span global data center sites. This compute platform flexibility means simulations are easier to run and complete faster when hardware permits. The ability to losslessly partition simulations across large-scale compute resources will unlock new utility for our customers and power larger, more useful applications.

What You Will Do

Your north star will be parallelism and correctness.

In this role, you will design and implement low-latency, scalable solutions to decompose and distribute our production simulations across the most challenging computational boundaries:

Multi-GPU machines
Multi-Node clusters
Networked nodes

What We’re Looking For

Being successful in this role requires a deep understanding of scientific computing methods, boundary decomposition problems, and parallel computing.

Qualifications:

Experience working on High Performance Computing runtime applications
Experience with any highly parallel computing framework
- MPI, MPICH, ZeroMQ, OpenMP

Experience with GPU programming: CUDA, ROCm, Triton
Contributions to a production data processing system
Familiarity with statistical validation methods
- Outlier detection, Bayesian methods, convergence criteria for nonlinear solvers
Familiarity with ML basics
- Backpropagation, loss functions, generators, embeddings, transformer models

We are very excited to talk with you if you have:

Worked on highly performant deployed inference environments
Shipped HPC library components
Taken an early-stage prototype into a production environment
- At a startup or national lab
Worked with highly parallel ML training frameworks such as Ray

Engineering Expectations

Software engineering fundamentals
- Comfortable meeting software design standards to get code into a production environment.
A practical approach to prototyping necessary components that are currently missing
Strong CI, regression testing, and validation discipline
Comfort learning and evolving model deployment and runtime infrastructure

Why Vinci

Join a rare early-stage startup that has successfully moved a foundational product from research into real-world production environments and already serves Tier-1 semiconductor and hardware customers.

Our Mission & Impact

Vinci is building the operator intelligence infrastructure that modern hardware programs rely on daily. We are scaling our solution to accelerate design validation from hours to seconds. You will contribute to expanding our unified model architecture, which currently runs billion-voxel inference, into the transient domain, a key frontier in modeling interactions, deformation, and dynamics. Our ambition is to become the default operator intelligence layer for hardware companies.

Growth & Opportunity

This is a unique opportunity for technical and professional growth, as you will define a foundational abstraction layer early in the company's trajectory. The team is small, friendly, and accessible. You will be empowered to own and architect large pieces of the system alongside a team of physicists, AI researchers, software engineers, and computational geometry experts. This includes greenfield opportunities to expand Vinci’s core capabilities.

Leadership

You will work with spectacular technical leaders like CTO Sarah Osentoski and CEO Hardik Kabaria, whose vision is to greatly accelerate physics simulations with ML while retaining solver-grade accuracy.

Skills Required

Deep understanding of scientific computing methods, boundary decomposition problems, and parallel computing
Experience working on High Performance Computing runtime applications
Experience with highly parallel computing frameworks: MPI, MPICH, ZeroMQ, OpenMP
Experience with GPU programming and ecosystems: CUDA, ROCm, Triton
Contributed to a production data processing system
Familiarity with statistical validation methods (outlier detection, Bayesian methods, convergence criteria for nonlinear solvers)
Familiarity with ML basics (backprop, loss functions, generators, embeddings, transformer models)
Software engineering fundamentals and ability to meet production software design standards
Strong CI, regression testing, and validation discipline
Comfort learning and evolving model deployment and runtime infrastructure
Worked on highly performant deployed inference environments
Shipped HPC library components
Experience moving prototypes to production (startup or national lab)
Experience with highly parallel ML training frameworks such as Ray