The Role
Architect and manage ML pipelines for training and deployment on cloud platforms, automate CI/CD, and ensure system performance and availability.
Responsibilities:
- Architect, build, and operate end-to-end ML pipelines for training, validation and deployment on Google Cloud and AWS.
- Define, instrument, and maintain logging, monitoring, and alerting for model performance and data drift.
- Automate CI/CD for ML artifacts and infrastructure using GitHub Actions or equivalent.
- Collaborate with cross-functional teams, including frontend engineers, backend engineers, research engineers, and infrastructure engineers.
- Write clean, well-documented, fast, and maintainable code.
- Help ensure our systems have high availability and performance.
About Us:
We are an MIT-born, venture-backed Silicon Valley startup building a real-life 'Jarvis'—an AI Copilot for design and manufacturing. Our goal is to utilize advanced AI, physics simulation, and computer graphics to reduce costs and improve engineering productivity across all steps of the design and manufacturing process.
What we're looking for
- BS in Computer Science or a related field.
- 5+ years of experience as an AI/ML Ops, DevOps, or Infrastructure Engineer, or equivalent.
- Expert-level Python and TypeScript skills.
- Experience with Docker, Kubernetes, Terraform, Google Cloud and AWS.
- Deep understanding of machine learning models, including LLMs.
- Experience designing and maintaining CI/CD pipelines to fine-tune or train ML models.
- Excellent written and verbal communication skills.
Bonus Points
- Experience in computer graphics or physics-based simulation.
- Background in setting up Prometheus/Grafana, ELK, or similar monitoring stacks.
- Experience with Vertex AI.
- Experience working with custom Domain-Specific Languages.
Our tech stack
- Google Cloud, AWS
- Python, TypeScript
- Protobuf, gRPC
- Next.js, React
- GitHub Actions
- Docker, Kubernetes, Spinnaker
- PostgreSQL
The Company
What We Do
Meet the AI platform built to revolutionize engineering. We started this company because manufacturing and engineering are overdue for a digital revolution. These fields are complex, and while agile, lean, and just-in-time methods have driven progress, AI now offers a transformative opportunity to reshape how work gets done. Foundation’s EGI platform is built to accelerate every stage of the product development cycle, from research and design to manufacturing and documentation, empowering engineers to build better products faster and more efficiently.