Senior ML Infrastructure Engineer

Sorry, this job was removed at 12:19 a.m. (CST) on Sunday, May 25, 2025
Palo Alto, CA
In-Office
Artificial Intelligence • Healthtech
The Role
About Us:

Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes worldwide by bringing deep healthcare expertise to every human. No other technology has the potential for this level of global impact on health.

Why Join Our Team:

  • Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.

  • Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.

  • Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.

  • World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.

Position Overview:

We are seeking a skilled ML Infrastructure Engineer to help design, build, and maintain a robust orchestration platform for managing a diverse set of Large Language Models (LLMs). The ideal candidate will have hands-on experience with infrastructure orchestration tools such as Kubernetes and Terraform, as well as a strong understanding of multi-cloud environments. This role offers the opportunity to work on cutting-edge technologies and play a key part in scaling our AI infrastructure.

Key Responsibilities:

Infrastructure Development & Maintenance:

• Build and maintain infrastructure for deploying and managing LLMs at scale.
• Implement automated processes using Kubernetes and Infrastructure as Code (IaC) tools such as Terraform.

Orchestration Platform Support:

• Contribute to the development and optimization of an orchestration platform for managing a heterogeneous set of LLMs.
• Monitor and troubleshoot platform issues to ensure high availability and performance.

Cloud Integration:

• Deploy and manage resources across multiple cloud platforms (e.g., AWS, Azure, Google Cloud).
• Optimize cloud resource usage for cost efficiency and scalability.

Collaboration:

• Work closely with ML engineers and DevOps teams to ensure smooth deployment and operation of AI models.
• Provide feedback on system designs and recommend improvements to infrastructure workflows.

Performance Monitoring:

• Implement tools and processes to monitor system health, identify bottlenecks, and improve model lifecycle management.
• Perform capacity planning to support growing infrastructure needs.

Qualifications:

Technical Skills:

• 3-5 years of experience in infrastructure engineering, DevOps, or a related field.
• Experience with enterprise GPUs such as the NVIDIA H200, H100, and A100.
• Proficiency with Kubernetes, Terraform, and other IaC tools.
• Familiarity with multi-cloud environments and cloud-native services (e.g., AWS Lambda, Google Cloud Run, Azure Functions).
• Programming skills in Python, Bash, or a similar language for automation and scripting.
• Basic understanding of ML workflows and frameworks such as TensorFlow, PyTorch, or Hugging Face is a plus.

Soft Skills:

• Strong problem-solving skills and attention to detail.
• Good communication and collaboration abilities to work effectively with cross-functional teams.
• Eagerness to learn new technologies and improve existing systems.

Education & Experience:

• Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent work experience).

The Company
HQ: Palo Alto, California
97 Employees
Year Founded: 2023

What We Do

Hippocratic AI’s mission is to develop the first safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes worldwide by bringing deep healthcare expertise to every human. No other technology has the potential for this level of global impact on health.
The company was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Microsoft, Meta and NVIDIA. Hippocratic AI has received a total of $137 million in funding and is backed by leading investors, including General Catalyst, Andreessen Horowitz, Premji Invest, SV Angel, NVentures (Nvidia Venture Capital), and Greycroft. For more information on Hippocratic AI: www.HippocraticAI.com.
