Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes in the world by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health.
Why Join Our Team
Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.
Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.
Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.
World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.
For more information, visit www.HippocraticAI.com.
We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA, unless explicitly noted otherwise in the job description.
About the Role
At Hippocratic AI, data is the backbone of everything we build, from our voice-based generative healthcare agents to the systems that ensure their clinical safety and reliability.
We’re looking for data engineers who thrive at the intersection of data systems, security, and scalability to design the infrastructure that powers safe AI deployment across healthcare environments.
As part of the Data Infrastructure team, you’ll design, operate, and scale systems that collect, transform, and serve data for model training, evaluation, and operational analytics. You’ll work closely with ML, AI, and product teams to ensure our data is trustworthy, compliant, and high-quality — forming the foundation for real-world, patient-facing AI systems.
This is an opportunity to build data systems for a new class of AI products — where correctness, traceability, and privacy aren’t optional, they’re mission-critical.
What You'll Do
Build & operate data platforms and pipelines (batch/stream) that feed training, RAG, evaluation, and analytics using tools like Prefect, dbt, Airflow, Spark, and cloud data warehouses (Snowflake/BigQuery/Redshift).
Own data governance and access control: implement HIPAA-grade permissioning, lineage, audit logging, and DLP; manage IAM, roles, and policy-as-code.
Ensure reliability, observability, and cost efficiency across storage (S3/GCS), warehouses, and ETL/ELT pipelines, including SLAs/SLOs, data quality checks, monitoring, and disaster recovery.
Enable self-service analytics via curated models and semantic layers; mentor engineers on best practices in schema design, SQL performance, and data lifecycle management.
Partner with ML/Research to provision high-quality datasets, feature stores, and labeling/evaluation corpora with reproducibility (versioning, metadata, data contracts).
Must-Have:
5+ years of software or data engineering experience, with 3+ years building data infrastructure, ETL/ELT pipelines, or distributed data systems.
Deep experience with Python and at least one cloud data platform (Snowflake, Databricks, BigQuery, Redshift, or equivalent).
Familiarity with orchestration and transformation tools (Airflow, Prefect, dbt) and infrastructure-as-code (Terraform, CloudFormation).
Strong understanding of data security, access control, and compliance frameworks (HIPAA, SOC 2, GDPR, or similar).
Proficiency with SQL and experience optimizing query performance and storage design.
Excellent problem-solving and collaboration skills — able to work across engineering, ML, and clinical teams.
Comfortable navigating trade-offs between performance, cost, and maintainability in complex systems.
Nice-to-Have:
Experience supporting ML pipelines, feature stores, or model training datasets.
Familiarity with real-time streaming systems (Kafka, Kinesis) or large-scale unstructured data storage (S3, GCS).
Background in data reliability engineering, data quality monitoring, or governance automation.
Experience in healthcare, safety-critical systems, or regulated environments.
If you’re passionate about building data systems that power safe, real-world AI, we’d love to hear from you. Join Hippocratic AI and help lay the foundation for clinically safe, data-driven healthcare agents that make a measurable impact on patient outcomes.
Be aware of recruitment scams impersonating Hippocratic AI. All recruiting communication will come from @hippocraticai.com email addresses. We will never request payment or sensitive personal information during the hiring process.
What We Do
Hippocratic AI’s mission is to develop the first safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes in the world by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health.
The company was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Microsoft, Meta and NVIDIA. Hippocratic AI has received a total of $137 million in funding and is backed by leading investors, including General Catalyst, Andreessen Horowitz, Premji Invest, SV Angel, NVentures (Nvidia Venture Capital), and Greycroft. For more information on Hippocratic AI: www.HippocraticAI.com.