Amigo

Staff Software Engineer (Data)

San Francisco, CA

In-Office

220K-300K Annually

Senior level

Artificial Intelligence • Enterprise Web • Healthtech • Software

Building healthcare AI systems organizations stake their reputations on: trust & safety infrastructure for clinical agents

The Role

As a Staff Software Engineer (Data), you'll design and build data infrastructure for healthcare, including data pipelines, storage solutions, and compliance systems.

Summary Generated by Built In

About Amigo

Amigo builds AI agents that deliver healthcare autonomously: AI doctors, AI nurses, and AI care coordinators. Our agents handle clinical workflows and patient engagement across the entire patient journey: pre-visit intake, triage, care navigation, post-visit care plans, side effect management, and medication adherence. They are context-aware, with memory across sessions and the ability to take clinical action.

We own outcomes, not just delivery. For our customers, we're responsible for agent performance: clinical safety, continuous improvement, measurable patient outcomes. Agents operate autonomously within bounded clinical domains, with clear scope and handoff protocols. That scope expands as we validate performance across populations.

We're fresh off our Series A from Tier 1 investors like General Catalyst, GSV Ventures, SVA, and CohoVC. Our work is validated with leading academic medical institutions. We're currently reaching 1M+ patient interactions every 90 days, and are on track to 10x this year.

About this role

As a Staff Software Engineer (Data) at Amigo, you'll own the technical direction of our data platform—a strategic differentiator that powers agent improvement, clinical analytics, and research collaboration. You'll architect streaming and batch infrastructure on Databricks that processes agent conversations, clinical events, and patient outcomes at scale.

We own the entire data foundation: raw interaction data, agent reasoning traces, clinical outcomes, and high-fidelity synthetic data. You'll drive architecture decisions for population analysis, data mining pipelines, the Research Platform backend, and secure data sharing with academic partners.

What you'll do

Own technical architecture for the data platform across Databricks, Delta Lake, and supporting infrastructure
Drive engineering standards for pipeline reliability, data quality, and observability
Architect streaming and CDC pipelines that power real-time analytics and agent feedback loops
Design the data backend architecture for Research Platform, including natural language query capabilities
Architect data mining systems for persona discovery, scenario extraction, and edge case detection
Design anonymization and data sharing infrastructure for research partnerships with academic medical institutions
Own multi-region data architecture and compliance requirements
Make build vs. buy decisions for data tooling and evaluate technical tradeoffs
Mentor engineers and establish patterns that raise the bar for the data team
Collaborate with data scientists, agent engineers, and clinical operations to align data capabilities with business needs

What we're looking for

7+ years of production data engineering experience, with significant time at high-caliber engineering organizations
Expert-level experience with Databricks, Spark, and Delta Lake at scale
Strong Python and SQL skills with deep understanding of distributed data systems
Proven track record designing data architectures that scale
Deep experience with streaming systems, CDC patterns, and real-time data processing
Strong understanding of data modeling, medallion architecture, and query optimization
History of establishing engineering standards and mentoring engineers
Extremely high standards for data quality, reliability, and operational excellence
Both execution-oriented and defensive-minded: you ship infrastructure while anticipating failure modes
Excellent communication across engineering, data science, and executive stakeholders

Nice to have

Experience with healthcare data platforms or HIPAA compliance at scale
Background architecting multi-tenant data systems with strict isolation requirements
Experience building natural language query interfaces or LLM-powered data tools
Track record with ML infrastructure (feature stores, training pipelines, model serving)
Experience with Delta Sharing or cross-organization data collaboration
Knowledge of vector search systems and embedding infrastructure at scale

Benefits

Health & Wellness

Comprehensive health, dental, and vision insurance
Mental health support and wellness coaching
Flexible wellness stipend for fitness, therapy, or personal growth
Daily catered lunch and dinner

Growth & Development

Annual learning budget for courses, books, or conferences
Conference attendance budget for professional development
Development setup of your choice
Academic collaboration opportunities

Top Skills

Cloud Data Platforms

Data Modeling

Data Processing Frameworks

ETL Pipelines

NoSQL

SQL

The Company

HQ: New York, New York

20 Employees

Year Founded: 2024

What We Do

Amigo AI builds trust and safety infrastructure for clinical agents—ensuring AI systems in healthcare provide quantified confidence when mistakes aren't an option. Our platform combines advanced simulation, verification, and recursive optimization to enable healthcare organizations to deploy AI with statistical guarantees about its behavior.

We solve the fundamental challenge of reliable AI in critical domains through deterministic verification for clinical protocols and continuous drift detection for real-world performance. Our systems provide complete transparency—every AI decision is traceable and auditable, with quantified confidence intervals rather than black box predictions.

Founded by technologists from Google, Meta AI, Databricks, Coda, and Plaid, we've built systems that let organizations make informed risk decisions about AI deployment in healthcare. Our interdisciplinary approach draws from computer science, economics, physics, and mathematics to tackle human-centric optimization problems where people and populations are at the center of every solution.

We're actively working with healthcare organizations across digital health, cancer care, cardiac care, and personalized medicine to deploy AI systems that continuously learn and adapt from real-world feedback while maintaining verified safety boundaries. Our technology amplifies human expertise rather than replacing it, empowering domain experts and AI to achieve outcomes neither could accomplish alone.

Why Work With Us

We build AI healthcare systems where 99% isn't good enough. Rapid growth, with promotions in 3 months. Freedom to work your way: art museums or late nights. Tackle recursive optimization problems that ship to production. Your work directly impacts critical healthcare decisions. Diverse team from Google, Meta AI, and Databricks solving problems that matter.