Data Engineer
Full Time
Columbus, OH
About AndHealth
AndHealth is a healthcare technology company created to radically improve access and outcomes for the most challenging chronic health conditions. We are driven by the goal of making world-class specialty care accessible and affordable to all. We partner with health systems, community health centers, and independent practices to remove barriers to care to ensure all people have access to the care they deserve.
About the Role
We are building the modern data platform that powers AndHealth’s analytics, reporting, operational workflows, product integrations, and AI initiatives. This is a senior, hands-on infrastructure and pipeline engineering role. You will own the systems that ingest, transport, transform, and serve data – the foundational layer that everything else at AndHealth depends on.
Healthcare data is messy. Partner feeds arrive in inconsistent formats with no warning when schemas change. Source systems span decades of technical debt: flat files, HL7 feeds, proprietary exports, and undocumented APIs. Compliance requirements are strict and non-negotiable. We need someone who has been through this before and knows how to build ingestion and transformation systems that absorb that complexity, so that by the time data reaches the warehouse, it is clean, consistent, and trustworthy.
We expect engineers to leverage AI tools thoughtfully to move faster, and to bring good judgment about where automation helps and where it introduces risk.
What You'll Do:
- Build and own the ingestion layer. Design scalable frameworks for onboarding new healthcare partner data sources (file-based, API-based, and streaming) with standardized validation, error handling, and schema evolution support.
- Design and maintain production-grade data pipelines that are idempotent, incremental where appropriate, and built to recover gracefully from failures.
- Build the data quality and observability infrastructure. Implement schema validation, row-count reconciliation, freshness checks, anomaly detection, and alerting at the platform level.
- Own orchestration, scheduling, and pipeline reliability. Every pipeline has clear SLAs, dependency management, failure alerting, and documented recovery procedures. You build the runbooks, the backfill tooling, and the incident response patterns.
- Manage the warehouse infrastructure layer in BigQuery: performance tuning, partitioning and clustering strategies, cost optimization, access control, and environment management.
- Translate complex healthcare source systems into clean, standardized raw and staging datasets that analytics engineers and analysts can build on with confidence. This includes messy, semi-structured partner data from EHRs, claims systems, pharmacy platforms, and billing feeds.
- Build reusable ingestion and transformation frameworks that the team can extend without reinventing the wheel. Think config-driven pipelines, shared libraries, and standardized patterns.
- Manage integrations with healthcare partners and external data sources, including HRSA, CMS, Medicaid, FDA Orange Book, and federal drug pricing reference datasets. Own the ingestion contracts, handle schema drift, and ensure no data is silently lost or corrupted.
- Ensure HIPAA-compliant security, privacy, and access controls throughout the data lifecycle, including PII detection and masking, role-based access, encryption, and audit logging.
- Define and enforce data contracts between source systems and the data platform, and between the platform and downstream consumers. When something changes upstream, you know about it before it causes damage.
Education & Experience:
- Bachelor's degree in Computer Science, Information Systems, Engineering, Mathematics, or a related technical field preferred.
Other Skills & Qualifications:
Required
- 5+ years of hands-on data engineering experience, with a track record of building and operating production data systems.
- You’ve built ingestion systems from scratch. You’ve dealt with unreliable source systems, inconsistent file formats, undocumented APIs, and schema changes that arrive without warning. You know how to build frameworks that handle these problems systematically.
- Strong infrastructure and platform thinking. You make build-vs-buy decisions, evaluate new tooling, and set the standards the rest of the team builds on. You’ve designed systems, not just contributed to them.
- Production pipeline engineering with Python: building ETL/ELT workflows, handling file-based and API-based integrations, managing retries and error handling.
- Advanced SQL skills: complex joins, CTEs, window functions, query optimization, and large-scale transformations. You should be able to look at a slow query and know where to start.
- Strong understanding of data warehouse architecture and modeling patterns: star schemas, slowly changing dimensions, incremental models, and when to apply each.
- Deep fluency with Cloud Data Platforms (preferably GCP and BigQuery), partitioning strategies, clustering, slot economics, materialization trade-offs, and cost optimization.
- Design for resilience, not just correctness. You think about failure modes, backfill strategies, idempotency, and what happens when a source schema changes at 2 AM on a Friday.
- Experience troubleshooting and resolving production incidents: diagnosing pipeline failures, data anomalies, and performance bottlenecks under pressure.
- Deep experience with pipeline orchestration and reliability engineering. You’ve designed DAGs with complex dependency chains, built retry and backfill mechanisms, and implemented SLA monitoring. You know the difference between a pipeline that works and a pipeline that’s production-ready.
- Raise the bar for the team. You establish patterns, write reusable systems, define standards, and mentor junior engineers.
Preferred
- Experience working with healthcare data: EHR, claims, pharmacy, billing, or revenue cycle datasets. You understand the quirks: messy provider taxonomies, adjudication cycles, NDC codes, and ICD/CPT mapping.
- Familiarity with healthcare interoperability standards: HL7, FHIR, CCD, or EDI.
- Experience implementing data quality, observability, governance, lineage, or metadata management solutions.
- Experience working in a startup or high-growth technology environment.
Here’s what we’d like to offer you:
- Equal investment and support for our people and patients.
- A fun and ambitious start-up environment with a culture that takes on big things, takes risks, and learns quickly.
- The ability to demonstrate creativity, innovation, and conscientiousness, and find joy in working together.
- A team of highly skilled, incredibly kind, and welcoming employees, every one of whom has something unique to offer.
- We know that the overall success of our business is a collaborative effort, and we strive to provide ongoing opportunities for our employees to learn and grow, both personally and professionally.
- Full-time employees are eligible to participate in our benefits package, which includes medical, dental, and vision insurance, company-paid time off, short- and long-term disability, and more.
We are an equal opportunity and affirmative action employer. We embrace diversity and are committed to creating an inclusive environment for all employees. Applicants will be considered for employment without regard to race, religion, gender, gender identity, sexual orientation, national origin, age, disability, or veteran status.
Skills Required
- 5+ years of hands-on data engineering experience building and operating production data systems
- Built ingestion systems handling unreliable sources, inconsistent file formats, undocumented APIs, and schema changes
- Strong infrastructure and platform thinking; make build-vs-buy decisions and set team standards
- Production pipeline engineering with Python for ETL/ELT workflows, file and API integrations, retries and error handling
- Advanced SQL skills including complex joins, CTEs, window functions, query optimization, large-scale transformations
- Strong understanding of data warehouse architecture and modeling patterns (star schemas, SCDs, incremental models)
- Deep fluency with Cloud Data Platforms, preferably GCP and BigQuery, including partitioning, clustering, slot economics, and cost optimization
- Design for resilience: idempotency, backfill strategies, failure modes, and recovery procedures
- Experience troubleshooting and resolving production incidents, diagnosing pipeline failures and performance bottlenecks
- Deep experience with pipeline orchestration and reliability engineering, DAGs, retries, backfills, SLA monitoring
- Bachelor's degree in Computer Science or related field
- Experience working with healthcare data (EHR, claims, pharmacy, billing) and familiarity with NDC, ICD/CPT
- Familiarity with healthcare interoperability standards: HL7, FHIR, CCD, or EDI
- Experience implementing data quality, observability, governance, lineage, or metadata management solutions
- Experience working in a startup or high-growth technology environment
What We Do
AndHealth is the leading digital health solution for autoimmune diseases and migraine. Employers use AndHealth's disease reversal programs to unlock employee engagement and productivity while reducing healthcare costs. Because autoimmune diseases and migraine disproportionately impact women, addressing these issues is important to advancing DEI. AndHealth is built by the team that created digital health pioneer CoverMyMeds, which was one of the first digital health unicorns, and is backed with more than $57 million in funding from Francisco Partners, Health 2047 (American Medical Association’s partner venture fund), Kirkland & Ellis, and Twofold Ventures.





