This is a backend, data-intensive role. You will work at the intersection of machine learning research and production data systems, building models that run at scale and integrate into real-world clinical workflows. You should be comfortable moving from research to implementation in a fast-paced, applied setting.
- Build and optimize ML pipelines on large-scale distributed systems using Apache Spark and cloud-native tools
- Work with structured and unstructured data sources to develop predictive models, classification systems, and entity resolution approaches
- Collaborate with data engineers to integrate ML models into production ETL pipelines and ensure scalability and reliability
- Partner with MLOps teams to operationalize models — including monitoring, retraining, and versioning workflows
- Evaluate and benchmark model performance and present findings to both technical and non-technical stakeholders
- Stay current with advances in NLP, deep learning, and healthcare-specific AI applications to identify opportunities for innovation
- Someone who understands the realities of working with messy, large-scale healthcare data and can design around its limitations
- Collaborative and communicative — comfortable working with data engineers, MLOps practitioners, and product teams
- Curious about the clinical space and quick to develop domain intuition around trial processes, provider data, and healthcare terminology
- Confident owning technical decisions and presenting trade-offs clearly to both technical and business stakeholders
- Strong proficiency in Python; working knowledge of SQL and R
- Deep experience with ML frameworks such as TensorFlow and PyTorch
- Proven experience building and running models on large-scale data pipelines using Apache Spark (PySpark)
- Exposure to NLP techniques — including text classification, named entity recognition, information extraction, and transformer-based models
- Familiarity with cloud platforms (Azure preferred) and tools such as Databricks, Delta Lake, and Kubernetes
- Experience working closely with data engineering and MLOps teams in production environments
- Background in healthcare, life sciences, pharma, or clinical research is highly preferred
- Comfortable working independently in a remote setting with a distributed, cross-timezone team
Skills Required
- 5+ years of hands-on experience in machine learning, data science, or applied AI roles
What We Do
Access to medicine and healthcare is a basic human right. At H1, we believe access to the best healthcare information is also a basic human right, one that will be more important in the 21st century than ever before. Our commitment to creating a healthier future for everyone drives us to build and maintain the most current, accurate, and comprehensive healthcare knowledge base available, as well as the tools and intelligence to extract unparalleled insights to carry global healthcare forward.
Why Work With Us
We’re a team of people building products that help solve difficult problems in healthcare. We work through complex challenges every day, navigating ambiguity, wrestling with uncertainty, and pushing the boundaries of what’s possible, all while caring deeply about one another and the people we seek to help.