The CDAH will be a cloud-native, open-source “operating system” for integrated climate and health intelligence, built on five pillars:
- AI R&D environment: Ingests multi-modal climate, environmental, epidemiological and socio-demographic data into a unified data lake & feature store; supports Kubeflow/PyTorch/TensorFlow pipelines with an MLflow registry, automated benchmarking, architecture search, transfer learning and uncertainty-aware modeling.
- Digital tool marketplace & public goods registry: User-facing portal for dashboards, mobile apps and alerting platforms; structured backend registry of pre-trained model packages, microservices, ETL scripts, governance adapters, metadata and version history.
- Systems integration & deployment layer: Middleware adapters and Kafka messaging to plug AI services into DHIS2, HMIS, IDSR and similar platforms; Terraform/Ansible IaC, identity management, end-to-end encryption and compliance with data-governance standards.
- Training environment: Web portal and virtual bootcamp infrastructure hosting open-access modules, instructor-led sessions, hands-on Jupyter labs, code templates and certification tracks on climate-health AI workflows and interoperability.
- Real-world evaluation sandbox: Controlled simulation environment replicating public-health workflows, climate variability and institutional constraints; structured feedback loops for piloting, validating and refining tools prior to full-scale rollout.
What You’ll Do
- Architect the data backbone: Lead design of a multi-tenant data lake & feature store; define schemas, metadata standards, and secure ETL/ELT pipelines for climate, environmental, epidemiological, and socio-demographic data.
- Source & curate open-source datasets: Identify, evaluate and onboard public climate, environmental, epidemiological and socio-demographic data (e.g., ERA5/Copernicus, MODIS, WHO, UN, university repositories, open-API feeds), ensuring metadata completeness and licensing compliance for downstream model training.
- Automate data quality assurance & governance: Build unit/integration tests and data-quality checks (Great Expectations/dbt), track lineage, and enforce access controls.
- Ingest and harmonize datasets: Operationalize ingestion, cleansing, and harmonization of ERA5, Sentinel, GPM, EHR, mobility, and demographic datasets; ensure interoperability with DHIS2/HMIS.
- Automate data services: Develop reusable validation libraries, transformation scripts, and secure REST/GraphQL APIs to power downstream AI models and dashboards. Manage the data-service API contract; the AI/ML Engineer manages model APIs.
- Develop training labs: Author reference ETL scripts, notebooks, and architecture patterns for “AI-ready” datasets; validate that bootcamp exercises reflect real-world data challenges.
- Co-lead bootcamps: Guide participants through hands-on ETL labs, troubleshoot integration issues, and refine training materials based on feedback.
- Publish open-source components: Package and release ETL modules, transformation libraries, and interoperability adapters to the public-goods registry under permissive licenses.
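To give a concrete flavor of the data-quality automation described above, here is a minimal sketch of the kind of check this role would formalize in Great Expectations or dbt. It uses plain pandas, and the column names, districts, and thresholds are illustrative only, not taken from any CDAH schema.

```python
import pandas as pd

# Toy climate-health records; schema and values are hypothetical.
records = pd.DataFrame({
    "district": ["Kisumu", "Mombasa", "Kisumu"],
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "temp_c": [28.4, 31.2, 27.9],
    "malaria_cases": [12, 7, 15],
})

def quality_report(df: pd.DataFrame) -> dict:
    """Run simple completeness, range, and uniqueness checks."""
    return {
        "no_missing_values": not df.isna().any().any(),
        "temp_in_plausible_range": bool(df["temp_c"].between(-50, 60).all()),
        "cases_non_negative": bool((df["malaria_cases"] >= 0).all()),
        "unique_district_date": not df.duplicated(["district", "date"]).any(),
    }

report = quality_report(records)
assert all(report.values()), f"data-quality failures: {report}"
```

In a production pipeline these rules would live as declarative expectations tied to lineage metadata, so a failing check blocks promotion of the dataset rather than silently feeding bad inputs to downstream models.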
What We’re Looking For
- Deep technical expertise: 8+ years in data engineering, with a strong track record designing and operating large-scale data lakes and pipelines. Demonstrated experience discovering, evaluating and integrating diverse open-source data streams for ML pipelines.
- DataOps & cloud proficiency: Expertise in Python/SQL, Spark/Flink, Airflow, dbt, Kafka, Docker, Kubernetes, CI/CD (GitOps), and AWS/Azure/GCP.
- API & microservices: Proven ability to design, implement, and secure RESTful APIs and data service micro-architectures.
- Consulting acumen: Exceptional stakeholder management, technical storytelling, and client-facing presentation skills, ideally honed at a top-tier consulting firm or tech organization.
- Autonomous delivery: Demonstrated capacity to own complex projects end-to-end, navigate ambiguity, and deliver production-ready solutions with minimal oversight.
Preferred Qualifications
- Prior engagement in global health, One Health, or climate-health data initiatives.
- Familiarity with data-governance frameworks (e.g., GDPR, HIPAA) and cybersecurity best practices.
- Experience designing and delivering technical training or bootcamps.
- Contributions to open-source digital public goods or curated registries.
Why You’ll Love This Role
- High-impact mission: Your work will directly strengthen early warning systems and resilience in climate-vulnerable regions.
- Technical leadership: Own the design and delivery of the CDAH's data backbone.
- Innovation-friendly environment: Leverage cutting-edge Big Data and cloud technologies in a dynamic, open-source ecosystem.
- Global collaboration: Engage a diverse network of public-health experts, policymakers, and open-source communities.
What We Do
Malaria No More (MNM) envisions a world where no one dies of a mosquito bite. More than a decade into our mission, our work has contributed to historic progress toward this goal. Now, we’re mobilizing the political commitment, funding, and innovation required to achieve what would be one of the greatest humanitarian accomplishments—ending malaria within our generation.
MNM is a global organization with offices in Washington, D.C. and Nairobi, and affiliates in the United Kingdom and Japan.