- Design, build, and maintain scalable data pipelines on Google Cloud Platform (GCP) for AI and machine learning use cases
- Implement data ingestion and transformation frameworks that power Retrieval systems and training datasets for LLMs and multimodal models
- Architect and manage NoSQL and Vector Databases to store and retrieve embeddings, documents, and model inputs efficiently
- Collaborate with ML and platform teams to define data schemas, partitioning strategies, and governance rules that ensure privacy, scalability, and reliability
- Integrate unstructured and structured data sources (text, speech, image, documents, metadata) into unified data models ready for AI consumption
- Optimize performance and cost of data pipelines using GCP native services (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Vertex AI)
- Contribute to data quality and lineage frameworks, ensuring AI models are trained on validated, auditable, and compliant datasets
- Continuously evaluate and improve our data stack to accelerate AI experimentation and deployment
- You have 5+ years of experience in Data Engineering, ideally supporting AI or ML workloads
- You have strong experience with the GCP data ecosystem and proficiency in Python and SQL
- You have a deep understanding of NoSQL systems (e.g., MongoDB) and vector databases (e.g., FAISS, Vector Search)
- You have experience designing data architectures for RAG, embeddings, or model training pipelines
- You have knowledge of data governance, security, and compliance for sensitive or regulated data
- You are fluent in English
- You hold a Master's or Ph.D. degree in Computer Science, Data Engineering, or a related field
- You have familiarity with W&B / MLflow / Braintrust / DVC for experiment tracking and dataset versioning
- You have experience with containerized environments (Docker, Kubernetes) and CI/CD for data workflows
- Our solutions are built on a single, fully cloud-native platform that supports web and mobile app interfaces and multiple languages, and is adapted to country and healthcare specialty requirements.
- Our stack is composed of Rails, TypeScript, Java, Python, Kotlin, Swift, and React Native.
- We leverage AI ethically across our products to empower patients and health professionals.
- Free comprehensive health insurance (basic package) for you and your children
- 25 days of paid vacation per year, plus up to 14 RTT days (additional rest days under the French reduced-working-time scheme)
- Free mental health and coaching services through our partner Moka.care
- Work from abroad for up to 10 days per year thanks to our flexibility days policy
- Lunch vouchers (Swile card) worth €8.50 per working day, with €4.50 covered by Doctolib
- A subsidy from the works council that refunds part of a sports club membership or creative class fee
- 50% reimbursement of your public transport subscription
- Parent Care Program: receive one additional month of leave on top of the legal parental leave
- Enrollment in DoctoGrowth, Doctolib's long-term employee value-sharing plan
- For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
- Relocation support in case of international mobility
- Access to the best AI tools for coding, development and dedicated training
- Recruiter Interview
- Technical Deep Dive
- System Design Interview
- Behavioral Interview
- At least one reference check
- Permanent position
- Tech stack: GCP, Python, SQL, NoSQL, Vector Databases, AI/ML
- Full-time
- Paris, France
- Hybrid work setup (up to 2 remote days per week)
- Start date: as soon as possible
Skills Required
- 5+ years of experience in Data Engineering, ideally supporting AI or ML workloads
- Master's or Ph.D. degree in Computer Science, Data Engineering, or a related field
- Strong experience with the GCP data ecosystem
- Proficiency in Python and SQL
- Deep understanding of NoSQL systems and vector databases
- Experience designing data architectures for RAG or model training pipelines
- Knowledge of data governance, security, and compliance
- Familiarity with experiment tracking and dataset versioning
- Familiarity with containerized environments and CI/CD for data workflows
What We Do
Since Doctolib's creation in 2013, we have had one purpose: strive for a healthier world.

1. We aim to improve the daily lives of care teams by providing them with a new generation of technologies and services.
2. We aim to improve health for all, by offering a fast and frictionless journey for all care episodes, creating new ways for people to receive care and empowering them to become actors of their health.

At Doctolib, we are honored to work in the healthcare field and we believe that innovation in healthcare should be handled differently. We apply 4 guiding principles in everything we do:

1. We create helpful solutions for care teams and people.
2. We serve everyone equally and create well-designed and accessible technologies.
3. We team up with our users to strive for a healthier world and act as one team.
4. We protect our users' privacy. It's their health, their data.

To achieve our purpose, we are assembling a team dedicated to improving healthcare, with a human-centric approach and an entrepreneurial mindset.

www.doctolib.com







