Who we are: Founded in Lyon, France, in 2015, DataGalaxy is the industry’s first Data Knowledge Catalog, helping organizations understand how their business runs on data. Our data management platform is dedicated to providing user-friendly metadata mapping, management, and knowledge sharing to support organizational data governance and data literacy.
Our mission: To remain at the forefront of modern data management by empowering data professionals and business users through increasing data knowledge and providing a comprehensive understanding of how businesses operate with data. We help organizations facilitate collaboration, manage data as a true asset, and enable powerful, data-driven decision-making.
Our values:
Be intentional. Be clear. Be bold. Be humble. At DataGalaxy, each employee brings unique knowledge, skills, and viewpoints that create a truly well-rounded team. We encourage unique ideas, innovative thinking, and independent perspectives to achieve exceptional results together!
Responsibilities:
- Expertise in Generative AI: Possess a solid understanding of generative AI models and techniques, including but not limited to LLMs, RAG architecture, and agents.
- MLOps Pipeline Development: Design, build, and maintain robust and scalable MLOps pipelines for the training, testing, deployment, and monitoring of AI/ML models. This role specifically emphasizes generative AI models, including fine-tuning, LLM inference (vLLM), and maintaining open-source LLMs (e.g., Llama 3.1).
- DevOps Implementation: Implement DevOps best practices for continuous integration and continuous delivery (CI/CD) of AI/ML models.
- Cloud Infrastructure Management: Manage and optimize cloud infrastructure on GCP for AI/ML workloads, with a strong focus on Kubernetes Engine and Helm for efficient deployment and configuration management.
- Monitoring and Optimization: Develop and implement monitoring and alerting systems to ensure model performance, reliability, and cost-effectiveness. Analyze model performance and identify areas for optimization.
- Collaboration and Communication: Collaborate effectively with data scientists, engineers, and product managers to deploy and maintain AI/ML solutions. Communicate technical concepts clearly and concisely to both technical and non-technical audiences.
Ideal Candidate:
- Master’s degree or equivalent from an engineering school in Computer Science, Data Science, or a related field.
- Strong programming skills, particularly in Python. Familiarity with relevant libraries/frameworks (e.g., vLLM, PyTorch, Hugging Face, Scikit-Learn).
- Extensive experience with Google Cloud Platform and its AI/ML services.
- Strong experience with GenAI services, in particular Llama 3 and vLLM.
- Strong experience with containerization technologies like Docker and Kubernetes.
- Proven experience in creating and managing CI/CD pipelines for AI/ML models.
- Proficiency in version control systems (Git) and effective terminal operations management.
- Experience in data manipulation and analysis (e.g., Pandas, Spark, DuckDB).
- Strong written and verbal communication skills, including the ability to present complex technical concepts to both technical and non-technical audiences.
What We Offer:
- Flexible working hours (forfait jour).
- The opportunity to join a pioneering French startup in its market 🚀.
- Competitive compensation according to your experience and potential.
- Health insurance (Apicil), meal vouchers (Swile card of 9 €/day), and 50% reimbursement of transportation costs.
- A friendly and welcoming work environment, with offices in the heart of Lyon, 10-15 minutes from train stations.
- Quarterly team events and seminars to strengthen team cohesion and celebrate success.
What We Do
An established leader in Europe, growing rapidly and operating worldwide, DataGalaxy offers a user-centric platform dedicated to metadata mapping, active metadata management, and metadata knowledge sharing. With its innovative approach to data cataloging, DataGalaxy helps businesses of all sizes gain control over their data assets and make better, more informed decisions. Govern, organize, and curate millions of different assets with minimum effort! Our user-centric data catalog blends the most powerful augmented data stewardship experience with crowd-sourced business knowledge. We also offer a spectrum of integrations so that you can map out your data landscape with ease. Contact us today to find out more about what DataGalaxy can do for you.
Learn more about DataGalaxy data lineage tools, data management software, and business glossary software.