Who we are: Founded in Lyon, France, in 2015, DataGalaxy has become the leading data catalog in France, connecting data, people, and AI through an intuitive data governance platform. Our mission is to simplify metadata mapping, management, and knowledge sharing to enhance organizational data governance and data literacy.
With over 170 clients worldwide and a rapidly growing customer base, we are making significant strides in the US market, with the ambition to become a top-three player in the data catalog space. Our teams span two continents, fostering a dynamic international spirit that drives our innovation and growth.
Our mission: To lead the revolution in modern business data catalogs by empowering data professionals and business users through AI-driven data governance. As we expand rapidly in the US, we aim to set the standard for data and AI solutions, positioning ourselves as a trusted partner for navigating the complexities of a data-centric world across industries. Our vision is to drive impactful change and growth, particularly in the US market, as we shape the future of data governance on a global scale.
Our values: Be intentional. Be clear. Be bold. Be humble.
Responsibilities:
- Expertise in Generative AI: Possess a solid understanding of generative AI models and techniques, including but not limited to LLMs, RAG architecture, and agents.
- MLOps Pipeline Development: Design, build, and maintain robust and scalable MLOps pipelines for the training, testing, deployment, and monitoring of AI/ML models. This role specifically emphasizes generative AI models, including fine-tuning, LLM inference (vLLM), and maintaining open-source LLMs (e.g., Llama 3.1).
- DevOps Implementation: Implement DevOps best practices for continuous integration and continuous delivery (CI/CD) of AI/ML models.
- Cloud Infrastructure Management: Manage and optimize cloud infrastructure on GCP for AI/ML workloads, with a strong focus on Kubernetes Engine and Helm for efficient deployment and configuration management.
- Monitoring and Optimization: Develop and implement monitoring and alerting systems to ensure model performance, reliability, and cost-effectiveness. Analyze model performance and identify areas for optimization.
- Collaboration and Communication: Collaborate effectively with data scientists, engineers, and product managers to deploy and maintain AI/ML solutions. Communicate technical concepts clearly and concisely to both technical and non-technical audiences.
Ideal Candidate:
- Master’s degree or equivalent from an engineering school in Computer Science, Data Science, or a related field.
- Strong programming skills, particularly in Python. Familiarity with relevant libraries/frameworks (e.g., vLLM, PyTorch, Hugging Face, Scikit-Learn).
- Extensive experience with Google Cloud Platform and its AI/ML services.
- Strong experience with GenAI services, in particular Llama 3 and vLLM.
- Strong experience with containerization technologies such as Docker and Kubernetes.
- Proven experience in creating and managing CI/CD pipelines for AI/ML models.
- Proficiency in version control systems (Git) and effective terminal operations management.
- Experience in data manipulation and analysis (e.g., Pandas, Spark, DuckDB).
- Strong written and verbal communication skills, including the ability to present complex technical concepts to both technical and non-technical audiences.
What We Offer:
- Flexible working hours (forfait jour).
- The opportunity to join a pioneering French startup in its market 🚀.
- Competitive compensation according to your experience and potential.
- Health insurance (Apicil), meal vouchers (Swile card, 9 €/day), and 50% reimbursement of transportation costs.
- A friendly and welcoming work environment, with offices in the heart of Lyon, 10-15 minutes from train stations.
- Quarterly team events and seminars to strengthen team cohesion and celebrate success.
What We Do
An established leader in Europe, growing rapidly and operating worldwide, DataGalaxy offers a user-centric platform dedicated to metadata mapping, active metadata management, and metadata knowledge sharing. With its innovative approach to data cataloging, DataGalaxy helps businesses of all sizes gain control over their data assets and make better, more informed decisions. Govern, organize, and curate millions of different assets with minimum effort! Our user-centric data catalog blends the most powerful augmented data stewardship experience with crowd-sourced business knowledge. We also offer a spectrum of integrations so that you can map out your data landscape with ease. Contact us today to find out more about what DataGalaxy can do for you.