Data Engineer - Generative AI & Vector Systems

2 Locations
In-Office
Senior level
Mobile • Software
The Role
Design and build scalable cloud-native data pipelines to ingest, clean, transform, chunk, embed, index, and serve enterprise data for LLM applications. Optimize vector database architectures, embedding generation, and RAG pipelines, collaborate with ML and software teams, implement Airflow orchestration, and ensure pipeline performance, monitoring, and reliability on AWS.
Summary Generated by Built In
About Mindera

At Mindera, we build high-performing, cross-functional teams that solve complex business challenges through technology. We partner with global clients to deliver innovative, scalable, and cloud-native solutions while fostering a collaborative engineering culture built on autonomy, ownership, and continuous learning.

We are looking for a Data Engineer with strong expertise in Generative AI data pipelines, Vector Databases, AWS, and modern data engineering technologies to help build next-generation AI-powered platforms. This role will focus on developing scalable pipelines for Large Language Model (LLM) applications, Retrieval-Augmented Generation (RAG), and Voice AI solutions.

Role Overview

As a Data Engineer (Generative AI & Vector Systems), you will be responsible for designing and building scalable data pipelines that prepare, transform, and index enterprise data for AI applications. You will work closely with AI Engineers, Data Scientists, Machine Learning Engineers, and Software Engineers to enable high-performance semantic search and retrieval systems using Vector Databases and cloud-native technologies.

This is an exciting opportunity to work on cutting-edge Generative AI solutions involving embeddings, vector search, RAG architectures, and large-scale cloud data processing.


Requirements
Key Responsibilities
AI Data Engineering
  • Design and develop scalable data ingestion pipelines for AI and ML applications.
  • Build automated pipelines to clean, transform, chunk, enrich, and load enterprise data into Vector Databases.
  • Create efficient workflows for generating embeddings from structured and unstructured data.
  • Optimize data quality for semantic search and retrieval systems.
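The clean, chunk and load steps above can be illustrated with a minimal sketch of fixed-size chunking with overlap. It assumes word counts as the size unit (production pipelines often count model tokens instead), and the 200/40 defaults are illustrative, not something this role prescribes:

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a chunk
    boundary still appears whole in at least one chunk. Sizes are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()  # also collapses runs of whitespace (minimal cleaning)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap trades a little extra storage and embedding cost for fewer answers split across chunk boundaries.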
Vector Database Engineering
  • Design and manage Vector Database architectures.
  • Optimize indexing, storage, metadata management, and retrieval performance.
  • Improve similarity search performance and retrieval latency.
  • Work with embedding models and vector search optimization techniques.
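Vector databases typically use approximate indexes (for example HNSW or IVF) to avoid scanning every vector. An exact, brute-force search is still useful as the ground truth when measuring the recall and latency of an approximate index. A minimal sketch, assuming small in-memory lists; the record layout and the `where` metadata filter are hypothetical:

```python
import math
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, List[float], Dict[str, str]]  # (id, vector, metadata)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def search(query: List[float], records: List[Record], k: int = 3,
           where: Optional[Dict[str, str]] = None) -> List[Tuple[str, float]]:
    """Exact top-k search with an optional metadata pre-filter."""
    candidates = [
        (rid, vec) for rid, vec, meta in records
        if not where or all(meta.get(key) == val for key, val in where.items())
    ]
    scored = [(rid, cosine(query, vec)) for rid, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]
```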
Data Pipeline Development
  • Develop production-grade ETL/ELT pipelines.
  • Build batch and near real-time ingestion pipelines.
  • Automate workflow orchestration using Apache Airflow.
  • Monitor pipeline performance and ensure high availability.
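Near real-time ingestion is often built as frequent incremental batches. A minimal sketch of the high-water-mark pattern, assuming each source row carries an `updated_at` timestamp (a hypothetical column name). In an Airflow deployment the watermark would be persisted between runs; here it is simply returned:

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def incremental_batch(rows: Iterable[Dict], watermark: datetime) -> Tuple[List[Dict], datetime]:
    """Return rows strictly newer than `watermark` plus the advanced watermark."""
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_mark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_mark
```

The strict `>` keeps re-runs idempotent, but it can miss rows that arrive late with a timestamp equal to the watermark; some pipelines re-read a small overlap window and deduplicate instead.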
Cloud Data Engineering
  • Develop cloud-native solutions on AWS.
  • Work with services such as:
    • Amazon S3
    • AWS Glue
    • EMR
    • Lambda
    • Athena
    • IAM
    • CloudWatch
  • Optimize compute resources for large-scale AI workloads.
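Athena and Glue can prune scans over S3 data laid out in Hive-style `key=value` partition folders, which keeps query cost down as data grows. A minimal sketch of building such object keys; the prefix, dataset and file names are hypothetical:

```python
from datetime import date


def partitioned_key(prefix: str, dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key using Hive-style year/month/day partition folders."""
    return (f"{prefix.rstrip('/')}/{dataset}/"
            f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/{filename}")
```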
SQL & Data Processing
  • Write complex SQL queries across distributed data sources.
  • Use Starburst/Trino to federate and query multiple data platforms.
  • Design efficient data models for AI workloads.
  • Perform large-scale joins and transformations.
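Trino (and Starburst, which is built on it) addresses tables as `catalog.schema.table`, so a single query can join tables that live on different platforms. As a small, runnable analogy only, SQLite's `ATTACH DATABASE` lets one query join tables across two databases. The `crm` and `support` names and all data are hypothetical; in Trino the same join would reference three-part names such as `crm.sales.customers`, also hypothetical:

```python
import sqlite3
from typing import List, Tuple


def open_sources() -> sqlite3.Connection:
    """Create one connection with two attached databases standing in for
    two separate platforms (hypothetical `crm` and `support` catalogs)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS support")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE support.tickets "
                 "(id INTEGER PRIMARY KEY, customer_id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO support.tickets VALUES (?, ?, ?)",
                     [(10, 1, "Login fails"), (11, 1, "Export slow"),
                      (12, 2, "Billing question")])
    return conn


def tickets_per_customer(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    # One query joins tables from different attached databases, analogous
    # to joining tables from different Trino catalogs.
    sql = """
        SELECT c.name, COUNT(t.id) AS n_tickets
        FROM crm.customers AS c
        JOIN support.tickets AS t ON t.customer_id = c.id
        GROUP BY c.name
        ORDER BY n_tickets DESC, c.name
    """
    return conn.execute(sql).fetchall()
```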
AI & Machine Learning Support
  • Work closely with AI engineers to support LLM-based applications.
  • Build Retrieval-Augmented Generation (RAG) pipelines.
  • Manage embedding generation and vector indexing.
  • Support prompt engineering and retrieval optimization initiatives.
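A RAG pipeline retrieves the most relevant chunks and passes them to the LLM as context. A minimal sketch of the prompt-assembly step, assuming chunks arrive already ranked as `(source_id, text)` pairs and using a character budget as a rough stand-in for a token budget; the prompt wording is illustrative:

```python
from typing import List, Tuple


def build_prompt(question: str, chunks: List[Tuple[str, str]], max_chars: int = 2000) -> str:
    """Assemble a grounded prompt from ranked (source_id, text) chunks,
    stopping before the context budget would be exceeded."""
    context_parts = []
    used = 0
    for source_id, text in chunks:
        part = f"[{source_id}] {text}"
        if used + len(part) > max_chars:
            break  # keep higher-ranked chunks; drop the rest
        context_parts.append(part)
        used += len(part)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [id]. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Tagging each chunk with its source ID lets answers be traced back to the indexed documents, which also helps when tuning retrieval.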
Performance & Reliability
  • Improve data pipeline performance and scalability.
  • Implement monitoring, logging, and alerting.
  • Troubleshoot production data issues.
  • Ensure data integrity and governance.
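Monitoring and data integrity often start with explicit validation inside the pipeline. A minimal sketch, assuming a hypothetical policy of rejecting individual bad records and failing the whole run when more than 5% fail, so that the orchestrator (for example an Airflow task) marks the run as failed and alerting can pick it up:

```python
import logging
from typing import Dict, List

logger = logging.getLogger("pipeline.quality")


class DataQualityError(Exception):
    """Raised when too many records fail validation (hypothetical policy)."""


def validate(records: List[Dict], required: List[str], max_bad_ratio: float = 0.05) -> List[Dict]:
    """Drop records missing required fields; raise if too many are dropped."""
    good, bad = [], 0
    for rec in records:
        missing = [f for f in required if rec.get(f) in (None, "")]
        if missing:
            bad += 1
            logger.warning("record %s missing fields %s", rec.get("id"), missing)
        else:
            good.append(rec)
    ratio = bad / len(records) if records else 0.0
    logger.info("validated %d records, %d rejected (%.1f%%)", len(records), bad, 100 * ratio)
    if ratio > max_bad_ratio:
        # An uncaught exception fails the task, which surfaces the problem.
        raise DataQualityError(f"{bad}/{len(records)} records failed validation")
    return good
```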
Required Technical Skills
Programming
  • Python (Advanced)
  • SQL (Advanced)
Data Engineering
  • ETL / ELT
  • Data Transformation
  • Data Cleansing
  • Data Chunking Strategies
  • Metadata Management
  • Data Modeling
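Metadata management and chunking meet when each chunk is stored with its provenance. A minimal sketch, assuming the target vector store upserts by ID: a deterministic ID derived from the document ID, chunk position and text means re-ingesting unchanged content overwrites existing entries instead of duplicating them. The field names and source URI are hypothetical:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def make_chunks(doc_id: str, source: str, pieces: List[str]) -> List[Chunk]:
    """Attach provenance metadata and a deterministic ID to each chunk."""
    chunks = []
    for i, text in enumerate(pieces):
        # Same doc, position and text -> same ID on every run.
        digest = hashlib.sha256(f"{doc_id}:{i}:{text}".encode("utf-8")).hexdigest()[:16]
        chunks.append(Chunk(chunk_id=digest, text=text,
                            metadata={"doc_id": doc_id, "source": source, "position": str(i)}))
    return chunks
```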
Vector Databases

Hands-on experience with one or more:

  • Pinecone
  • Milvus
  • Qdrant
  • Chroma
  • Weaviate (Good to have)
  • FAISS (Good to have)
Workflow Orchestration
  • Apache Airflow
Cloud

Strong experience with AWS services:

  • Amazon S3
  • AWS Glue
  • Amazon EMR
  • Lambda
  • Athena
  • IAM
  • CloudWatch
Query Engines

Experience with:

  • Starburst
  • Trino
  • Presto (Good to have)
AI / Machine Learning

Working knowledge of:

  • Large Language Models (LLMs)
  • Text Embeddings
  • Semantic Search
  • Vector Search
  • Retrieval-Augmented Generation (RAG)
  • Prompt Engineering (basic understanding)
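Text embeddings map text to fixed-length vectors so that similarity can be computed numerically. A real pipeline would call an embedding model; the sketch below uses a feature-hashing bag-of-words vector purely as a lexical stand-in (it matches shared words, not meaning), and the 64-dimension size is an arbitrary assumption:

```python
import hashlib
import math
import re
from typing import List


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Feature-hashing bag-of-words vector, L2-normalised (not a semantic model)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # hashlib gives stable buckets across processes, unlike built-in hash().
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dot(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity for L2-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because Python randomises string hashing per process, a stable hash such as `hashlib` is needed for the same text to map to the same vector on every run.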
APIs & Integrations
  • REST APIs
  • JSON
  • Data Connectors
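Data connectors for REST APIs frequently have to follow pagination. A minimal sketch for a cursor-paginated JSON API, assuming a hypothetical page shape of `{"data": [...], "next_cursor": ...}` and injecting the HTTP call as a function so the logic can be tested without a network:

```python
import json
from typing import Callable, Dict, Iterator, Optional

# fetch_page(cursor) returns the raw JSON text of one page; in production it
# would wrap an HTTP GET against the source API (hypothetical contract).
FetchPage = Callable[[Optional[str]], str]


def iter_records(fetch_page: FetchPage, max_pages: int = 1000) -> Iterator[Dict]:
    """Yield records from every page until the API stops returning a cursor."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = json.loads(fetch_page(cursor))
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return
    # Guard against an API that keeps returning a cursor forever.
    raise RuntimeError("page limit reached; possible cursor loop")
```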
Required Experience
  • 4–8 years of experience in Data Engineering.
  • Experience building scalable cloud-native data platforms.
  • Hands-on experience with Vector Databases.
  • Experience working with enterprise-scale SQL environments.
  • Strong background in Python-based data engineering.
  • Experience building production-grade Airflow pipelines.
  • Familiarity with Generative AI architectures.
  • Experience supporting Machine Learning pipelines.

Benefits
We offer
  • Flexible working hours (self-managed)
  • Annual bonus, subject to company performance
  • Access to Udemy online training and opportunities to learn and grow within the role

At Mindera we use technology to build products we are proud of, with people we love.

Software Engineering Applications, including Web and Mobile, are at the core of what we do at Mindera.

We partner with our clients to understand their products and deliver high-performance, resilient and scalable software systems that have an impact on their users and businesses across the world.

You get to work with a bunch of great people, and the whole team owns the project together.

Our culture reflects our lean and self-organisation attitude.

We encourage our colleagues to take risks, make decisions, work collaboratively and talk to everyone to improve communication. We are proud of our work, and we love to learn about anything and everything while working in an Agile, Lean and collaborative environment.

Our offices are located in: Porto, Portugal | Aveiro, Portugal | Coimbra, Portugal | Leicester, UK | San Diego, USA | Chennai, India | Bengaluru, India

Skills Required

  • Python (Advanced)
  • SQL (Advanced)
  • 4-8 years of experience in Data Engineering
  • Hands-on experience with Vector Databases (Pinecone, Milvus, Qdrant, Chroma)
  • Experience with Weaviate (good to have)
  • Experience with FAISS (good to have)
  • Production-grade Apache Airflow pipeline development
  • Experience with AWS services: S3, Glue, EMR, Lambda, Athena, IAM, CloudWatch
  • Experience with Starburst and Trino
  • Experience with Presto (good to have)
  • Design and implement ETL/ELT, data transformation, cleansing, chunking, metadata management, and data modeling
  • Working knowledge of LLMs, text embeddings, semantic search, vector search, and RAG
  • Experience supporting machine learning pipelines and embedding generation
  • Experience with REST APIs, JSON, and data connectors
  • Experience working with enterprise-scale SQL environments and large-scale joins/transformations

Mindera Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Mindera and has not been reviewed or approved by Mindera.

  • Healthcare Strength: Company materials highlight health insurance, medical assistance, and wellbeing programs across locations, with the UK explicitly offering private medical including mental‑health, optical, and dental coverage. Country pages position healthcare as a core element of the package.
  • Leave & Time Off Breadth: Paid time off, sick leave, and flexible vacation policies are described as standard, with some locales referencing generous or unlimited PTO. Flexible schedules and remote options further support time away when needed.
  • Wellbeing & Lifestyle Benefits: Flexibility, remote/hybrid work, and wellness initiatives (e.g., mental‑health sessions and onsite activities) are emphasized as part of the employee experience. Cultural perks such as team gatherings and trips add lifestyle value.

The Company
HQ: Leicester
490 Employees
Year Founded: 2014

What We Do

At Mindera we craft software with people we love. Our offices are located in: Portugal | UK | USA | India | Romania | Brazil
