GenAI Data ETL Engineer

Reposted 14 Days Ago
2 Locations
Hybrid
Mid level
Cloud • Information Technology
The Role
The Data ETL Engineer will develop ETL processes using AWS Glue, manage data cataloging, design data models, and ensure data quality.
Summary Generated by Built In
AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.

At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD. 

We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived. 

We embrace all candidates who will contribute to the diversification and enrichment of ideas and perspectives at AHEAD.

We are seeking a GenAI Data Engineer – Data Integration & Retrieval to design, build, and operate the data pipelines that power our LLM‑based applications, agents, and analytics. This role sits at the intersection of data engineering and generative AI, with a focus on turning messy, distributed enterprise data into high‑quality context for retrieval‑augmented generation (RAG), copilots, and intelligent automation.
You will partner closely with the Platform and Use Cases Teams, GenAI/ML engineers, and business stakeholders to deliver robust, observable, and future‑proof data flows that keep us ahead of where the industry is going.

Key Responsibilities

    GenAI / RAG Data Pipeline Development
  • Design, develop, and maintain ETL/ELT pipelines that ingest structured and unstructured data (databases, documents, tickets, logs, wikis, APIs, SaaS apps) into vector stores, search indexes, and feature tables that power GenAI use cases.
  • Implement document and record transformations including chunking, metadata enrichment, normalization, deduplication, and PII redaction for safe and high‑quality LLM context.
  • Build and evolve semantic data models that reflect how LLMs consume context (e.g., knowledge domains, entities, relationships, access controls) rather than only traditional star schemas.
  • Optimize pipelines for performance, reliability, and cost (incremental loads, CDC, partitioning, caching, adaptive refresh strategies) in support of low‑latency GenAI experiences.
  • Implement data quality checks and evaluations tailored to GenAI workloads (e.g., coverage of knowledge domains, freshness, retrieval accuracy, hallucination risk signals).
    LLM & Integration Engineering
  • Design and implement system‑to‑system integrations that consolidate context for GenAI from SaaS platforms and internal systems (CRM, ITSM/ticketing, ERP, knowledge bases, collaboration tools).
  • Work with GenAI engineers to wire data pipelines into LLM orchestration flows (e.g., RAG, tools/agents, workflows), ensuring clean interfaces and robust contracts.
  • Build and maintain prompt/response logging, retrieval traces, and feedback capture to enable experimentation, evaluation, and continuous improvement.
  • Ensure integrations and pipelines are secure, auditable, and compliant, including access controls, row/column‑level permissions, and policy‑driven redaction for LLM consumption.
  • Collaborate with application and platform teams to define SLAs, schemas, and APIs for data contracts that support GenAI services.
    Operations, Monitoring, and Documentation
  • Set up scheduling, orchestration, and workflow management for GenAI data pipelines (e.g., Airflow, Prefect, Dagster, cloud‑native orchestrators).
  • Implement observability for data and retrieval: pipeline health, data freshness, vector store/index stats, retrieval coverage, and failure modes that impact LLM behavior.
  • Diagnose and resolve pipeline and integration issues, performing root‑cause analysis across data sources, transformations, and downstream GenAI applications.
  • Maintain clear documentation of data flows, lineage, schemas, mappings, and runbooks, with a focus on how they support specific GenAI use cases.
  • Partner with data governance and architecture to enforce naming standards, lineage, and metadata practices that enable safe and explainable GenAI.

Education

  • Minimum Required: Bachelor’s degree in Computer Science, Information Systems, or a similar field

Skills Required

  • 5+ years of experience in data engineering, ETL/ELT development, or data integration roles.
  • Strong SQL skills (complex joins, window functions, performance tuning) across analytical and operational workloads.
  • Hands‑on experience with at least one modern data pipeline / transformation framework (e.g., dbt, Airflow/Prefect/Dagster, cloud‑native ETL, or custom Python/SQL pipelines).
  • Experience building and maintaining data pipelines on cloud data platforms (e.g., Snowflake, BigQuery, Redshift, Synapse, or equivalent).
  • Proficiency in Python (preferred) or another programming language commonly used in data workflows (e.g., Java, Scala), including working with APIs and JSON.
  • Experience working with REST APIs, webhooks, JSON, CSV, and other common integration formats.
  • Solid understanding of data modeling and integration concepts (relational modeling, denormalization, CDC, event‑driven or log‑based ingestion).
  • Familiarity with version control (Git) and standard software engineering practices (code review, branching strategies, CI/CD basics).
  • Demonstrated exposure to LLMs / GenAI, whether through personal projects, pilots, or production systems.

Preferred Skills

  • Experience with LLM‑centric data patterns, such as retrieval‑augmented generation (RAG), semantic search, or document intelligence.
  • Hands‑on experience with vector databases or search technologies (e.g., Pinecone, Weaviate, pgvector, OpenSearch, Elasticsearch, Vespa).
  • Experience with workflow orchestration tools (e.g., Apache Airflow, Prefect, Dagster, Azure Data Factory, AWS Glue workflows).
  • Exposure to message‑based or streaming integrations (e.g., Kafka, Kinesis, Pub/Sub, EventBridge) for near real‑time data and event feeds into GenAI systems.
  • Experience in data quality and observability (e.g., Great Expectations, Monte Carlo, Soda, or custom checks/alerts).
  • Knowledge of at least one cloud platform (AWS, Azure, GCP) and its data/AI services (e.g., object storage, serverless compute, managed warehouses, managed LLMs or embeddings).
  • Familiarity with security and compliance concepts: data classification, encryption, access controls, secrets management, and safe handling of PII/regulated data.

Nice to Have

  • Experience partnering with ML/GenAI teams, including feature pipelines, evaluation datasets, or MLOps practices.
  • Experience with BI / analytics tools (e.g., Power BI, Tableau, Looker) and understanding how analytical needs intersect with GenAI use cases.
  • Background with data catalogs, lineage tools, or knowledge graphs that help organize enterprise knowledge for GenAI.

Why AHEAD:

Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.

We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.

India Employment Benefits include: 
  • Comprehensive health insurance coverage for employees, with options to extend coverage to dependents
  • Paid time off and company holidays, along with additional leave benefits as per policy
  • Flexible work arrangements, supporting work-life balance
  • Learning and development opportunities to support continuous growth and upskilling
  • Employee wellness initiatives and programs focused on physical and mental well-being
  • Retirement and statutory benefits in line with India regulations
  • Inclusive and people-first culture, with a strong focus on collaboration and ownership

Top Skills

AWS ETL
Data Modeling
Deequ
DMS
Glue Data Catalog
Glue Python
Glue Spark
Glue Workflows
Great Expectations
Kinesis
Lake Formation
MSK

The Company
HQ: Chicago, IL
1,154 Employees
Year Founded: 2007

What We Do

AHEAD builds platforms for digital business. By weaving together cloud infrastructure, intelligent operations, and modern applications, we help enterprises deliver on the promise of digital transformation.
