PalUp

Data Engineer - Generative AI Pipelines

Reposted 22 Days Ago

Taipei

Mid level

Artificial Intelligence • Software

The Role

As a Data Engineer, you'll design and optimize data pipelines, develop data models, manage ETL/ELT processes, and analyze user data for insights.

Summary Generated by Built In

We’re looking for a Data Engineer to build scalable data pipelines that power Generative AI applications such as RAG, summarization, NER, and FAQ systems. You’ll design systems to scrape, ingest, transform, and enrich data at scale, ensuring it’s clean and optimized for AI/ML workflows.

What You’ll Do

Build and optimize ETL/ELT pipelines for large-scale structured & unstructured data
Develop data enrichment workflows (entity extraction, embeddings, metadata tagging)
Manage data lakes, warehouses, and vector databases to support AI retrieval
Collaborate with ML engineers on AI-ready data infrastructure
Ensure pipeline reliability, scalability, and observability

Who You Are

5+ years in data engineering (Python, SQL, Spark/Dask/Ray)
Experience with web scraping frameworks (Scrapy, Playwright, Selenium)
Strong knowledge of cloud platforms (AWS/GCP/Azure) & orchestration tools (Airflow/Prefect/Dagster)
Familiarity with GCP storage solutions such as BigQuery and Cloud Storage
Familiarity with vector search databases (Pinecone, Weaviate, Elasticsearch)
Understanding of NLP concepts relevant to generative AI