We’re looking for a Data Engineer to build scalable data pipelines that power Generative AI applications such as RAG, summarization, NER, and FAQ systems. You’ll design systems to scrape, ingest, transform, and enrich data at scale, ensuring it’s clean and optimized for AI/ML workflows.
What You’ll Do
Build and optimize ETL/ELT pipelines for large-scale structured & unstructured data
Develop data enrichment workflows (entity extraction, embeddings, metadata tagging)
Manage data lakes, warehouses, and vector databases to support AI retrieval
Collaborate with ML engineers on AI-ready data infrastructure
Ensure pipeline reliability, scalability, and observability
Who You Are
5+ years in data engineering (Python, SQL, Spark/Dask/Ray)
Experience with web scraping frameworks (Scrapy, Playwright, Selenium)
Strong knowledge of cloud platforms (AWS/GCP/Azure) & orchestration tools (Airflow/Prefect/Dagster)
Familiarity with GCP storage solutions such as BigQuery and Cloud Storage
Familiarity with vector search databases (Pinecone, Weaviate, Elasticsearch)
Understanding of NLP concepts relevant to generative AI
What We Do
PalUp is an AI-powered social network that transforms how people connect, offering personalized, meaningful interactions for everyone.









