Senior Data Engineer

Riyadh, SAU
In-Office
Senior level
Gaming • Information Technology • Software
The Role
Own the end-to-end data layer for generative AI: build and run batch and streaming pipelines, transform raw to curated datasets, implement retrieval pipelines (parsing, chunking, embeddings, vector indexing), enforce schema/quality/lineage and access controls, add monitoring and cost controls, and document and review team standards.
Summary Generated by Built In

Our Generative AI products are only as good as the data behind them. This role owns that data layer from end to end: the pipelines that bring data in, the transformations that shape it, and the way it reaches retrieval systems, agents, and analytics. The work runs on AWS, and the aim is a single governed source that every consumer can rely on.

We want someone who has already built data pipelines for AI systems, not only for reporting. Preparing data for an LLM or an agent brings its own work around chunking, embeddings, indexing, and keeping content current, and you have done it before. The team is small and spans several languages, so you will own your pipelines and help set the standards the rest of us follow.

WHAT YOU WILL DO

  • Build and run the batch and streaming pipelines that move data from source systems into the lake and through to the warehouse, owning the layers in between from raw to curated, along with their schema, quality, and lineage.
  • Build the data layer behind retrieval: source connectors, document parsing, chunking, embedding generation, and vector indexing, including re-embedding when content changes.
  • Model curated, query-ready datasets and metrics so AI and analytics consumers work from one definition instead of each rebuilding the logic.
  • Add quality checks, validation, and monitoring so problems surface before they reach a model or a user.
  • Apply access control where it belongs: row and column level rules, PII handling, and entitlement-aware datasets, enforced as close to query time as the stack allows.
  • Work with the platform and DevOps engineers to expose data and retrieval as documented, dependable services.
  • Keep storage, compute, and query costs in check, with particular attention to the cost of embedding and vector workloads.
  • Review code, write the documentation, and help shape how the team builds its data layer.

REQUIREMENTS

  • Eight or more years in data engineering overall, including hands-on work building data pipelines for AI or ML systems (retrieval, embeddings, or feature data). The AI or ML work can be a more recent part of your background.
  • Strong SQL and strong Python, including PySpark or similar distributed processing.
  • Production experience across the AWS data stack: S3 for the lake, Glue for ETL and the Data Catalog, Athena for serverless query, and Redshift as the warehouse.
  • Hands-on experience with a layered data architecture, whether you call it medallion (bronze, silver, gold), a data lake feeding a warehouse, or a lakehouse, including building the transformation stages that move data from raw to curated.
  • Experience with an ELT or integration tool such as Airbyte, Fivetran, or Meltano, including building or maintaining connectors.
  • Experience with event-driven pipelines using SQS and SNS, and with at least one streaming or change-data-capture technology such as Kinesis, Amazon MSK, or Debezium.
  • Hands-on experience with a semantic or metrics layer over the warehouse, such as Cube or the dbt Semantic Layer.
  • Hands-on experience with at least one vector store and embedding workflow: pgvector, Amazon OpenSearch, Pinecone, Weaviate, or Milvus.
  • Comfort with columnar and open table formats: Parquet together with Apache Iceberg, Delta Lake, or Hudi.
  • Working knowledge of an orchestrator such as Amazon MWAA, Step Functions, Dagster, or Prefect, and enough infrastructure as code to work closely with DevOps.

Skills Required

  • Eight or more years in data engineering
  • Hands-on experience building data pipelines for AI/ML systems (retrieval, chunking, embeddings, indexing)
  • Strong SQL
  • Strong Python, including PySpark or similar distributed processing
  • Production experience with AWS data stack: S3, Glue, Athena, Redshift
  • Hands-on experience with layered data architectures (medallion/lakehouse) and transformations from raw to curated
  • Experience with ELT/integration tools such as Airbyte, Fivetran, or Meltano
  • Experience with event-driven pipelines using SQS and SNS and with streaming/CDC tech (Kinesis, Amazon MSK, Debezium)
  • Hands-on experience with a semantic or metrics layer over the warehouse (e.g., Cube or dbt Semantic Layer)
  • Hands-on experience with vector stores and embedding workflows (pgvector, Amazon OpenSearch, Pinecone, Weaviate, Milvus)
  • Comfort with columnar and open table formats: Parquet and Apache Iceberg, Delta Lake, or Hudi
  • Working knowledge of orchestrators (Amazon MWAA, Step Functions, Dagster, Prefect) and infrastructure-as-code to collaborate with DevOps
  • Experience implementing data quality checks, validation, monitoring, schema management, lineage, and access controls (row/column-level, PII handling)

The Company
HQ: Boston
73 Employees
Year Founded: 2024

What We Do

Mirai is a Riyadh-based video games studio where technology and creative talent combine with industry experts to learn, develop, and shape the future of games.
