Data Engineer

18 Locations
In-Office or Remote
Mid level
Information Technology • Consulting
The Role
Lead foundational data-engineering work to validate and re-engineer pipelines for an anonymized, centralized credit data lake. Harmonize schemas across entities; build dbt models and tests; implement data-quality suites (Great Expectations), entity resolution, and anonymization controls; optimize Spark/Glue jobs; orchestrate pipelines (Airflow/Step Functions); and produce documented, feature-ready datasets and runbooks for a regulated UK/Ireland lending environment.
Summary Generated by Built In
About the project (description, duration, stage)

Join Neurons Lab as a Data Engineer on a new engagement with a regulated UK & Ireland credit and lending company. The client has lifted data from multiple business entities into a newly centralized, anonymized data lake, but lacks the data-engineering depth to make it trustworthy and analytics-ready: current pipelines were assembled quickly (partly AI-assisted), and the descriptive statistics cannot yet be validated or reproduced.

You will put that foundation on solid ground so the Data Science Lead can model on it with confidence: validate and re-engineer the pipelines, build the harmonization / semantic layer across entities, enforce data quality and lineage, and prepare clean, feature-ready datasets.

This is a foundational data-engineering role on a regulated data estate; data protection and reproducibility are the primary constraints on every decision.

A full-time engagement is preferred.

What you'll actually do (example tasks)

  • Reproduce a descriptive-statistics report end-to-end so that any figure traces back to raw source, closing the gap the client has acknowledged: numbers they cannot currently defend.

  • Profile and reconcile differing source schemas across acquired entities: map differing field names, types, encodings and business definitions for the same concept into one conformed model.

  • Build dbt staging → intermediate → mart models with tests; codify the harmonized definitions the Data Science Lead specifies.

  • Write Great Expectations suites (null / range / uniqueness / referential checks) and wire them into the pipeline so bad data fails loudly rather than silently corrupting analysis.

  • Implement entity / identity resolution (deterministic + fuzzy matching) where there is no clean shared key for the same customer or account across sources.

  • Implement and verify anonymization / pseudonymization (hashing / tokenization / k-anonymity), and provide evidence to the client's IT / compliance team that re-identification risk is controlled.

  • Optimize Spark / Glue jobs over tens of millions of rows — partitioning, file formats (Parquet), incremental loads, cost control.

  • Orchestrate with Airflow / Step Functions; build repeatable, scheduled pipelines rather than one-off scripts.

  • Prepare clean, documented, feature-ready datasets for the PD / delinquency models.

  • Document runbooks so the offshore team can operate the pipelines and handover takes days, not weeks; help scope onboarding of the remaining (Ireland + additional) sources.

Skills

  • Strong SQL and Python for large-scale data processing

  • AWS data stack: S3, Glue, Lake Formation, Athena / Redshift, EMR / Spark, Step Functions / Airflow

  • Data modeling & semantic layer (dbt or equivalent); dimensional modeling

  • Entity resolution / record linkage across heterogeneous sources

  • Data-quality & testing frameworks (Great Expectations, dbt tests) and data lineage

  • Anonymization / pseudonymization techniques and their analytical trade-offs

  • Big-data processing (Spark) with performance and cost optimization at scale

  • Clear written and verbal English; able to document for handover and work well with a distributed team

Knowledge

  • GDPR fundamentals as applied to anonymized / pseudonymized financial data and UK / EU data residency

  • AWS Well-Architected (Analytics, Security) for BFSI

  • Awareness of credit / risk data structures and what downstream modeling consumers need — a plus

Experience

  • 4+ years in data engineering, with strong AWS + Spark / SQL at scale

  • Demonstrated experience harmonizing / integrating data across multiple source systems

  • Experience building validated, reproducible pipelines in a regulated environment (BFSI, healthcare, government) — strong plus

  • Comfortable stepping into a messy, partly-built data estate and bringing it up to standard

  • Comfortable as the sole or lead data engineer on a small (3–4 person) delivery pod

Skills Required

  • 4+ years in data engineering with strong AWS, Spark and SQL at scale
  • Strong SQL for large-scale data processing
  • Strong Python for large-scale data processing
  • AWS data stack: S3, Glue, Lake Formation, Athena, Redshift, EMR/Spark, Step Functions, Airflow
  • dbt (or equivalent) and dimensional/data modeling (staging -> intermediate -> mart)
  • Great Expectations and dbt tests or equivalent data-quality/testing frameworks
  • Entity resolution / record linkage (deterministic and fuzzy matching)
  • Anonymization / pseudonymization techniques (hashing, tokenization, k-anonymity) and re-identification risk controls
  • Experience optimizing Spark / Glue jobs for performance and cost (partitioning, Parquet, incremental loads)
  • Ability to document runbooks and hand over operational pipelines to distributed/offshore teams
  • Experience building validated, reproducible pipelines in regulated environments (BFSI, healthcare, government)
  • Knowledge of GDPR fundamentals and UK/EU data residency considerations
  • Awareness of AWS Well-Architected (Analytics, Security) for BFSI
  • Comfort as sole or lead data engineer on a small delivery pod

The Company
London
54 Employees
Year Founded: 2019

What We Do

Your Path to Enterprise AI Starts Here. Neurons Lab delivers AI transformation services to guide enterprises into the new era of AI. Our approach covers the complete AI spectrum, combining leadership alignment with technology integration to deliver measurable outcomes. As an AWS Advanced Partner and GenAI competency holder, we have successfully delivered tailored AI solutions to over 100 clients, including Fortune 500 companies and governmental organizations.
