Data Engineer, AI (Hybrid)

Posted 3 Days Ago
Bangalore, Bengaluru Urban, Karnataka, IND
Hybrid
Senior level
Digital Media • Internet of Things • Software
The Role
Build, operate, and optimise ELT/ETL pipelines into a Databricks Lakehouse. Implement data models, quality controls, observability, and enable AI/ML by delivering governed features and training datasets while ensuring security, privacy, and governance.
Summary Generated by Built In

Job Title 

Data Engineer, AI (Hybrid)

Job Description

The Data Engineer (India Offshore) is responsible for building, maintaining, and optimising data pipelines and transformation processes that deliver trusted, analytics-ready datasets for Kaplan Australia. This role will be the primary contributor driving data transformation into the Lakehouse (Databricks), ensuring data is modelled, curated, and made available for reporting, analytics, and AI/ML use cases.

Reporting to the Senior Solutions Architect, the role contributes hands-on engineering expertise across ingestion, transformation, orchestration, and monitoring, and supports the enablement of AI/ML by providing well-governed features and training datasets.

Key Responsibilities 

Data Engineering Delivery (Pipelines, Transformations, and Data Modelling)

  • Design, build, and operate reliable ELT/ETL pipelines to ingest, transform, and curate data into analytics-ready datasets.

  • Implement and maintain data models (dimensional and/or lakehouse curated layers) with clear semantic definitions to support reporting, analytics, and AI use cases.

  • Build data quality controls (validation, reconciliation, and automated checks) and implement observability (monitoring, alerting, and SLAs) for pipelines and jobs.

  • Optimise performance and cost through efficient compute usage, incremental processing, partitioning, and tuning of Spark/SQL workloads.

Lakehouse / Databricks Transformation Ownership 

  • Be the primary engineer responsible for implementing and evolving the Lakehouse transformation layer in Databricks.

  • Build and maintain Databricks workflows, notebooks, and jobs using Spark/SQL/Python, applying reusable patterns and standards. 

  • Contribute to Lakehouse design decisions (data layout, medallion architecture, Delta patterns, and access strategies) in collaboration with the Head of AI and platform stakeholders.

AI / ML Engineering Enablement

  • Work within the AI team to support delivery of AI/ML initiatives through hands-on engineering (data preparation, pipeline build-out, and integration into production patterns).

  • Collaborate on GenAI projects, including prompt engineering iteration support (test cases, evaluation datasets, and telemetry/metrics to assess prompt performance).

  • Translate AI, analytics, and business requirements into scalable data engineering solutions in collaboration with the AI team, product owners, and domain stakeholders.

  • Help curate and maintain governed feature/training datasets and ensure reproducibility through versioning, documentation, and agreed engineering standards.

Information Security, Privacy, and Governance 

  • Apply security-by-design principles across ingestion and transformation (least privilege, secure secrets handling, environment separation).

  • Implement governance requirements including data classification, retention, lineage, and access controls in line with enterprise standards.

  • Support audits and assurance activities by providing evidence of controls, data handling practices, and operational procedures.

Minimum Qualifications

  • Demonstrated 5+ years’ experience in data engineering, building production-grade data pipelines and transformation layers.

  • Strong hands-on experience with Databricks and Lakehouse patterns (e.g., medallion architecture), including Spark and SQL.

  • Proficiency in Python and SQL for data engineering (transformations, testing, and automation).

  • Experience working with cloud-based data platforms (AWS) and integrated data storage, orchestration, and analytics services.

  • Strong understanding of data modelling (dimensional modelling, curated layers, semantic definitions) and data quality management.

  • Working knowledge of data governance, privacy, and information security practices (access controls, PII handling, lineage, retention).

  • Strong problem-solving skills, attention to detail, and ability to work autonomously with distributed stakeholders.

  • Experience with Jira, Confluence, and Microsoft 365 tools; comfortable working across Agile delivery practices.

Highly Desirable 

  • Experience enabling AI/ML use cases (feature engineering datasets, training/serving data considerations, reproducibility).

  • Experience supporting GenAI / LLM solutions, including RAG-style data preparation (document ingestion, chunking, metadata enrichment) and evaluation dataset creation.

  • Experience collaborating on prompt engineering and prompt evaluation cycles, including collecting ground truth, defining test cases, and analysing prompt performance telemetry.

  • Understanding of vector search / embeddings concepts and data requirements for semantic retrieval.

  • Experience delivering data solutions within higher education or regulated environments.

Compliance & Governance Obligations

  • Demonstrate company values and contribute to a kind, safe, supportive and collaborative workplace.

  • Adherence to all Kaplan policies and procedures. 

  • Compliance with Workplace Health & Safety legislation and requirements. 

  • Completion of all mandatory training as required by Kaplan. 

  • Compliance with information security, privacy, data governance and risk management frameworks.

Location

Bangalore, KA, India

Additional Locations 

Employee Type

Employee

Job Functional Area 

IT Development

Business Unit

00092 Kaplan Health

Diversity & Inclusion Statement:

Kaplan is committed to cultivating an inclusive workplace that values diversity, promotes equity, and integrates inclusivity into all aspects of our operations. We are an equal opportunity employer and all qualified applicants will receive consideration for employment regardless of age, race, creed, color, national origin, ancestry, marital status, sexual orientation, gender identity or expression, disability, veteran status, nationality, or sex. We believe that diversity strengthens our organization, fuels innovation, and improves our ability to serve our students, customers, and communities.

Kaplan considers qualified applicants for employment even if applicants have an arrest or conviction in their background check records. Kaplan complies with related background check regulations, including, but not limited to, the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. There are various positions where certain convictions may disqualify applicants, such as those positions requiring interaction with minors, financial records, or other sensitive and/or confidential information.

Kaplan is a drug-free workplace and complies with applicable laws. 

Skills Required

  • 5+ years' experience in data engineering, building production-grade data pipelines and transformation layers.
  • Hands-on experience with Databricks and Lakehouse patterns (medallion architecture) including Spark and SQL.
  • Proficiency in Python and SQL for data engineering (transformations, testing, automation).
  • Experience working with cloud-based data platforms (AWS) and integrated storage/orchestration services.
  • Understanding of data modelling (dimensional modelling, curated layers, semantic definitions) and data quality management.
  • Experience building data quality controls, observability (monitoring, alerting, SLAs), and performance/cost optimisation of Spark/SQL workloads.
  • Working knowledge of data governance, privacy, and information security practices (access controls, PII handling, lineage, retention).
  • Experience with Jira, Confluence, Microsoft 365 and working across Agile delivery practices.
  • Experience enabling AI/ML use cases (feature engineering, training/serving data considerations, reproducibility).
  • Experience supporting GenAI/LLM solutions, including RAG-style data preparation and evaluation dataset creation.
  • Experience collaborating on prompt engineering and prompt evaluation cycles.
  • Understanding of vector search / embeddings concepts and semantic retrieval data needs.
  • Experience delivering data solutions within higher education or regulated environments.

The Company
HQ: Arlington, Virginia
143 Employees
Year Founded: 1947

What We Do

Headquartered in Arlington, Va., Graham Holdings Company (NYSE:GHC) is a diversified holding company whose operations include: educational services, home health and hospice care, television broadcasting, online, print and local TV news, automotive dealerships, manufacturing, hospitality, consumer internet companies, digital marketing, and other emerging operations. Graham Holdings Company delivers quality products and services to today’s students, viewers, customers, patients, and advertisers. What unites our Company is a commitment to excellence across all of our business lines.

Similar Jobs

Boeing

Software Engineer

Aerospace • Information Technology • Software • Cybersecurity • Design • Defense • Manufacturing
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
170000 Employees

Boeing

Lead Programmer Analyst (.Net Full Stack)

Aerospace • Information Technology • Software • Cybersecurity • Design • Defense • Manufacturing
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
170000 Employees

Boeing

Architect

Aerospace • Information Technology • Software • Cybersecurity • Design • Defense • Manufacturing
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
170000 Employees

Wells Fargo

Operations Associate

Fintech • Financial Services
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
205000 Employees

Similar Companies Hiring

Golden Pet Brands
Digital Media • eCommerce • Information Technology • Marketing Tech • Pet • Retail • Social Media
El Segundo, California
178 Employees
Kepler
Fintech • Software
New York, New York
6 Employees
Onshore
Artificial Intelligence • Fintech • Software • Financial Services
New York, New York
60 Employees