Data Engineer (PySpark)

Hiring Remotely in Dubai, ARE
Remote
Senior level
Fintech • Payments • Software • Financial Services • Automation
The Role
Design, build, and maintain scalable PySpark ETL/ELT pipelines on Cloudera CDP; ingest, transform, and validate large datasets; optimise Spark jobs and Cloudera components; automate and orchestrate workflows; monitor production pipelines and provide support.
Summary Generated by Built In

We are seeking a highly skilled Data Engineer with strong expertise in PySpark and the Cloudera Data Platform (CDP). The ideal candidate will design, develop, and maintain scalable data pipelines while ensuring high data quality, performance, and availability across the organisation.

This role requires hands-on experience in big data ecosystems, cloud-native technologies, and advanced data processing frameworks. You will collaborate with cross-functional teams to build reliable and high-performance data solutions that drive business insights.

Key Responsibilities
1. Data Pipeline Development
  • Design, develop, and maintain scalable ETL/ELT pipelines using PySpark on CDP
  • Ensure data integrity, reliability, and performance optimisation
2. Data Ingestion
  • Develop ingestion frameworks to collect data from relational databases, APIs, streaming sources, and file systems
  • Load structured and unstructured data into Data Lake/Data Warehouse environments
3. Data Transformation & Processing
  • Process, cleanse, and transform large-scale datasets using PySpark
  • Build reusable data processing components
4. Performance Optimisation
  • Tune Spark jobs and Cloudera components for optimal performance
  • Optimise memory, partitioning, and execution plans
  • Reduce ETL runtime and improve cluster efficiency
5. Data Quality & Validation
  • Implement data validation checks and monitoring mechanisms
  • Ensure end-to-end data quality and governance standards
6. Automation & Orchestration
  • Automate workflows using tools such as Apache Oozie, Apache Airflow, or similar orchestration frameworks
  • Maintain CI/CD integration for data pipelines
7. Monitoring & Support
  • Monitor pipeline health and troubleshoot failures
  • Provide production support and continuous improvements
Required Skills & Qualifications
  • 5+ years of experience in Data Engineering
  • Strong hands-on experience in PySpark
  • Experience working on Cloudera Data Platform (CDP)
  • Strong knowledge of the Hadoop ecosystem (HDFS, Hive, Impala, YARN)
  • Proficiency in SQL and data modelling concepts
  • Experience with workflow orchestration tools (Airflow, Oozie, etc.)
  • Good understanding of data warehousing concepts
  • Experience with performance tuning and optimisation
Good to Have
  • Experience with cloud platforms (AWS, Azure, GCP)
  • Knowledge of streaming tools (Kafka, Spark Streaming)
  • Exposure to DevOps practices and CI/CD pipelines
  • Banking/Financial Services domain experience

Top Skills

Apache Airflow
Apache Oozie
APIs
CI/CD
Cloudera Data Platform (CDP)
Data Lake
Data Warehouse
Hadoop
HDFS
Hive
Impala
Kafka
PySpark
Relational Databases
Spark
Spark Streaming
SQL
YARN

The Company
Dubai Internet City
160 Employees
Year Founded: 1997

What We Do

Global Software Solutions Group (GSS) is a software solutions provider that aims to solve the mission-critical problems financial institutions face today. Its Veracious product line is a series of robust banking platforms that provide core banking, payment systems, custom process automation, and document management solutions for banks and financial institutions in the Middle East & Africa. The product line features the Veracious Payments Hub, Digital Banking, and the DMS, all built on the Torus low-code development platform. Our software solutions bring together our low-code platform, the payments product line, and customized service offerings to address mission-critical problems in core banking, payments, process automation, and document management. The Payments Hub is GSS's flagship product.
