Sr. Databricks Azure Data Engineer

Posted 4 Days Ago
Islamabad
In-Office
Senior level
Sharing Economy
The Role
Design, build, and maintain automated testing frameworks and a Python SDK for data integration and parity in Industrial IoT/Smart Buildings. Implement scalable Spark and Flink data solutions, enforce data quality rules, optimize Azure/Databricks/Delta Lake resources, maintain CI/CD pipelines, and promote governance, security, and best practices within Agile teams.
Summary Generated by Built In

About Fusemachines

Fusemachines is a leading AI strategy, talent, and education services provider and a global provider of enterprise AI products and services. Founded in 2013 by Sameer Maskey, Ph.D., Adjunct Associate Professor at Columbia University, the company has a core mission of democratizing AI. With a presence in four countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 450 full-time employees, Fusemachines brings global AI expertise to transform companies worldwide. Leveraging its proprietary AI Studio and AI Engines, the company helps drive clients’ AI Enterprise Transformation, regardless of where they are in their Digital AI journeys. With offices in North America, Asia, and Latin America, Fusemachines provides a suite of enterprise AI offerings and specialty services that allow organizations of any size to implement and scale AI. It serves companies in industries such as retail, manufacturing, and government.

Fusemachines continues to actively pursue the mission of democratizing AI for the masses by providing high-quality AI education in underserved communities and helping organizations achieve their full potential with AI.

 

Type: Remote, Full-time

 

Job Description:

This role is responsible for designing, building, and maintaining the automated testing frameworks and Python SDKs required for data integration, parity verification, and user accessibility in the Industrial IoT / Smart Buildings domain.

We are seeking a Senior Data Engineer with strong Python and PySpark skills, a proven record of implementing automated Data Quality frameworks and CI/CD pipelines, and experience delivering Data and Analytics products using Agile methodology. The ideal candidate will possess strong technical, analytical, and interpersonal skills.

Qualification / Skill Set Requirement:

  • 5+ years of hands-on data engineering experience with deep expertise in the Azure ecosystem.
  • Expert Python skills (building Libraries/SDKs, Py4J).
  • Strong experience with PySpark and DataFrames.
  • Experience with Automated Testing frameworks (PyTest, JUnit) for data pipelines.
  • Experience with Data Quality tools (Great Expectations, Deequ).
  • Deep understanding of Apache Spark Internals (Catalyst Optimizer, Logical Plans).
  • Experience with Databricks and Delta Lake optimization.
  • Solid understanding of SDLC and Agile methodologies with hands-on experience in Azure DevOps, GitHub, CI/CD, and artifact management.
  • Skilled in data modeling, data design, and data warehousing solutions on Azure Databricks.
  • Knowledge of data governance and security best practices within Azure (AD, NSG, encryption, compliance).
  • Certifications preferred: Azure Fundamentals, Azure Data Engineer Associate, Databricks Certified Data Engineer Professional and Azure Solutions Architect Expert (nice to have).

Responsibilities

  • Implement scalable and efficient data solutions on Spark and Flink.
  • Develop a Python SDK that wraps core libraries for use in Jupyter Notebooks by data scientists and analysts.
  • Build an automated Parity Test Suite: a CI/CD pipeline that runs queries against "Golden Datasets" in both Spark and Flink and asserts equality.
  • Implement Data Quality (DQ) rules configuration: define enforceable thresholds for "Completeness" and "Validity".
  • Manage and optimize Azure and Databricks resources for performance, reliability, and cost-efficiency.
  • Transform, clean, and prepare data using SQL, Python, and Java.
  • Monitor and fine-tune workloads and pipelines for optimal performance and reliability.
  • Maintain CI/CD pipelines.
  • Maintain clear documentation of solutions, configurations, and workflows.
  • Actively participate in Agile team activities and continuous improvement initiatives.
  • Promote and enforce data engineering best practices, including data governance, security, and data quality.

Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.

Top Skills

Python, PySpark, Py4J, Apache Spark, Catalyst Optimizer, Databricks, Delta Lake, Apache Flink, PyTest, JUnit, Great Expectations, Deequ, Azure, Azure DevOps, GitHub, CI/CD, Artifact Management, SQL, Java, Jupyter Notebooks, Azure Active Directory (AD), Network Security Group (NSG), Encryption

The Company
HQ: New York City, NY
428 Employees
Year Founded: 2013

What We Do

A 10+ year old AI company offering cutting-edge AI products and solutions across industries.

With over a decade of experience, we help companies in their AI Transformation journey with our suite of AI Products and AI Solutions supported by our global AI Talent from underserved communities.

On a mission to #DemocratizeAI, we aim to bridge the gap between AI advancement and global impact, bringing the most advanced technology solutions to the world.
