NDA Databricks-native platform | Data Engineer

Hiring Remotely in Argentina
Remote
Senior level
HR Tech • Information Technology • Professional Services
The Role
Design and build scalable batch and streaming data pipelines on the Databricks Lakehouse to support entity resolution, MDM, data quality, and real-time analytics. Optimize Spark/PySpark workloads, implement CDC/CDF patterns and Delta Live Tables, create reliable idempotent pipelines, design data models (Kimball, SCDs), and collaborate with AI/ML Engineers and Data Scientists while promoting best practices (testing, version control, monitoring).

On behalf of NDA, a Databricks-native platform, SD Solutions is looking for an experienced Data Engineer to design and build the data infrastructure that powers its Master Data Management platform, built natively on the Databricks Data Intelligence Platform.

In this role, you will develop scalable, high-performance data pipelines that support entity resolution, data quality, and real-time analytics.

You will take a hands-on role in building and optimizing batch and streaming pipelines using modern Lakehouse technologies, including Delta Live Tables and Change Data Capture patterns. This includes ensuring data reliability, consistency, and performance through robust pipeline design, testing, and optimization of Spark workloads.

Working closely with AI/ML Engineers, Product Managers, and Data Scientists, you will translate data requirements into efficient data models and pipelines that enable intelligent features and analytics. You will also contribute to best practices across data engineering, including monitoring, version control, and automated testing.

This is a highly self-directed role suited for someone who thrives in a fast-paced environment, where building scalable data systems and ensuring data quality at scale are central to success.

SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.

Responsibilities:

  • Design and Develop Scalable Data Pipelines: Lead the design, development, and optimization of robust, high-performance data pipelines within the Databricks Lakehouse Platform to support LakeFusion's core functionalities, including entity resolution, data quality, and analytical reporting.
  • Implement Real-time Data Ingestion: Build and manage streaming data pipelines using Delta Live Tables (DLT) and other Databricks capabilities for real-time data ingestion, transformation, and processing, leveraging Change Data Capture (CDC) and Change Data Feed (CDF) patterns.
  • Optimize Spark Workloads: Apply advanced PySpark best practices and Spark optimization techniques to ensure efficient processing of large-scale datasets, reducing latency and cost for batch and streaming operations.
  • Ensure Data Reliability and Quality: Develop pipelines with a strong focus on reliability, testability, and data quality. Implement idempotent designs to guarantee data consistency and accuracy across all data flows.
  • Data Modeling and Architecture: Design and implement logical and physical data models for the Lakehouse, including dimensional modeling (e.g., Kimball methodology) and handling Slowly Changing Dimensions (SCDs), to support analytical and operational needs.
  • Collaborate on Data Solutions: Work closely with AI/ML Engineers, Product Managers, and Data Scientists to understand data requirements, integrate new data sources, and provide foundational data infrastructure for LakeFusion's intelligent features.
  • Promote Best Practices: Advocate for and implement best practices in data engineering, including version control, automated testing, monitoring, and alerting for data pipelines.

Requirements:

    • 5+ years of hands-on experience as a Data Engineer or in a similar role, specifically building and managing large-scale data platforms and pipelines in a production environment.
    • Deep expertise with the Databricks Lakehouse Platform, including extensive experience with Delta Lake, Databricks SQL, Unity Catalog, and Databricks Workflows.
    • Proven proficiency in building and optimizing data pipelines using Apache Spark, particularly with PySpark for complex data transformations and processing.
    • Demonstrated experience with streaming data technologies and building real-time pipelines, ideally using Delta Live Tables (DLT).
    • Strong understanding and practical application of Change Data Capture (CDC) and Change Data Feed (CDF) patterns for incremental data loading.
    • Solid foundation in data modeling concepts, including dimensional modeling (Kimball) and techniques for managing Slowly Changing Dimensions (SCDs).
    • Experience in designing and implementing reliable, testable, and idempotent data pipelines, ensuring data quality and consistency.
    • Familiarity with data governance, metadata management, and data cataloging principles.
    • Excellent problem-solving skills and the ability to debug complex data issues across distributed systems.
    • Strong communication skills, capable of articulating complex technical concepts to both technical and non-technical stakeholders.

Advantages:

      • Specific experience with Entity Resolution or Master Data Management (MDM) systems and their underlying data structures.
      • Experience with cloud platforms (AWS, Azure) for data engineering deployments.
      • Knowledge of MLOps practices and integrating data pipelines with machine learning workflows.
      • Experience with CI/CD for data pipelines and infrastructure as code (e.g., Terraform).

About the company:

NDA is a Databricks-native platform that unifies master data, product data, and relationships into a single AI-ready foundation.
One platform. Three sources of trust:
  • MDM (Trusted Entities): Customers, suppliers, and accounts mastered and governed across systems
  • Graph (Trusted Networks): Relationships and hierarchies connected for intelligence and AI reasoning
  • PIM (Trusted Products): SKUs and catalogs enriched and ready for commerce
Built entirely on Databricks, NDA eliminates data duplication and brings governance directly to where your data lives.
Powered by LLMs, NDA delivers explainable entity resolution, automated stewardship, and trusted golden records at scale.

By applying for this position, you agree to the terms outlined in our Privacy Policy. Please take a moment to review the Privacy Policy at https://sd-solutions.breezy.hr/privacy-notice and make sure you understand its contents. If you have any questions or concerns regarding the Privacy Policy, please feel free to contact us.

        Skills Required

        • 5+ years hands-on experience as a Data Engineer building and managing large-scale production data platforms and pipelines
        • Deep expertise with the Databricks Lakehouse Platform (Delta Lake, Databricks SQL, Unity Catalog, Databricks Workflows)
        • Proven proficiency building and optimizing data pipelines using Apache Spark, particularly PySpark
        • Experience with streaming data technologies and building real-time pipelines, ideally using Delta Live Tables (DLT)
        • Practical experience implementing Change Data Capture (CDC) and Change Data Feed (CDF) patterns
        • Strong understanding of data modeling concepts, including dimensional modeling (Kimball) and Slowly Changing Dimensions (SCDs)
        • Experience designing and implementing reliable, testable, and idempotent data pipelines ensuring data quality and consistency
        • Familiarity with data governance, metadata management, and data cataloging principles
        • Excellent problem-solving skills and ability to debug complex data issues across distributed systems
        • Strong communication skills to articulate technical concepts to technical and non-technical stakeholders
        • Experience with Entity Resolution or Master Data Management (MDM) systems and their data structures
        • Experience with cloud platforms (AWS, Azure) for data engineering deployments
        • Knowledge of MLOps practices and integrating data pipelines with machine learning workflows
        • Experience with CI/CD for data pipelines and infrastructure as code (e.g., Terraform)

        The Company
        452 Employees
        Year Founded: 2014
