Data Engineer (Databricks)

Posted 8 Days Ago
5 Locations
Remote
Mid level
Information Technology • Software • Consulting
The Role
Develop an AI-powered data mapping recommendation platform, automate data extraction and validation, build scalable data pipelines, and manage data governance.

Solvd Inc. is a rapidly growing AI-native consulting and technology services firm delivering enterprise transformation across cloud, data, software engineering, and artificial intelligence. We work with industry-leading organizations to design, build, and operationalize technology solutions that drive measurable business outcomes.

Following the acquisition of Tooploox, a premier AI and product development company, Solvd now offers true end-to-end delivery, from strategic advisory and solution design to custom AI development and enterprise-scale implementation. Our capability centers combine deep technical expertise, proven delivery methodologies, and sector-specific knowledge to address complex business challenges quickly and effectively.

We are looking for a Data Engineer to develop an AI-powered data mapping recommendation platform that speeds up the integration and validation of complex datasets. The system will automate data extraction, mapping, and validation processes.

What you'll do
  • Build and maintain scalable data pipelines with Databricks, Spark, and PySpark.

  • Manage data governance, security, and credentials using Unity Catalog and Secret Scopes.

  • Develop and deploy ML models with MLflow; work with LLMs and embedding-based vector search.

  • Apply ML/DL techniques (classification, regression, clustering, transformers) and evaluate using industry metrics.

  • Design data models and warehouses leveraging dbt, Delta Lake, and Medallion architecture.

  • Work with healthcare data standards and medical terminology mapping.

What you bring

Databricks expertise

Hands-on experience with the Databricks platform, including:

  • Unity Catalog: Managing data governance, access control, and auditing across workspaces.

  • Secret Scopes: Secure handling of credentials and sensitive configurations.

  • Apache Spark / PySpark: Writing performant, scalable distributed data pipelines.

  • MLflow: Managing ML lifecycle including experiment tracking, model registry, and deployment.

  • Vector Search: Working with vector databases or search APIs to build embedding-based retrieval systems.

  • LLMs (Large Language Models): Familiarity with using or fine-tuning LLMs in Databricks or similar environments.

Data Engineering skills

Experience designing and maintaining robust data pipelines:

  • Data Modeling & Warehousing: Dimensional modeling, star/snowflake schemas, SCD (Slowly Changing Dimensions).

  • Modern Data Stack: Familiarity with dbt, Delta Lake, and the Medallion architecture (Bronze, Silver, Gold layers).

Nice to have

Machine Learning knowledge

Strong foundation in machine learning is expected, including:

  • Traditional Machine Learning Techniques: Classification, regression, clustering, etc.

  • Model Evaluation & Metrics: Precision, recall, F1-score, ROC-AUC, etc.

  • Deep Learning (DL): Understanding of neural networks and relevant frameworks.

  • Transformers & Attention Mechanisms: Knowledge of modern NLP architectures and their applications.

Preferred domain knowledge

  • Experience with healthcare data standards and medical code systems such as eCQM, VSAC, RxNorm, LOINC, SNOMED, etc.

  • Understanding of medical terminology and how to map or normalize disparate coding systems.


Tech stack

Platforms & Tools: Databricks, Unity Catalog, Secret Scopes, MLflow

Languages & Frameworks: Python, PySpark, Apache Spark

Machine Learning & AI: Traditional ML techniques, Deep Learning, Transformers, Attention Mechanisms, LLMs

Search & Retrieval: Vector databases, embedding-based vector search

Data Engineering & Modeling: dbt, Delta Lake, Medallion architecture (Bronze/Silver/Gold), Dimensional modeling, Star/Snowflake schemas

Domain (Optional): Healthcare data standards (eCQM, VSAC, RxNorm, LOINC, SNOMED)

When you join Solvd, you'll…

  • Shape real-world AI-driven projects across key industries, working with clients from startup innovation to enterprise transformation.

  • Be part of a global team with equal opportunities for collaboration across continents and cultures.

  • Thrive in an inclusive environment that prioritizes continuous learning, innovation, and ethical AI standards.

Ready to make an impact?

If you're excited to build things that matter, champion responsible AI, and grow with some of the industry's sharpest minds, apply today and let's innovate together.

Top Skills

Spark
Databricks
dbt
Delta Lake
LLMs
MLflow
PySpark
Python
Secret Scopes
Unity Catalog
Vector Databases
The Company
708 Employees
Year Founded: 2011

What We Do

Solvd is an end-to-end software engineering and consulting company with over 800 employees located across eight countries in Latin America, North America, and Europe. The company is headquartered in California and has 7 development centers in Ukraine, Poland, Georgia, Argentina, Brazil, and Mexico, as well as a sales office in Hungary.

Solvd partners with leading technology companies such as Amazon and Adobe.

It delivers engineering and digital solutions to Fortune 500 clients across high-growth industries, including financial services, retail & e-commerce, healthcare & life sciences, social media, and software & Hi-Tech. Its services include software product development, digital experience and design, DevOps and cloud, data, and AI/ML.

Moreover, Solvd transformed its deep expertise in QA into a rich intellectual property library, which includes Zebrunner, an innovative proprietary quality management platform. To get more information about Solvd and Zebrunner, please visit https://www.solvd.com and https://zebrunner.com.
