Location: Remote
Position Type: Contract
We are seeking an accomplished, technology-driven Lead Data Platform Architect / Migration Specialist to spearhead the modernization of our core enterprise financial and tax allocation engines. In this role, you will lead the architectural design, migration strategy, and hands-on implementation needed to transition large-scale legacy relational database systems (SQL Server/T-SQL) to a modern, cloud-native Databricks Lakehouse platform.
The ideal candidate will have extensive experience in high-throughput distributed systems, Databricks compute optimization, performance tuning, and complex pipeline orchestration.
Architecture & Strategy: Validate, refine, and own the target architecture on Databricks. Define robust migration strategies and production-ready reference patterns to convert 150+ complex stored procedures into PySpark and Structured/Declarative Pipelines (SDP).
Pipeline Engineering: Design distributed processing frameworks, control flows, and configuration-driven parameter handling for both full and incremental recalculation modes.
Performance Optimization: Close performance gaps between small and large workloads. Architect and implement acceleration techniques such as caching, partition pruning, cluster sizing, and offline/pre-calculation strategies to maintain sub-30-second SLAs for user-facing reporting.
Orchestration & Observability: Design and deploy enterprise-level pipeline orchestration using tools like Apache Airflow or Databricks Workflows. Integrate robust logging, error handling, and observability patterns into existing enterprise monitoring frameworks.
Governance & Security: Implement data governance models, data lineage, and schema evolution utilizing tools like Unity Catalog.
AI-Assisted Delivery & Code Quality: Establish best practices for AI-assisted code generation (e.g., using Claude or other advanced LLMs), providing code-review patterns and refactoring frameworks to ensure maintainable and performant output.
Team Enablement: Lead code walkthroughs, design reviews, and pair-programming sessions with the development team to accelerate knowledge transfer and technical excellence.
Core Big Data Platform: Expert-level knowledge of Databricks (Lakehouse architecture, Delta Lake, Unity Catalog) and Apache Spark / PySpark.
Legacy Database Expertise: Strong background in relational databases, with advanced proficiency in SQL Server, T-SQL, and Stored Procedures. Ability to reverse-engineer and refactor legacy database logic into distributed paradigms.
Orchestration Tools: Hands-on experience with Apache Airflow or similar modern workflow orchestrators.
Performance Tuning: Proven track record in cost optimization (FinOps), cluster tuning, autoscaling configurations, and handling skewed data profiles.
CI/CD & DevOps: Experience with Infrastructure as Code (Terraform), data build tool (dbt), testing frameworks (PyTest), and automated Git-based workflows.
Experience Level: 10+ years of experience in Data Engineering/Architecture, including at least 3 years specifically leading large-scale cloud data migrations.
Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
Databricks Certified Data Engineer Associate / Professional
Databricks Certified Solutions Architect
AWS Certified Database - Specialty or equivalent cloud certifications
Skills Required
- Expert-level knowledge of Databricks and Apache Spark/PySpark
- Strong proficiency in SQL Server, T-SQL, and Stored Procedures
- Hands-on experience with Apache Airflow or similar orchestration tools
- Experience with Infrastructure as Code (Terraform) and CI/CD practices
- 10+ years of experience in Data Engineering/Architecture
What We Do
Koantek is a Databricks-exclusive system integrator focused on enterprise-scale data and AI transformation, helping clients modernize, migrate, and scale AI solutions.







