Lead Data Engineer - Databricks

Posted Yesterday
2 Locations
In-Office or Remote
Senior level
Information Technology • Consulting
The Role
Lead design, development, and optimization of ETL/ELT data pipelines on Databricks using PySpark and Spark SQL. Implement Delta Lake, Unity Catalog, incremental ingestion (Auto Loader/streaming), Databricks Workflows, and performance tuning. Build reusable accelerators, participate in client technical discussions, mentor junior engineers, and resolve production issues while promoting engineering best practices.
Summary Generated by Built In

About Kanerika

Who we are:

Kanerika Inc. is a premier global software products and services firm that specializes in providing innovative solutions and services for data-driven enterprises. Our focus is to empower businesses to achieve their digital transformation goals and maximize their business impact through the effective use of data and AI. We leverage cutting-edge technologies in data analytics, data governance, AI-ML, and GenAI/LLM, along with industry best practices, to deliver custom solutions that help organizations optimize their operations, enhance customer experiences, and drive growth.

Awards and Recognitions

Kanerika has won several awards over the years, including:

·       CMMI Level 3 Appraised in 2024.

·       Best Place to Work 2023 by Great Place to Work

·       Top 10 Most Recommended RPA Start-Ups in 2022 by RPA Today.

·       Frost & Sullivan India 2021 Technology Innovation Award for its Kompass composable solution architecture.

·       Kanerika has also been recognized for its commitment to customer privacy and data security, having achieved ISO 9001, ISO 27701, SOC 2, and GDPR compliance.

Working for us

Kanerika is rated 4.6/5 on Glassdoor, for many good reasons. We truly value our employees' growth, well-being, and diversity, and people’s experiences bear this out. At Kanerika, we offer a host of enticing benefits that create an environment where you can thrive both personally and professionally. From our inclusive hiring practices and mandatory training on creating a safe work environment to our flexible working hours and generous parental leave, we prioritize the well-being and success of our employees. Our commitment to professional development is evident through our mentorship programs, job training initiatives, and support for professional certifications. Additionally, our company-sponsored outings and various time-off benefits ensure a healthy work-life balance. Join us at Kanerika and become part of a vibrant and diverse community where your talents are recognized, your growth is nurtured, and your contributions make a real impact. See the benefits section below for the perks you’ll get while working for Kanerika.

Locations

We are located in Austin (USA), Singapore, Hyderabad, Indore and Ahmedabad (India).

Job Location: Hyderabad, Indore and Ahmedabad (India)



Requirements

Role:

As one of the founding members of Kanerika's Databricks delivery practice, you will be a hands-on builder responsible for designing, developing, and optimizing data pipelines on Databricks for client engagements. You'll work closely with the Practice Lead to translate architecture into working solutions, build reusable accelerators ahead of client demand, and help establish engineering best practices for the growing team.

Key Responsibilities

·       Design, build, and optimize ETL/ELT data pipelines on Databricks using PySpark and Spark SQL.

·       Build and maintain incremental ingestion pipelines using Auto Loader and Structured Streaming, including checkpoint management and schema evolution handling.

·       Implement and maintain Delta Lake tables (including Change Data Feed and liquid clustering), Unity Catalog structures, and Databricks Workflows for client and internal projects.

·       Build reusable accelerators, templates, and demo environments to support pre-sales and speed up future client delivery.

·       Collaborate with the Practice Lead/Architect on solution design for client engagements, providing engineering-level input on feasibility and effort.

·       Perform data quality checks, performance tuning, and cost optimization on Databricks clusters and jobs.

·       Participate in client-facing technical discussions as needed, including discovery sessions and technical walkthroughs.

·       Write clean, well-documented, and testable code following team engineering standards.

·       Mentor junior engineers as the team scales, and contribute to internal knowledge-sharing and best-practice documentation.

·       Troubleshoot and resolve production data pipeline issues across client environments.

·       Stay current with Databricks platform releases, including Unity Catalog and Lakeflow Declarative Pipelines, and share learnings with the team through internal knowledge sessions and documentation.


Required Skills & Experience

·       5–9 years of data engineering experience, with at least 2 years of hands-on Databricks experience in production.

·       Strong proficiency in PySpark and/or Spark SQL for large-scale data processing.

·       Practical experience with Delta Lake, Unity Catalog, and Databricks Workflows/job orchestration.

·       Solid SQL skills and experience with data modeling for analytics/lakehouse architectures.

·       Experience with at least one major cloud platform (Azure, AWS, or GCP); Azure strongly preferred.

·       Experience with Python for data engineering tasks beyond Spark (scripting, automation, testing).

·       Familiarity with CI/CD practices for data pipelines (Git-based workflows, automated testing/deployment).

·       Strong debugging and performance-tuning skills for Spark jobs (partitioning, caching, cluster sizing).

·       Good communication skills and comfort working directly with client stakeholders when needed.

 

Preferred / Nice to Have

·       Databricks Certified Data Engineer Associate/Professional certification.

·       Experience with Lakeflow Declarative Pipelines (formerly Delta Live Tables), Lakeflow Connect for managed ingestion, MLflow for experiment tracking and model registry, Databricks SQL, and AI/BI Genie.

·       Exposure to data governance and security frameworks (row/column-level security, data masking).

·       Prior experience in a consulting/IT services environment delivering to multiple clients.

·       Familiarity with orchestration tools (Airflow) and ingestion tools (Fivetran, Kafka, Azure Data Factory).

What Success Looks Like

·       Within 3 months: Comfortable with Kanerika's delivery standards; has built or contributed to at least one reusable accelerator/demo asset.

·       Within 6 months: Independently delivering core engineering work on client engagement(s) with minimal oversight.

·       Within 12 months: Recognized as a go-to senior engineer on the team, mentoring newer hires and contributing to architecture decisions.

·       Leading the Databricks partnership upgrade to Gold level.



Benefits

Why join us?

·       Work with a passionate and innovative team in a fast-paced, growth-oriented environment.

·       Gain hands-on experience in content marketing with exposure to real-world projects.

·       Opportunity to learn from experienced professionals and enhance your marketing skills.

·       Contribute to exciting initiatives and make an impact from day one.

·       Competitive stipend and potential for growth within the company.

·       Recognized for excellence in data and AI solutions with industry awards and accolades.

Employee Benefits:

1. Culture:

   i.   Open Door Policy: Encourages open communication and accessibility to management.

  ii.   Open Office Floor Plan: Fosters a collaborative and interactive work environment.

 iii.   Flexible Working Hours: Allows employees to have flexibility in their work schedules.

  iv.   Employee Referral Bonus: Rewards employees for referring qualified candidates.

   v.   Appraisal Process Twice a Year: Provides regular performance evaluations and feedback.

2. Inclusivity and Diversity:

a.      Hiring practices that promote diversity: Ensures a diverse and inclusive workforce.

b.      Mandatory POSH training: Promotes a safe and respectful work environment.

3. Health Insurance and Wellness Benefits:

a.      GMC and Term Insurance: Offers medical coverage and financial protection.

b.      Health Insurance: Provides coverage for medical expenses.

c.       Disability Insurance: Offers financial support in case of disability.

4. Child Care & Parental Leave Benefits:

a.      Company-sponsored family events: Creates opportunities for employees and their families to bond.

b.      Generous Parental Leave: Allows parents to take time off after the birth or adoption of a child.

c.       Family Medical Leave: Offers leave for employees to take care of family members' medical needs.

5. Perks and Time-Off Benefits:

a.      Company-sponsored outings: Organizes recreational activities for employees.

b.      Gratuity: Provides a monetary benefit as a token of appreciation.

c.       Provident Fund: Helps employees save for retirement.

d.      Generous PTO: Offers more than the industry standard for paid time off.

e.      Paid sick days: Allows employees to take paid time off when they are unwell.

f.        Paid holidays: Gives employees paid time off for designated holidays.

g.       Bereavement Leave: Provides time off for employees to grieve the loss of a loved one.

 

6. Professional Development Benefits:

a.      L&D with FLEX- Enterprise Learning Repository: Provides access to a learning repository for professional development.

b.      Mentorship Program: Offers guidance and support from experienced professionals.

c.       Job Training: Provides training to enhance job-related skills.

d.      Professional Certification Reimbursements: Assists employees in obtaining professional certifications.

e.      Promote from Within: Encourages internal growth and advancement opportunities.

 

 



Skills Required

  • 5-9 years of data engineering experience with at least 2 years of hands-on Databricks in production
  • Proficiency in PySpark and/or Spark SQL for large-scale data processing
  • Practical experience with Delta Lake, Unity Catalog, and Databricks Workflows/job orchestration
  • Experience building incremental ingestion pipelines using Auto Loader and structured streaming (checkpointing, schema evolution)
  • Strong SQL skills and experience with data modeling for analytics/lakehouse architectures
  • Experience with at least one major cloud platform (Azure, AWS, or GCP)
  • Azure experience (strongly preferred)
  • Python experience for scripting, automation, and testing beyond Spark
  • Familiarity with CI/CD practices for data pipelines (Git-based workflows, automated testing/deployment)
  • Strong debugging and performance-tuning skills for Spark jobs (partitioning, caching, cluster sizing)
  • Good communication skills and comfort working directly with client stakeholders
  • Databricks Certified Data Engineer Associate/Professional
  • Experience with Lakeflow Declarative Pipelines (Delta Live Tables), Lakeflow Connect, MLflow, Databricks SQL, AI/BI Genie
  • Exposure to data governance and security frameworks (row/column-level security, data masking)
  • Prior consulting/IT services experience delivering to multiple clients
  • Familiarity with orchestration and ingestion tools such as Airflow, Fivetran, Kafka, Azure Data Factory

The Company
Hyderabad, Telangana
239 Employees
Year Founded: 2015

What We Do

Established in 2015, Kanerika grew out of our core team's work with a leading technology company, where we identified crucial implementation gaps in the market and saw where our expertise could bridge them. Dedicated to empowering businesses with cutting-edge technology solutions, we have been on a relentless pursuit to craft efficient, future-ready enterprises, marking our journey with milestones of growth and transformative impact.

About us:
- 9+ Years in Business
- 6 Offices Across the Globe
- 300+ Consultants Worldwide

Our Specialization:
- Data Analytics
- Data Integration
- Data Governance
- Robotic Process Automation
- Generative AI

Credentials:
- SOC II Compliant
- ISO 27001 and ISO 27701 Certified
- Microsoft Partner

Numbers that Matter:
- 99% Client Retention
- 95% Customer Satisfaction Rate
- 45% YoY Growth

Ready to unleash the power of tech in your business? Let's connect! Book a time that works for you here: https://www.kanerika.com/meet

