Senior Lead Software Engineer - Data & AI

Posted 4 Days Ago
Bangalore, Bengaluru Urban, Karnataka
In-Office
Senior level
Analytics
The Role
Architect and deliver scalable cloud-native data platforms, design data pipelines for AI/ML workloads, and ensure data quality and governance.

We are looking for a Senior Lead Data Engineer to architect the highly scalable, cloud-native data platforms that power our Real-World Data (RWD) and DRG (Decision Resources Group) analytics solutions. These are critical tools that help researchers, clinicians, scientists, and business leaders make faster, more confident decisions. You’ll help build the data engine behind products used to accelerate drug discovery, evaluate treatment effectiveness, model patient journeys, and bring life-saving innovations to market.

This is an opportunity to build data systems that not only drive next-generation AI but also create measurable impact in healthcare and life sciences globally.

If you’re passionate about data engineering and excited to work on platforms that enable next-generation AI, this role is for you.

About You – Experience, Education, Skills, and Accomplishments
  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Minimum 8 years of experience building scalable, production-grade data systems.
  • Proven ability to design massively scalable distributed data processing pipelines.
  • Strong background in database design, schema modelling, and performance tuning.
  • Hands-on expertise building and optimizing complex ETL/ELT pipelines that power ML and analytics workloads.
  • Ability to research and work independently, and to work with remote teams across different time zones.
  • Experience working with interactive-speed query engines such as StarRocks, ClickHouse, and Druid.
  • Experience designing resilient, fault-tolerant, cloud-native data platforms with automated disaster recovery.
  • Hands-on background in Agile delivery, CI/CD, and containerized workflows.
  • Strong understanding of data versioning, lineage, reproducibility, and metadata management — critical for AI governance.
Technical Skills
  • Big Data, PySpark, Databricks, Snowflake
  • Interactive query engines like StarRocks/ClickHouse/Druid
  • Exposure to open-source technologies like DuckDB, Polars
  • Transformation optimization: refining complex logic, often the most resource-intensive part of a pipeline, using efficient code and techniques
  • AWS Glue, AWS EMR, Delta Lake, Iceberg
  • Parquet, RDBMS (PostgreSQL)
  • Experience designing data flows that serve AI, GenAI, and algorithmic workloads
Languages
  • Proficient in Python, SQL, and PySpark
  • Bonus: experience building data prep scripts for ML model training
Cloud Technologies & Tools
  • Strong experience with AWS: EMR, Glue, S3, EC2, RDS, Aurora PostgreSQL, Lambda
  • Ability to evaluate and integrate AI-friendly tools (feature stores, vector databases, ML workflow orchestration, etc.)
It Would Be Great If You Also Have
  • Exposure to GenAI technologies, LLM data pipelines, or vector embeddings
  • Experience supporting data needs for ML, LLMs, or analytics teams
  • Experience collaborating with distributed, high-velocity global teams
  • Experience building end-to-end RAG pipelines, including advanced techniques such as Fusion RAG, and applying query transformation to improve retrieval
  • Experience with Python frameworks such as LangChain and LlamaIndex for building GenAI applications
  • Exposure to vector databases such as Chroma, Pinecone, Milvus, Weaviate, and LanceDB
What You Will Be Doing in This Role
AI-Ready Data Architecture & Technical Leadership
  • Architect and deliver a future-proof data lake platform optimized for analytics, ML, and GenAI workloads.
  • Design intelligent, automated, highly scalable data pipelines that support model training, inference, and continuous learning.
  • Provide thought leadership on emerging AI-driven data patterns such as feature stores, vectorized pipelines, and streaming ingestion.
  • Evaluate modern technologies (Delta Lake, Iceberg, Databricks ML, AWS AI services) to ensure the platform stays ahead of the curve.
  • Own the end-to-end data lake solution design, ensuring scalability, reliability, and AI-readiness.
  • Collaborate with colleagues and business stakeholders to define and execute the technical strategy.
  • Be an active stakeholder throughout the software development life cycle: oversee the software design, keep the project on its technical direction, and adjust the technical design to mitigate unexpected blockers.
Data Engineering & Platform Delivery
  • Build high-performance, cloud-native ETL & ELT pipelines using AWS Glue, EMR, and Databricks.
  • Ensure data quality, lineage, auditability, and governance to support trustworthy AI and analytics.
  • Embed standards for data observability, automated quality checks, and ML-ready feature transformations.
  • Help implement robust SLAs for AI data services, ensuring fast, deterministic, and reliable data flows.
  • Act as a key contributor in architectural decisions, data modelling, workflow optimization, and platform enhancements.
Innovation, GenAI Integration & Customer Impact
  • Drive R&D explorations across new AI/GenAI enablers such as automated data labelling, embeddings, or intelligent data preparation.
  • Partner with Product and Technology leaders to translate business problems into AI-ready data solutions.
  • Lead initiatives to make the data platform more “AI-native,” enabling advanced analytics, LLM-driven insights, and real-time intelligence.
  • Continuously explore how emerging AI tools can reduce operational overhead and automate previously manual processes.
  • Create technical documentation and knowledge assets to scale AI-ready engineering practices across the organization.
About the Team

You will join the RWD DRG Fusion team, a global engineering organization focused on powering the next generation of healthcare and life sciences insights. The team thrives on innovation, collaboration, diversity, and a strong sense of mission. You’ll work with product owners, scientists, data scientists, ML engineers, and architects shaping the future of our AI-driven products.

Hours of Work
  • Full-time (IST)
  • 40 hours per week
  • Hybrid working environment

At Clarivate, we are committed to providing equal employment opportunities for all qualified persons with respect to hiring, compensation, promotion, training, and other terms, conditions, and privileges of employment. We comply with applicable laws and regulations governing non-discrimination in all locations.

Top Skills

Aurora PostgreSQL
AWS
AWS EMR
AWS Glue
Big Data
ClickHouse
Databricks
Delta Lake
Druid
DuckDB
EC2
EMR
Glue
Iceberg
Lambda
Parquet
Polars
Postgres
PySpark
Python
RDS
S3
Snowflake
SQL
StarRocks

The Company
Belfast
10,549 Employees

What We Do

Clarivate™ is a global leader in providing solutions to accelerate the lifecycle of innovation. Our bold mission is to help customers solve some of the world’s most complex problems by providing actionable information and insights that reduce the time from new ideas to life-changing inventions in the areas of science and intellectual property. We help customers discover, protect and commercialize their inventions using our trusted subscription and technology-based solutions coupled with deep domain expertise. For more information, please visit clarivate.com.
