Sr. Data Engineer/Tech Lead

Bengaluru, Bengaluru Urban, Karnataka
In-Office
200K-250K Annually
Senior level
Healthtech • Biotech • Pharmaceutical
The Role
The Senior Data Engineer will build scalable data platforms on AWS, implement data pipelines for batch and real-time processing, and mentor team members. Responsibilities include optimizing performance, maintaining data quality, and collaborating with stakeholders.

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

As a Senior Data Engineer, you will:

  • Demonstrate expert skills in ETL/ELT, data integration, MLOps, and SQL, as well as intermediate-to-advanced skills in Python, PySpark, AI/ML, and data visualization.
  • Review, optimize, and document data pipelines, mappings, cleansing logic, and visual designs across various tools and platforms, and mentor data/visualization engineers on them.
  • Break down moderately complex problems into implementable pieces for increased business impact.
  • Support other team members, help them succeed, and actively share learnings with the team.
  • Drive team process improvements, bringing others along in understanding the benefits and tradeoffs.
  • Actively promote new and innovative ideas across multiple teams and capabilities.

Key Responsibilities

Hands-On Development (75%)

  • Build and maintain scalable data platforms and infrastructure on AWS
  • Implement end-to-end data pipelines for batch and real-time data processing
  • Build robust ETL/ELT workflows to ingest, transform, and load data from diverse sources
  • Implement data lake/lakehouse architectures using AWS S3, Glue, Athena, and Lake Formation
  • Design and optimize data warehouse solutions (Redshift, Snowflake) for analytics and reporting
  • Establish data quality frameworks and automated monitoring systems
  • Write production-quality Python code for data processing, transformation, and automation
  • Build scalable data pipelines using Apache Airflow, AWS Step Functions, or similar orchestration tools (a minimal Airflow sketch follows this list)
  • Develop streaming data solutions using Kinesis, Kafka, or AWS MSK
  • Optimize SQL queries and database performance for large-scale datasets
  • Implement data validation, cleansing, and quality checks
  • Build APIs and microservices for data access and integration
  • Create monitoring, alerting, and observability solutions for data pipelines
  • Debug and resolve data pipeline failures and performance bottlenecks
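
For context on the hands-on expectations above, here is a minimal sketch of the kind of orchestrated batch pipeline described, using Apache Airflow's TaskFlow API (Airflow 2.x). All bucket names, prefixes, and task bodies are hypothetical placeholders, not actual Lilly systems.

```python
# Minimal Airflow 2.x TaskFlow sketch: daily batch ETL with a data-quality
# gate before publishing. Paths and names are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_orders_etl():
    @task
    def extract(ds=None):
        # 'ds' is Airflow's templated logical date (YYYY-MM-DD).
        # Return the landing-zone prefix for this run.
        return f"s3://example-raw-landing/orders/{ds}/"

    @task
    def transform(raw_prefix: str) -> str:
        # In practice this step would submit a Glue job or an EMR/PySpark
        # step; here it only derives the curated output prefix.
        return raw_prefix.replace("raw-landing", "curated")

    @task
    def validate(curated_prefix: str):
        # Data-quality gate: row counts, schema, and null checks go here.
        print(f"validating {curated_prefix}")

    validate(transform(extract()))


daily_orders_etl()
```

The extract → transform → validate dependency chain is inferred automatically from the TaskFlow return values, which keeps the DAG definition close to the data flow it orchestrates.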

Technical Leadership & Collaboration (25%)

  • Mentor junior and mid-level data engineers through code reviews and technical guidance
  • Establish best practices for data engineering, testing, and deployment
  • Collaborate with data scientists, analysts, and business stakeholders to understand data requirements
  • Work with ML engineers to build data pipelines supporting machine learning workflows
  • Partner with platform/infrastructure teams on cloud architecture and cost optimization
  • Lead technical design discussions and architectural reviews
  • Document data architectures, pipelines, and processes
  • Evangelize data engineering best practices across the organization

Required Qualifications

Technical Expertise

  • 10+ years of professional experience in data engineering or related roles
  • Expert-level proficiency in Python for data engineering:
    • Data processing libraries: Pandas, PySpark, Dask, Polars
    • API development: FastAPI, Flask
    • Testing: Pytest, unittest
  • Strong AWS expertise with hands-on experience in:
    • Data Storage: S3, RDS/Aurora, DynamoDB, Redshift
    • Data Processing: Glue (ETL jobs, crawlers, Data Catalog), EMR, Athena
    • Streaming: Kinesis (Data Streams, Firehose, Analytics), MSK (Managed Kafka)
    • Orchestration: Step Functions, EventBridge, Lambda
    • Analytics: QuickSight, Athena, Redshift Spectrum
    • Data Lake: Lake Formation, Glue Data Catalog
    • Infrastructure: CloudFormation, CDK, IAM, VPC, CloudWatch
  • Workflow Orchestration:
    • Apache Airflow (strong preference)
  • Big Data Technologies:
    • Apache Spark (PySpark) for distributed data processing (see the sketch after this list)
    • Experience with EMR, Databricks, or similar platforms
    • Understanding of distributed computing concepts
    • Parquet, Avro, ORC file formats
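
As a rough illustration of the Spark and columnar-format skills above, the following is a minimal PySpark sketch; the paths, columns, and aggregation are hypothetical.

```python
# Minimal PySpark sketch: read curated Parquet, aggregate, and write a
# date-partitioned mart. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-revenue").getOrCreate()

orders = spark.read.parquet("s3a://example-curated/orders/")

daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"))
)

# Partitioning the output by date lets Athena or Redshift Spectrum prune
# files at query time instead of scanning the whole mart.
(
    daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-marts/daily_revenue/")
)
```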

Architecture & Design

  • Solid understanding and implementation knowledge of data modelling (dimensional modelling, star/snowflake schemas)
  • Experience with both batch and streaming data processing patterns
  • Knowledge of data lake, data warehouse, and lakehouse architectures
  • Understanding of data partitioning, bucketing, and optimization strategies (illustrated in the Athena sketch after this list)
  • Expertise in designing for data quality, lineage, and governance
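
To make the partitioning point above concrete, here is a hedged sketch of issuing a partition-pruned query through Athena with boto3; the database, table, region, and output location are hypothetical.

```python
# Sketch: query a date-partitioned fact table via Athena so the engine
# scans only the matching partition. All names are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = """
    SELECT region, SUM(revenue) AS revenue
    FROM analytics.daily_revenue           -- partitioned by order_date
    WHERE order_date = DATE '2024-01-15'   -- predicate on the partition key
    GROUP BY region
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```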

DevOps & Best Practices

  • Strong experience with CI/CD pipelines for data engineering (GitHub Actions, GitLab CI, Jenkins)
  • Infrastructure as Code using Terraform, CloudFormation, or AWS CDK (a CDK sketch follows this list)
  • Containerization with Docker; experience with ECS/Fargate/Kubernetes is a plus
  • Git version control and branching strategies
  • Monitoring and observability tools: CloudWatch, Grafana
  • Data pipeline testing strategies and frameworks
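
As one possible shape of the infrastructure-as-code work listed above, here is a minimal AWS CDK v2 sketch in Python; the stack name, construct IDs, and bucket settings are hypothetical.

```python
# Minimal AWS CDK v2 sketch: an encrypted, versioned S3 landing bucket
# defined as code. Stack and construct names are hypothetical.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataLakeStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Raw landing zone: versioned, encrypted, closed to public access.
        s3.Bucket(
            self,
            "RawLanding",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
        )


app = cdk.App()
DataLakeStack(app, "data-lake-dev")
app.synth()
```

Deploying a definition like this with `cdk deploy` gives reviewable, versioned infrastructure changes, the same property the CI/CD bullet above asks for in pipelines.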

Preferred Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or related field (or equivalent experience)
  • Experience in regulated industries (healthcare/pharma, finance, government) with compliance requirements
  • Hands-on experience with:
    • Additional AWS services: Glue DataBrew, AppFlow, Data Pipeline, Lambda, SageMaker
    • Streaming platforms: Apache Kafka, Confluent, AWS MSK
    • Data quality tools: Great Expectations, dbt, Monte Carlo, Bigeye
    • Data cataloging: AWS Glue Data Catalog, Alation, Collibra
    • Alternative clouds: GCP (BigQuery, Dataflow), Azure (Synapse, Data Factory)
    • Data orchestration: dbt for transformation workflows
  • Experience with clinical data, life sciences, or statistical computing domains (CDISC standards, clinical trials data)
  • Knowledge of data mesh or data fabric architectures
  • Experience building data platforms for ML/AI workloads
  • Familiarity with data governance and metadata management frameworks

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

Top Skills

Apache Airflow
Athena
AWS
AWS Glue
CI/CD
Docker
DynamoDB
EMR
FastAPI
Git
Kafka
Kinesis
MLOps
Pandas
PySpark
Python
RDS
Redshift
S3
Snowflake
SQL
Terraform

The Company
HQ: Indianapolis, IN
39,451 Employees
Year Founded: 1876

What We Do

Eli Lilly and Company engages in the discovery, development, manufacture, and sale of pharmaceutical products.

For more than a century, we have stayed true to a core set of values – excellence, integrity, and respect for people – that guide us in all we do: discovering medicines that meet real needs, improving the understanding and management of disease, and giving back to communities through philanthropy and volunteerism.
