What You'll Do
- Architect and build resilient, distributed data processing systems using Python and Spark on AWS
- Design and implement end-to-end ETL/ELT workflows that ingest and unify data from diverse sources, ranging from modern table formats like Iceberg and Delta to legacy business files such as Excel and CSV, ensuring a scalable and consistent single source of truth for the organization
- Lead the implementation of the Medallion Architecture, managing data maturity through Bronze, Silver, and Gold layers. You will define how data is structured, classified, and stored to maximize business value while ensuring scalability and high availability.
- Build reusable libraries and frameworks for data quality validation, metadata tracking, and pipeline monitoring
- Build CI/CD processes that automate deployment and testing, maintaining a high bar for engineering excellence
- Enforce data governance standards, including security, privacy, and regulatory compliance
- Proactively monitor system health, implement automated observability, and resolve complex bottlenecks in distributed systems to ensure peak resource efficiency and cost-effectiveness
- Partner directly with Product Managers and Data Scientists to translate business requirements into innovative solutions
- Own the full feature lifecycle, from initial whiteboarding to production deployment and long-term maintenance
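To give candidates a feel for the Bronze-to-Silver promotion described above, here is a minimal, hypothetical sketch. Plain Python stands in for a Spark job, and the field names and cleansing rules (`record_id`, `amount`, dedupe-on-first-occurrence) are illustrative assumptions, not Veeva's actual pipeline:

```python
from datetime import datetime, timezone

def promote_to_silver(bronze_records):
    """Promote raw Bronze records to the Silver layer:
    drop rows missing required keys, deduplicate on record_id,
    and normalize field types. (Illustrative rules only.)"""
    seen = set()
    silver = []
    for rec in bronze_records:
        if rec.get("record_id") is None or rec.get("amount") is None:
            continue  # a real pipeline would quarantine incomplete rows
        if rec["record_id"] in seen:
            continue  # keep the first occurrence of each record_id
        seen.add(rec["record_id"])
        silver.append({
            "record_id": str(rec["record_id"]),
            "amount": float(rec["amount"]),
            "processed_at": datetime.now(timezone.utc).isoformat(),
        })
    return silver

bronze = [
    {"record_id": 1, "amount": "10.5"},
    {"record_id": 1, "amount": "10.5"},   # duplicate
    {"record_id": 2, "amount": None},     # incomplete
    {"record_id": 3, "amount": 7},
]
print(len(promote_to_silver(bronze)))  # → 2
```

In a production Medallion setup the same cleanse-and-dedupe logic would typically run as a Spark transformation writing to an Iceberg or Delta table.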
Requirements
- 4+ years of professional data engineering experience with a demonstrated ability to architect and deploy production-grade data platforms from scratch
- Expert-level proficiency in Python and Apache Spark, with specific experience in JVM tuning, memory management, and optimizing execution plans for large-scale distributed workloads
- Deep expertise in modern data architecture, software design patterns, and various data modeling techniques designed for scalability and performance
- Proven track record of building on AWS (primary) or GCP, including hands-on experience with managed services like EMR or Databricks
- Extensive experience designing and managing complex data lifecycles using orchestration tools such as Airflow, AWS Step Functions, or Prefect
- Deep understanding of data cleansing, curation, and transformation strategies, coupled with experience implementing data governance, security, and lifecycle management policies
- Strong background in building reusable libraries, frameworks, and internal tools that standardize data ingestion and automate ETL/ELT workflows
- Exceptional debugging skills for distributed systems and resolving performance bottlenecks at scale
- Proficiency with CI/CD tools and processes (e.g., Codefresh, Jenkins)
- Excellent verbal and written communication skills in English, with the ability to translate complex technical architectures into actionable insights for stakeholders and cross-functional teams
- Must be located in the US Eastern (EST) or Central (CST) time zone
- Applicants must have the unrestricted right to work in the United States. Veeva will not provide sponsorship at this time
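The requirement above to build reusable data-quality libraries can be sketched as a small rule-driven validator. The interface (a dict of named predicates) and the sample rules are hypothetical, shown only to illustrate the kind of framework the role involves:

```python
def validate(rows, rules):
    """Apply named validation rules to each row and collect failures.
    `rules` maps a rule name to a predicate over a row dict.
    (Hypothetical interface for a reusable data-quality library.)"""
    failures = []
    for i, row in enumerate(rows):
        for name, predicate in rules.items():
            if not predicate(row):
                failures.append({"row": i, "rule": name})
    return failures

# Example rules: both names are assumptions for illustration.
rules = {
    "id_present": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: (r.get("amount") or 0) >= 0,
}
rows = [{"id": 1, "amount": 5}, {"id": None, "amount": -2}]
print(validate(rows, rules))
# → [{'row': 1, 'rule': 'id_present'}, {'row': 1, 'rule': 'amount_non_negative'}]
```

A Spark-based version of the same idea would express each rule as a DataFrame filter and route failing rows to a quarantine table for metadata tracking and monitoring.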
Nice to Have
- Relevant certifications (e.g., AWS, Spark, or similar)
- Familiarity with streaming and distributed technologies such as Spark Streaming, EKS, Kinesis, or Apache Kafka
- Experience implementing or managing modern cloud data warehouses or lakehouse architectures
- Prior experience working in the Life Sciences industry
Perks & Benefits
- Medical, dental, vision, and basic life insurance
- Flexible PTO and company paid holidays
- Retirement programs
- 1% charitable giving program
Compensation
- Base pay: $75,000 - $130,000
- The salary range listed here has been provided to comply with local regulations and represents a potential base salary range for this role. Please note that actual salaries may vary within, above, or below the range, depending on experience and location. We look at compensation for each individual and base our offer on your unique qualifications, experience, and expected contributions. This position may also be eligible for other types of compensation in addition to base salary, such as variable bonus and/or stock bonus.
Veeva Insights
What We Do
Veeva is the global leader in cloud software for the life sciences industry. Committed to innovation, product excellence, and customer success, Veeva serves more than 1,000 customers, ranging from the world’s largest pharmaceutical companies to emerging biotechs. As a Public Benefit Corporation, Veeva is committed to balancing the interests of all stakeholders, including customers, employees, shareholders, and the industries it serves.