About the Role
As the Lead Data Engineer, you will lead a team that architects and operates a real-time, stable, and scalable big data platform supporting data analytics, reporting, data visualization, and machine learning. You will help build the Data Engineering team, mentoring and motivating team members on cutting-edge big data technology and best practices.
You will also be expected to:
- Define big data strategy through collaborations with stakeholders, architecture team and product team to support data analytics, reporting, data visualization, and machine learning
- Distill high level corporate goals into projects to serve those goals
- Manage projects on a quarterly cadence including collaborating with stakeholders, tracking milestones, communicating blockers and ensuring progress
- Lead the team to build a new real-time big data platform with open-source technologies such as Kafka, Spark, and Presto
- Maintain the operational status of all data sources and pipelines, including running an on-call rotation
- Apply a deep understanding of data warehousing concepts, data update strategies, data modeling, data pipelines, and workflows
- Lead projects involving large data volumes (approximately 10TB) stored in heterogeneous data sources
- Refactor the existing data model into an easy-to-maintain data solution across the organization
- Collaborate with the QC team to ensure product quality
- Collaborate with the release and operations teams to automate builds and deployments from DEV to PROD
- Lead team projects and initiatives from inception to rollout, managing deliverables and dependencies while communicating timelines and expectations to stakeholders
- Mentor team members to improve processes and promote best practices
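The data update strategies mentioned above can be sketched as an idempotent incremental load, here as a minimal upsert sketch using SQLite from the Python standard library (the `users` table and its columns are hypothetical, purely for illustration):

```python
import sqlite3

# An in-memory database stands in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)")

def upsert_users(rows):
    # Idempotent incremental load: insert new rows, overwrite changed ones.
    # Re-running the same batch never creates duplicates.
    conn.executemany(
        """INSERT INTO users (id, email, updated_at) VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
               email = excluded.email,
               updated_at = excluded.updated_at""",
        rows,
    )
    conn.commit()

upsert_users([(1, "a@example.com", "2024-01-01"), (2, "b@example.com", "2024-01-01")])
upsert_users([(1, "a+new@example.com", "2024-01-02")])  # replay updates row 1 in place
print(conn.execute("SELECT id, email FROM users ORDER BY id").fetchall())
```

Because the load is keyed on the primary key, a failed pipeline run can simply be replayed, which is one common way to keep pipelines stable under retries.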
Requirements
- 7+ years of work experience with a combination of data modeling, ETL, and database operations
- 3+ years of experience working on data warehouses such as Snowflake
- 3+ years of experience operating production workloads on PostgreSQL (preferred) or MySQL databases
- 3+ years of experience with open-source technologies (Spark, Kafka, Presto, Hive, etc.)
- 2+ years of experience in architecting and building scalable data platforms processing data on a terabyte or petabyte scale
- 3+ years of experience managing ETL pipelines with tools such as Fivetran, Airflow, or Talend
- 2+ years of experience with AWS services such as EMR, Athena, Lambda, Kinesis, S3, and EC2
- Experience leading teams of 3+ engineers
- Familiarity with Agile principles including but not limited to Scrum meetings and mechanics
- Proficiency in programming languages such as Java, Scala, and Python
- Solid understanding of modern data structures and business intelligence reporting tools, with a track record of applying them to build systems
- Intellectual curiosity about emerging technologies and the ability to turn that curiosity into practical working solutions
- Ability to prioritize effectively and deliver on time
- Excellent communicator and dynamic team player
- Experience in real-time analytics applications
- Experience with both batch and stream processing technologies in Java or Scala
- Machine learning experience with Spark or a similar framework