Writing production-ready ETL pipelines in Python / Pandas

In partnership with Udemy
Rating: 4.3 (451)

Topic:

Learn how to write professional ETL pipelines using best practices in Python and Data Engineering.

 

What you'll learn:

 

  • How to write professional ETL pipelines in Python.

  • Steps to write production-level Python code.

  • How to apply functional programming in Data Engineering.

  • How to design proper object-oriented code.

  • How to use a meta file for job control.

  • Coding best practices for Python in ETL/Data Engineering.

  • How to implement a pipeline in Python extracting data from an AWS S3 source, transforming and loading the data to another AWS S3 target.
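
As a rough illustration of the meta file idea from the list above, the sketch below keeps a small CSV that records which source dates a job has already handled, so a scheduled run only picks up the missing ones. The column names `source_date` and `processed_at`, both helper functions, and the sample dates are assumptions made for this example, not the course's actual file layout.

```python
import csv
import io
from datetime import date, timedelta


def dates_to_process(meta_csv: str, first_date: date, today: date) -> list[date]:
    """Return the dates from first_date to today that the meta file does not list yet."""
    reader = csv.DictReader(io.StringIO(meta_csv))
    done = {date.fromisoformat(row["source_date"]) for row in reader}
    span = (today - first_date).days + 1
    candidates = (first_date + timedelta(days=offset) for offset in range(span))
    return [day for day in candidates if day not in done]


def mark_processed(meta_csv: str, days: list[date], processed_at: str) -> str:
    """Append one row per processed date and return the updated meta file text."""
    out = io.StringIO()
    out.write(meta_csv)
    writer = csv.writer(out, lineterminator="\n")
    for day in days:
        writer.writerow([day.isoformat(), processed_at])
    return out.getvalue()


# Hypothetical meta file: only 2022-01-01 has been processed so far.
meta = "source_date,processed_at\n2022-01-01,2022-01-02T06:00:00\n"
todo = dates_to_process(meta, date(2022, 1, 1), date(2022, 1, 3))
meta = mark_processed(meta, todo, "2022-01-04T06:00:00")
```

Keeping the meta file as plain text in and out makes the job-control logic easy to unit test; reading and writing the file in a bucket would be a separate concern.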

 

Description:

This course shows each step of writing an ETL pipeline in Python, from scratch to production, using the necessary tools: Python 3.9, Jupyter Notebook, Git and GitHub, Visual Studio Code, Docker and Docker Hub, and the Python packages Pandas, boto3, pyyaml, awscli, jupyter, pylint, moto, coverage and memory-profiler.

Two approaches to coding in the Data Engineering field will be introduced and applied: functional and object-oriented programming.
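
To make the contrast between the two approaches concrete, here is a minimal sketch that computes the same day-over-day percentage change twice: once with plain functions and once with a small class. The names `percent_change`, `daily_changes` and `ChangeReport`, as well as the sample prices, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# Functional style: small pure functions, composed by the caller.
def percent_change(previous: float, current: float) -> float:
    return (current - previous) / previous * 100


def daily_changes(closing_prices: list[float]) -> list[float]:
    pairs = zip(closing_prices, closing_prices[1:])
    return [round(percent_change(prev, cur), 2) for prev, cur in pairs]


# Object-oriented style: configuration and behaviour live together in a class.
@dataclass
class ChangeReport:
    decimals: int = 2

    def build(self, closing_prices: list[float]) -> list[float]:
        pairs = zip(closing_prices, closing_prices[1:])
        return [
            round((cur - prev) / prev * 100, self.decimals) for prev, cur in pairs
        ]


prices = [100.0, 110.0, 99.0]
functional_result = daily_changes(prices)
oop_result = ChangeReport(decimals=2).build(prices)
```

Both variants give the same numbers; the difference lies in where settings such as the rounding precision live and how the pieces are wired together.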

Best practices in developing Python code will be introduced and applied:

  • design principles

  • clean coding

  • virtual environments

  • project/folder setup

  • configuration

  • logging

  • exception handling

  • linting

  • dependency management

  • performance tuning with profiling

  • unit testing

  • integration testing

  • dockerization
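
Two of the practices listed above, logging and exception handling, can be sketched together using only the standard library. The exception class `MissingColumnError`, the column name `EndPrice`, the logger name and the sample records are assumptions for this example rather than material taken from the course.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("etl_sketch")


class MissingColumnError(Exception):
    """Raised when an input record lacks a column the job needs."""


def closing_price(record: dict, column: str = "EndPrice") -> float:
    if column not in record:
        raise MissingColumnError(f"column {column!r} not found")
    return float(record[column])


def safe_closing_prices(records: list[dict]) -> list[float]:
    """Collect prices, logging and skipping records that cannot be read."""
    prices = []
    for index, record in enumerate(records):
        try:
            prices.append(closing_price(record))
        except (MissingColumnError, ValueError) as error:
            # Catch only the failures we expect; anything else should surface.
            logger.warning("skipping record %d: %s", index, error)
    return prices


result = safe_closing_prices(
    [{"EndPrice": "10.5"}, {"Price": "9"}, {"EndPrice": "n/a"}]
)
```

Whether a bad record should be skipped or should stop the whole job depends on the pipeline; the sketch skips it only to show the log call and the narrow `except` clause.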

 

What is the goal of this course?

In the course we are going to use the Xetra dataset. Xetra stands for Exchange Electronic Trading and it is the trading platform of the Deutsche Börse Group. This dataset is derived near-time on a minute-by-minute basis from Deutsche Börse’s trading system and saved in an AWS S3 bucket available to the public for free.

The ETL Pipeline we are going to create will extract the Xetra dataset from the AWS S3 source bucket on a scheduled basis, create a report using transformations and load the transformed data to another AWS S3 target bucket.
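
The extract, transform and load steps described above can be outlined as three functions. This is only a sketch under simplifying assumptions: plain dictionaries stand in for the source and target S3 buckets, the standard `csv` module stands in for Pandas, and the column names `isin` and `price`, the object keys and the min/max report are invented for illustration.

```python
import csv
import io


def extract(bucket: dict[str, str], prefix: str) -> list[dict]:
    """Read every CSV object whose key starts with prefix into a list of rows."""
    rows = []
    for key in sorted(bucket):
        if key.startswith(prefix):
            rows.extend(csv.DictReader(io.StringIO(bucket[key])))
    return rows


def transform(rows: list[dict]) -> list[dict]:
    """Aggregate minute-level rows into one report row per instrument."""
    report = {}
    for row in rows:
        price = float(row["price"])
        entry = report.setdefault(
            row["isin"],
            {"isin": row["isin"], "min_price": price, "max_price": price},
        )
        entry["min_price"] = min(entry["min_price"], price)
        entry["max_price"] = max(entry["max_price"], price)
    return [report[isin] for isin in sorted(report)]


def load(bucket: dict[str, str], key: str, report: list[dict]) -> None:
    """Write the report as CSV text under key in the target bucket."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["isin", "min_price", "max_price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(report)
    bucket[key] = out.getvalue()


# In-memory stand-ins for the source and target buckets (hypothetical data).
source = {
    "2022-01-03/a.csv": "isin,price\nDE0001,10.0\nDE0002,5.0\n",
    "2022-01-03/b.csv": "isin,price\nDE0001,12.5\n",
}
target = {}
load(target, "report/2022-01-03.csv", transform(extract(source, "2022-01-03/")))
```

Because each step takes and returns plain data, the stand-in buckets could be swapped for real storage calls without touching the transformation logic.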

The pipeline will be written so that it can be deployed easily to almost any production environment that can handle containerized applications. The production environment we are going to write the ETL pipeline for consists of a GitHub code repository, a Docker Hub image repository, an execution platform such as Kubernetes and an orchestration tool such as the container-native Kubernetes workflow engine Argo Workflows or Apache Airflow.

 

So what can you expect in the course?

You will receive primarily practical, interactive lessons in which you code and implement the pipeline, plus theory lessons when needed. Furthermore, you will get the Python code for each lesson in the course material, the whole project on GitHub and the ready-to-use Docker image with the application code on Docker Hub.

PowerPoint slides are available for download for each theoretical lesson, along with useful links for each topic and step where you can find more information and dive deeper.

 

Who this course is for:

  • Data engineers, scientists and developers who want to write professional production-ready data pipelines in Python.
  • Everyone who is interested in writing data pipelines in Python that are ready for production.

 

Course
Intermediate
Careers

Careers Related to Writing production-ready ETL pipelines in Python / Pandas

Certifications

Certifications related to Python or Data Warehouse or Data Pipeline

Whether you have coded before or are brand new to the world of programming, this course will put you on the fast track to building confidence with this intuitive, object-oriented language. Learn programming fundamentals and build a custom application. Graduate with the ability to start applying Python within high-growth fields like analytics, data science, and web development.

 

What you'll accomplish

This is a beginner-friendly program with no prerequisites, although some students may have coded previously. First-time programmers will have access to pre-course preparatory lessons and additional resources to boost their confidence with key concepts and set up their development environments. Throughout this expert-designed program, you’ll:

  • Learn object-oriented programming fundamentals and Python basics that get you coding from day one.
  • Build a Python program and add on increased complexity throughout the course.
  • Troubleshoot Python code and practice common debugging techniques.
  • Push your skills to the next level by adding scripting, modules, and APIs to your Python toolkit.
  • Explore introductory data science and web development as potential career directions for Python programmers.
  • Demonstrate your Python skills by creating apps that pull in data with Pandas or integrate functionality from APIs with Flask.

 

Why General Assembly

Since 2011, General Assembly has graduated more than 40,000 students worldwide from its full-time and part-time courses. During the 2020 hiring shutdown, GA's students, instructors, and career coaches never lost focus, and the KPMG-validated numbers in their Outcomes report reflect it. For students who graduated in 2020, the peak of the pandemic, 74.4% of those who participated in GA's full-time Career Services program landed jobs within six months of graduation. General Assembly is proud of its graduates' and teams' relentless dedication and glad to see those numbers rising.

 

Your next step? Submit an application to talk to the General Assembly Admissions team


 

Note: reviews are referenced from Career Karma - https://careerkarma.com/schools/general-assembly

 

General Assembly

General Assembly’s Data Science part-time course is a practical introduction to the interdisciplinary field of data science and machine learning, which lies at the intersection of computer science, statistics, and business. You will learn to use the Python programming language to acquire, parse, and model data for informing business strategy. 

This is a fast-paced course with some prerequisites. Students should be comfortable with programming fundamentals, core Python syntax, and basic statistics. There is an option to complete up to 25 hours of online preparatory lessons. Talk to the General Assembly Admissions team to discuss your background and confirm if this is the right fit for you.

 

What you'll accomplish

A significant portion of the course is a hands-on approach to fundamental modeling techniques and machine learning algorithms. You’ll also practice communicating your results and insights by compiling technical documentation and a stakeholder presentation. Throughout this expert-designed program, you’ll:

  • Perform exploratory data analysis with Python.
  • Build and refine machine learning models to predict patterns from data sets.
  • Communicate data-driven insights to technical and non-technical audiences alike.
  • Apply what you’ve learned to create a portfolio project: a predictive model that addresses a real-world data problem.

 

General Assembly

General Assembly’s Data Science Immersive is a transformative course designed for you to get the necessary skills for a data scientist role in three months. 

The Data Science bootcamp is led by instructors who are expert practitioners in their field, supported by career coaches who work with you from day one and enhanced by a career services team that is constantly in talks with employers about their tech hiring needs.

 

What you'll accomplish

As a graduate, you will be ready to succeed in a variety of data science and advanced analytics roles, creating predictive models that drive decision-making and strategy throughout organizations of all kinds. Throughout this expert-designed program, you’ll:

  • Collect, extract, query, clean, and aggregate data for analysis.
  • Gather, store and organize data using SQL and Git.
  • Perform visual and statistical analysis on data using Python and its associated libraries and tools.
  • Craft and share compelling narratives through data visualization.
  • Build and implement appropriate machine learning models and algorithms to evaluate data science problems spanning finance, public policy, and more.
  • Compile clear stakeholder reports to communicate the nuances of your analyses.
  • Apply question, modeling, and validation problem-solving processes to data sets from various industries to provide insight into real-world problems and solutions.
  • Prepare for the world of work, compiling a professional-grade portfolio of solo, group, and client projects.

 

General Assembly
Courses

Courses related to Python or Data Warehouse or Data Pipeline

Flatiron School

Whether you have zero coding knowledge, are self-taught, or are somewhere in between, this course is for you. Our course takes you from foundational skills to advanced, practical knowledge in as little as 15 weeks.

Udemy

Topic:

Data Science, Machine Learning, and Data Analytics Techniques for Marketing, Digital Media, Online Advertising, and More

 

What you'll learn:

  • Use adaptive…

4.6
(5018)