Taming Big Data with Apache Spark and Python - Hands On!

Course From:
Udemy

New! Updated for Spark 3, more hands-on exercises, and a stronger focus on DataFrames and Structured Streaming.

“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark, and specifically PySpark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think.

Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services in this course. You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

 

  • Learn the concepts of Spark's DataFrames and Resilient Distributed Datasets (RDDs)

  • Develop and run Spark jobs quickly using Python and pyspark

  • Translate complex analysis problems into iterative or multi-stage Spark scripts

  • Scale up to larger data sets using Amazon's Elastic MapReduce service

  • Understand how Hadoop YARN distributes Spark across computing clusters

  • Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX
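
As a rough illustration of what "multi-stage" means in the list above, the sketch below counts words in three stages: split lines into words, sum a count per word, then sort. It is plain Python rather than Spark so that it runs without a cluster, and it is not taken from the course. The helper names (flat_map_words, reduce_by_key, word_counts) and the sample lines are hypothetical; they only loosely mirror the flatMap and reduceByKey operations that Spark's RDD API provides.

```python
import re
from collections import defaultdict


def flat_map_words(lines):
    """Stage 1: turn each line into lowercase words (similar in spirit to flatMap)."""
    for line in lines:
        yield from re.findall(r"[a-z']+", line.lower())


def reduce_by_key(pairs):
    """Stage 2: add up the values that share a key (similar in spirit to reduceByKey)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_counts(lines):
    """Run the stages in order and return (word, count) pairs, most frequent first."""
    pairs = ((word, 1) for word in flat_map_words(lines))
    counts = reduce_by_key(pairs)
    # Stage 3: sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample input; a real job would read a large text file instead.
sample_lines = ["Spark makes big data simple", "Big data, big results"]
top = word_counts(sample_lines)
print(top[:3])
```

In a real Spark job each stage would run in parallel across partitions of the data; here everything runs in one process, which keeps the shape of the computation visible.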

By the end of this course, you'll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes. 

This course uses the familiar Python programming language; if you'd rather use Scala to get the best performance out of Spark, see my "Apache Spark with Scala - Hands On with Big Data" course instead.

We'll have some fun along the way. You'll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you've got the basics under your belt, we'll move to some more complex and interesting tasks. We'll use a million movie ratings to find movies that are similar to each other, and you may even discover some new movies you'd like in the process! We'll analyze a social graph of superheroes, and learn who the most “popular” superhero is – and develop a system to find “degrees of separation” between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You'll find the answer.
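
The superhero exercises described above come down to two graph questions: which node has the most connections, and how many hops separate two nodes. The sketch below answers both on a tiny made-up graph using breadth-first search, a standard way to find the fewest hops between two nodes. It is plain Python, not Spark, and is not the course's implementation. The hero names, the co_appearances data, and the function names are all hypothetical.

```python
from collections import deque


def most_popular(graph):
    """Return the hero with the largest number of direct connections."""
    return max(graph, key=lambda hero: len(graph[hero]))


def degrees_of_separation(graph, start, target):
    """Return the fewest hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        hero, distance = frontier.popleft()
        for neighbor in graph.get(hero, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return None


# Hypothetical co-appearance graph; names and links are illustrative only.
co_appearances = {
    "HeroA": ["HeroB", "HeroC", "HeroD"],
    "HeroB": ["HeroA"],
    "HeroC": ["HeroA"],
    "HeroD": ["HeroA", "HeroE"],
    "HeroE": ["HeroD"],
    "HeroF": [],
}

print(most_popular(co_appearances))
print(degrees_of_separation(co_appearances, "HeroB", "HeroE"))
print(degrees_of_separation(co_appearances, "HeroA", "HeroF"))
```

On a graph too large for one machine, the same search is typically run as repeated passes over the distributed data, expanding the frontier by one hop per pass, which is one reason the course describes some problems as iterative.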

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. 7 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Wrangling big data with Apache Spark is an important skill in today's technical world. Enroll now!

 

Course
Intermediate
Certifications

Certifications related to Machine Learning Python

Whether you have coded before or are brand new to the world of programming, this course will put you on the fast track to building confidence with this intuitive, object-oriented language. Learn programming fundamentals and build a custom application. Graduate with the ability to start applying Python within high-growth fields like analytics, data science, and web development.

 

What you'll accomplish

This is a beginner-friendly program with no prerequisites, although some students may have coded previously. First-time programmers will have access to pre-course preparatory lessons and additional resources to boost their confidence with key concepts and set up their development environments. Throughout this expert-designed program, you’ll:

  • Learn object-oriented programming fundamentals and Python basics that get you coding from day one.
  • Build a Python program and add on increased complexity throughout the course.
  • Troubleshoot Python code and practice common debugging techniques.
  • Push your skills to the next level by adding scripting, modules, and APIs to your Python toolkit.
  • Explore introductory data science and web development as potential career directions for Python programmers.
  • Demonstrate your Python skills by creating apps that pull in data with Pandas or integrate functionality from APIs with Flask.

 

Why General Assembly

Since 2011, General Assembly has graduated more than 40,000 students worldwide from its full-time and part-time courses. During the 2020 hiring shutdown, GA's students, instructors, and career coaches never lost focus, and the KPMG-validated numbers in the Outcomes report reflect it. Of the students who graduated in 2020, the peak of the pandemic, 74.4% of those who participated in GA's full-time Career Services program landed jobs within six months of graduation. General Assembly is proud of its graduates' and teams' relentless dedication and glad to see those numbers rising. Download the report here.

 

Your next step? Submit an application to talk to the General Assembly Admissions team


 

Note: reviews are referenced from Career Karma - https://careerkarma.com/schools/general-assembly

 

General Assembly

General Assembly’s Data Science part-time course is a practical introduction to the interdisciplinary field of data science and machine learning, which lies at the intersection of computer science, statistics, and business. You will learn to use the Python programming language to acquire, parse, and model data for informing business strategy. 

This is a fast-paced course with some prerequisites. Students should be comfortable with programming fundamentals, core Python syntax, and basic statistics. There is an option to complete up to 25 hours of online preparatory lessons. Talk to the General Assembly Admissions team to discuss your background and confirm whether this is the right fit for you.

 

What you'll accomplish

A significant portion of the course takes a hands-on approach to fundamental modeling techniques and machine learning algorithms. You’ll also practice communicating your results and insights by compiling technical documentation and a stakeholder presentation. Throughout this expert-designed program, you’ll:

  • Perform exploratory data analysis with Python.
  • Build and refine machine learning models to predict patterns from data sets.
  • Communicate data-driven insights to technical and non-technical audiences alike.
  • Apply what you’ve learned to create a portfolio project: a predictive model that addresses a real-world data problem.

 


 

General Assembly

General Assembly’s Data Science Immersive is a transformative course designed for you to get the necessary skills for a data scientist role in three months. 

The Data Science bootcamp is led by instructors who are expert practitioners in their field, supported by career coaches who work with you from day one, and enhanced by a career services team that is constantly in talks with employers about their tech hiring needs.

 

What you'll accomplish

As a graduate, you will be ready to succeed in a variety of data science and advanced analytics roles, creating predictive models that drive decision-making and strategy throughout organizations of all kinds. Throughout this expert-designed program, you’ll:

  • Collect, extract, query, clean, and aggregate data for analysis.
  • Gather, store and organize data using SQL and Git.
  • Perform visual and statistical analysis on data using Python and its associated libraries and tools.
  • Craft and share compelling narratives through data visualization.
  • Build and implement appropriate machine learning models and algorithms to evaluate data science problems spanning finance, public policy, and more.
  • Compile clear stakeholder reports to communicate the nuances of your analyses.
  • Apply question, modeling, and validation problem-solving processes to data sets from various industries to provide insight into real-world problems and solutions.
  • Prepare for the world of work, compiling a professional-grade portfolio of solo, group, and client projects.

 


 

General Assembly
Courses

Courses related to Machine Learning Python

Udemy

New! Updated for Spark 3, more hands-on exercises, and a stronger focus on DataFrames and Structured Streaming.

“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest…

Udemy

This is the most complete course online for learning about Python, Data Science, and Machine Learning. Join Jose Portilla's over 3 million students to learn about the future today!

What is in the course…

Udemy

Welcome to KGP Talkie's Natural Language Processing (NLP) course. It is designed to give you a complete understanding of Text Processing and Mining with the use of State-of-the-Art NLP algorithms in Python.

We will learn Spacy in detail…

Udemy

Machine Learning is a hot topic! Python developers who understand how to work with Machine Learning are in high demand.

But how do you get started?

Maybe you tried to get started with Machine Learning, but couldn…