Data Science Articles

Sara A. Metwalli
Updated on May 29, 2025

4 Essential Skills Every Data Scientist Needs

There’s more to data science than data. These 4 skills will help you land (and keep!) that dream job.

Mitchell Telatnik
Updated on May 28, 2025

Machine Learning for Beginners (With Weka)

Follow along with this machine learning for beginners tutorial, which walks through the basics of classification and regression algorithms and how to build a machine learning model in Weka.

Sergen Cansiz
Updated on May 28, 2025

Covariance Matrix: Definition, Derivation and Applications

A covariance matrix is a square matrix that shows the covariance between every pair of variables in a given data set: the entry in row i, column j is the covariance between variables i and j, and each diagonal entry is a variable’s variance.
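
As a rough illustration of that definition, the sketch below builds a covariance matrix with NumPy. The toy data set is hypothetical and is not taken from the article itself.

```python
import numpy as np

# Hypothetical toy data: 5 observations (rows) of 3 variables (columns).
data = np.array([
    [2.0, 8.0, 1.0],
    [4.0, 6.0, 3.0],
    [6.0, 4.0, 2.0],
    [8.0, 2.0, 5.0],
    [10.0, 0.0, 4.0],
])

# rowvar=False tells NumPy that each column is a variable.
# np.cov uses the sample (n - 1) normalization by default.
cov = np.cov(data, rowvar=False)

print(cov.shape)                  # (3, 3): one row and column per variable
print(np.allclose(cov, cov.T))    # the matrix is symmetric
# The diagonal holds each variable's sample variance.
print(np.allclose(np.diag(cov), data.var(axis=0, ddof=1)))
```

Entry `cov[i, j]` is the covariance between column `i` and column `j`, so the matrix is always square and symmetric.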

Parul Pandey
Updated on May 28, 2025

What Is the Dummy Variable Trap? (With Pandas Code Examples)

Here are a few important caveats to keep in mind when you’re encoding data with pandas.get_dummies().

Peter Grant
Updated on May 28, 2025

How to Create Report-Ready Plots in Python

As a data scientist, developing great models and extrapolating nuanced insights won’t get you far if you can’t communicate your findings clearly. Here’s how to present your work using Bokeh.

Sara A. Metwalli
Updated on May 28, 2025

5 Ways to Learn Git and Version Control

Git is a distributed version control system that tracks and manages changes to code. Learn Git and how to use it with these five resources.

Sergen Cansiz
Updated on May 27, 2025

Mahalanobis Distance and Multivariate Outlier Detection in R

Mahalanobis distance measures how far a point lies from a distribution, taking the distribution’s covariance into account. It’s often used for detecting outliers in multivariate data.
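
The idea can be sketched in a few lines. The article works in R; this is a separate Python/NumPy illustration, and the simulated sample and the helper name `mahalanobis` are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample: 500 points from two positively correlated variables.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(
    mean=[0.0, 0.0],
    cov=[[1.0, 0.8], [0.8, 1.0]],
    size=500,
)

def mahalanobis(point, data):
    """Distance from `point` to the distribution estimated from `data`."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    diff = point - mean
    # solve(cov, diff) computes inv(cov) @ diff without forming the inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Both points are equally far from the origin in Euclidean terms, but the
# second one runs against the positive correlation in the sample.
with_trend = mahalanobis(np.array([2.0, 2.0]), sample)
against_trend = mahalanobis(np.array([2.0, -2.0]), sample)
print(with_trend < against_trend)
```

A point that contradicts the correlation structure gets a larger distance, which is why the metric is a common starting point for flagging multivariate outliers.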

Anthony Figueroa
Updated on May 27, 2025

Correlation Is Not Causation

Correlation means two variables tend to change together, while causation means a change in one variable causes the other to change. Here’s why you need to understand the difference.

Henri Woodcock
Updated on May 27, 2025

Stop Using NumPy’s Global Random Seed

A NumPy random seed is a numerical value that initializes a random number generator, allowing for reproducible results. Here’s why you should use np.random.default_rng() to give individual functions and classes their own seeded generators instead of setting the global seed.
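
A minimal sketch of that pattern follows. The function name `sample_noise` and the seed value are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_noise(n, rng):
    """Draw n standard-normal values from the generator passed in."""
    return rng.standard_normal(n)

# Each generator carries its own state, so nothing here touches the
# global state that np.random.seed() would modify.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = sample_noise(3, rng_a)
second = sample_noise(3, rng_b)

# Two independent generators built from the same seed yield the same draws.
print(np.array_equal(first, second))  # True
```

Passing the generator as an argument keeps reproducibility local to the code that needs it, rather than relying on a seed set somewhere else in the program.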

Sara A. Metwalli
Updated on May 27, 2025

4 Probability Distributions Every Data Scientist Needs to Know

If you’re just getting started on your journey toward becoming a data scientist, these are the 4 most common distributions you’ll encounter.
