Ordinary least squares (OLS) regression is an optimization strategy that helps you find a straight line as close as possible to your data points in a linear regression model. OLS is considered the most useful optimization strategy for linear regression models because it produces unbiased estimates of the intercept (alpha) and slope (beta).

## What Is Ordinary Least Squares (OLS) Regression?

Ordinary least squares (OLS) regression is an optimization strategy that allows you to find a straight line that’s as close as possible to your data points in a linear regression model.

Why is that? To answer, it helps to first understand how linear regression algorithms work.

## How OLS Applies to Linear Regression

Linear regression is a family of algorithms employed in supervised machine learning tasks. Since supervised machine learning tasks are normally divided into classification and regression, linear regression algorithms belong to the latter category. Regression differs from classification in the nature of the target variable: In classification, the target is a categorical value (“yes/no,” “red/blue/green,” “spam/not spam,” etc.), while regression involves a numerical, continuous target. As a result, the algorithm is asked to predict a continuous number rather than a class or category. Imagine that you want to predict the price of a house based on some relevant features: The output of your model will be the price, hence, a continuous number.

Regression tasks can be divided into two main groups: those that use only one feature to predict the target, and those that use more than one feature for that purpose. To give you an example, let’s consider the house task above. If you want to predict a house’s price based only on its square meters, you fall into the first situation (one feature). If instead you predict the price based on its square meters, its location and the liveability of the surrounding environment, you fall into the second group (multiple features).

In the first scenario, you are likely to employ a simple linear regression algorithm, which we’ll explore more later in this article. On the other hand, whenever you’re facing more than one feature to explain the target variable, you are likely to employ a multiple linear regression.
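As a quick illustration, here is a minimal NumPy sketch that fits both kinds of model to a few made-up house records. The square-meter, liveability and price values are invented for the example, and NumPy's least-squares routines stand in for a full machine learning pipeline:

```python
import numpy as np

# Hypothetical data: square meters and price (in thousands) for five houses.
sqm = np.array([50.0, 70.0, 80.0, 100.0, 120.0])
price = np.array([150.0, 200.0, 230.0, 280.0, 330.0])

# Simple linear regression: one feature (square meters).
beta, alpha = np.polyfit(sqm, price, deg=1)
simple_pred = alpha + beta * sqm

# Multiple linear regression: add a second, invented feature
# (a liveability score) and solve the least-squares problem directly.
liveability = np.array([6.0, 7.5, 7.0, 8.0, 9.0])
X = np.column_stack([np.ones_like(sqm), sqm, liveability])
coeffs, *_ = np.linalg.lstsq(X, price, rcond=None)
multi_pred = X @ coeffs
```

Because the multiple regression can always set the liveability coefficient to zero and recover the simple model, its sum of squared errors can never be larger than the simple model's.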

Simple linear regression is a statistical model widely used in machine learning regression tasks. It’s based on the idea that the relationship between two variables can be explained by the following formula:

yi = α + β · xi + εi

Where εi is the error term, and α and β are the true (but unobserved) parameters of the regression. The parameter β represents the change in the dependent variable when the independent variable increases by one unit. If β is equal to 0.75, then when x increases by one, the dependent variable will increase by 0.75. On the other hand, the parameter α represents the value of the dependent variable when the independent variable is equal to zero.
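A tiny snippet makes this interpretation concrete. The values of α and β here are made up purely for illustration:

```python
# Toy illustration of interpreting alpha and beta (values are invented).
alpha = 2.0   # value of y when x = 0
beta = 0.75   # change in y for a one-unit increase in x

def predict(x):
    """Prediction of the simple linear model y = alpha + beta * x."""
    return alpha + beta * x

# Increasing x by one raises the prediction by exactly beta.
delta = predict(5.0) - predict(4.0)  # 0.75
```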



## How to Find OLS in a Linear Regression Model

The goal of simple linear regression is to find the parameters α and β for which the error term is minimized. To be more precise, the model minimizes the sum of the squared errors: We don’t want positive errors to be compensated for by negative ones, since both penalize our model equally.
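A two-line example shows why we square the errors rather than simply summing them. The residual values here are invented:

```python
import numpy as np

# Hypothetical residuals: two errors of equal size and opposite sign.
errors = np.array([2.0, -2.0])

raw_sum = errors.sum()             # positive and negative errors cancel: 0.0
squared_sum = (errors ** 2).sum()  # squaring penalizes both: 8.0
```

The raw sum wrongly suggests a perfect fit, while the squared sum correctly reports that both predictions are off by two.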

This procedure is called ordinary least squares (OLS).

Let’s work through the optimization problem step by step. First, we can write the sum of squared errors as:

Σ εi² = Σ (yi − α − β · xi)²

We can set our optimization problem as follows:

(α̂, β̂) = argmin over α, β of Σ (yi − α − β · xi)²

Knowing that the sample covariance between two variables is given by:

Cov(x, y) = (1/n) Σ (xi − x̄)(yi − ȳ)

And that the sample correlation coefficient between two variables is equal to:

rxy = Cov(x, y) / (sx · sy)

Where sx and sy are the sample standard deviations of x and y.

We can reframe the above expression as follows:

β̂ = Cov(x, y) / Var(x) = rxy · (sy / sx)

The same reasoning holds for our α:

α̂ = ȳ − β̂ · x̄

Once we’ve obtained the values α̂ and β̂ that minimize the squared errors, our model’s equation will look like this:

ŷi = α̂ + β̂ · xi
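As a sanity check, here is a short NumPy sketch (with made-up data) that computes the slope and intercept from the closed-form expressions β̂ = Cov(x, y) / Var(x) and α̂ = ȳ − β̂ · x̄, and compares them to NumPy’s own least-squares fit:

```python
import numpy as np

# Invented sample used to verify the closed-form OLS solution.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# beta-hat = Cov(x, y) / Var(x); the 1/n factors cancel in the ratio.
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# alpha-hat = y-bar - beta-hat * x-bar
alpha_hat = y_bar - beta_hat * x_bar

# Cross-check against NumPy's least-squares polynomial fit.
beta_ref, alpha_ref = np.polyfit(x, y, deg=1)
```

Both approaches minimize the same sum of squared errors, so the two pairs of estimates should agree to numerical precision.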
