What Is Linear Regression? Explaining Concepts and Applications With TensorFlow 2.0

An in-depth look at linear regression analysis with TensorFlow 2.0.
Vihar Kurama
Expert Columnist
April 24, 2020
Updated: June 3, 2021

Linear regression is probably the first algorithm you learn when starting out in machine learning or deep learning because it’s simple to implement and easy to apply to real-world problems. It is widely used in data science and statistics to model the relationship between a scalar response (the dependent variable) and one or more explanatory variables (the independent variables). Several regression techniques are available, depending on the data being used. Although the underlying math is simple, linear regression is put to use across many different fields. In this article, we’ll briefly discuss linear regression and its applications, then implement it using TensorFlow 2.0.

Regression Analysis

Regression analysis is used to estimate the relationship between a dependent variable and one or more independent variables. This technique is widely applied to predict outputs, forecast data, analyze time series, and find cause-and-effect relationships between variables. Several types of regression techniques exist, distinguished by the number of independent variables, the shape of the regression line, and the type of dependent variable. Of these, the two most popular are linear regression and logistic regression.

Researchers use regression to indicate the strength of the impact of multiple independent variables on a dependent variable on different scales. Regression has numerous applications. For example, consider a data set consisting of weather information recorded over the past few decades. Using that data, we could forecast weather for the next couple of years. Regression is also widely used in organizations and businesses to assess risk and growth based on previously recorded data.

You rarely need to implement regression analysis from scratch. Modern machine learning frameworks like TensorFlow and PyTorch ship with built-in libraries that let us proceed directly to the application we want to build.

The Math and Logic behind Linear Regression

The goal of linear regression is to identify the best fit line passing through continuous data by employing a specific mathematical criterion. This technique falls under the umbrella of supervised machine learning. Prior to jumping into linear regression, though, we first should understand what supervised learning is all about.

Machine learning is broadly classified into three types: supervised learning, unsupervised learning, and reinforcement learning. This classification is based on the data that we give to the algorithm. In supervised learning, we train the algorithm with both input and output data. Unsupervised learning occurs when there’s no output data given to the algorithm and it has to learn the underlying patterns by analyzing the input data. Finally, reinforcement learning involves an agent taking actions in an environment to maximize the reward in a particular situation. It paves the way for choosing the best possible path for an algorithm to traverse. Now, let’s look more closely at linear regression itself.

Linear regression assumes that the relationship between the features and the target vector is approximately linear. That is, the effect (also called coefficient, weight, or parameter) of the features on the target vector is constant. Mathematically, linear regression is represented by the equation y = mx + c + ε.

In this equation, y is our target, x is the data for a single feature, m and c are the coefficients identified by fitting the model, and ε is the error.
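To make the roles of the symbols concrete, here is a minimal sketch in plain Python. The coefficient values and the three data points are hypothetical, chosen only for illustration; they are not fitted from any real data.

```python
# Hypothetical coefficients and observations, for illustration only.
m, c = 2.0, 1.0
xs = [1.0, 2.0, 3.0]   # single feature x
ys = [3.5, 4.5, 7.5]   # observed target y

# The deterministic part of the equation: mx + c.
predictions = [m * x + c for x in xs]

# The error term ε is whatever the line fails to explain.
residuals = [y - p for y, p in zip(ys, predictions)]

print(predictions)  # [3.0, 5.0, 7.0]
print(residuals)    # [0.5, -0.5, 0.5]
```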

Now, our goal is to tune the values of m and c to establish a good relationship between the input variable x and the output variable y. The variable m is the slope (also called the weight): the amount by which the predicted y changes when x increases by one unit. The variable c is the intercept (also called the bias term): the predicted value of y when x is zero. These names should not be confused with the bias and variance of a model. In that sense, variance is the amount by which the estimate of the target function changes if different training data were used, and bias is the algorithm’s tendency to consistently learn the wrong things by not taking into account all the information in the data. For the model to be accurate, both need to be low. Inconsistencies or missing values in the data set can also distort the fit. Hence, we must carry out proper preprocessing of the data before we train the algorithm.

We evaluate a linear regression model by its prediction error, both on the data used for fitting and on new data. For a model to be accurate with minimum error, we need to achieve low bias and low variance. We partition the data into training and testing data sets so that we can measure the error on examples the model has not seen.
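Partitioning the data can be sketched in a few lines of plain Python. The helper name split_train_test, the 25 percent test fraction, the fixed seed, and the sample values are all assumptions made for this illustration.

```python
import random

def split_train_test(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle paired samples and split them into train and test lists."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(len(xs) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(xs[i], ys[i]) for i in train_idx]
    test = [(xs[i], ys[i]) for i in test_idx]
    return train, test

# Hypothetical samples, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]

train, test = split_train_test(xs, ys)
print(len(train), len(test))  # 6 2
```

The model is fitted on train only, and its error is then measured on test.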

A Deep Dive into Linear Regression

Before we build a supervised machine learning model, all we have is data comprising inputs and outputs. To estimate the dependency between them using linear regression, we start with two random values for the slope m and the intercept c. We then take a tuple from the data set, feed its input value into the equation y = mx + c, and predict the output. Finally, we calculate the loss incurred by the predicted value using a loss function.

The values of m and c are picked randomly, but they must be updated to minimize the error. We therefore use the loss function as the metric for evaluating the model. Our goal is to obtain the line that minimizes the error.

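The most common loss function is mean squared error: the average of the squared differences between the actual and predicted outputs. Leaving out the constant factor 1/n, which does not change where the minimum lies, it is mathematically represented as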

<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>E</mi><mi>r</mi><mi>r</mi><mi>o</mi><mi>r</mi><mo>&#xA0;</mo><mo>=</mo><mo>&#xA0;</mo><mstyle displaystyle="false"><munderover><mo>&#x2211;</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover></mstyle><msup><mfenced><mrow><mi>a</mi><mi>c</mi><mi>t</mi><mi>u</mi><mi>a</mi><msub><mi>l</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mi>p</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><mi>p</mi><mi>r</mi><mi>e</mi><mi>d</mi><mi>i</mi><mi>c</mi><mi>t</mi><mi>e</mi><msub><mi>d</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mi>p</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow></mfenced><mn>2</mn></msup></math>
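As a small worked example, the sketch below computes this loss in plain Python, including the 1/n averaging. The function name mean_squared_error and the sample values are assumptions made for illustration. It also sums the raw, unsquared errors to show why squaring matters.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted outputs."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# Hypothetical values: the predictions miss by -1, 0 and +1.
actual = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]

raw_errors = [a - p for a, p in zip(actual, predicted)]
print(sum(raw_errors))                        # 0.0, the misses cancel out
print(mean_squared_error(actual, predicted))  # 2/3, about 0.667
```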

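If we don’t square the error, the positive and negative errors cancel each other out. For a single feature, the intercept (written b0 below, our c) and the slope (b1, our m) that minimize this error can also be computed directly with the following closed-form equations: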

<math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>b</mi><mn>0</mn></msub><mo>&#xA0;</mo><mo>=</mo><mo>&#xA0;</mo><menclose notation="top"><mi>y</mi></menclose><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><msub><mi>b</mi><mn>1</mn></msub><menclose notation="top"><mi>x</mi></menclose></math>

<math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>b</mi><mrow><mn>1</mn><mo>&#xA0;</mo></mrow></msub><mo>=</mo><mo>&#xA0;</mo><mfrac><mrow><munderover><mo>&#x2211;</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mfenced><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><menclose notation="top"><mi>x</mi></menclose></mrow></mfenced><mfenced><mrow><msub><mi>y</mi><mi>i</mi></msub><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><menclose notation="top"><mi>y</mi></menclose></mrow></mfenced></mrow><mrow><munderover><mo>&#x2211;</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msup><mfenced><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><menclose notation="top"><mi>x</mi></menclose></mrow></mfenced><mn>2</mn></msup></mrow></mfrac></math>
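These two equations translate directly into plain Python. In this sketch the function name fit_line and the sample points are assumptions; the points were chosen to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (b0, b1): the least-squares intercept and slope for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the b1 equation.
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    b1 = numerator / denominator
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_line(xs, ys)
print(b0, b1)  # 1.0 2.0
```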

When we train a model to find the ideal slope and intercept, different values yield different errors. Out of all the values, there will be one point where the error is minimized, and the parameters corresponding to this point give the optimal solution. At this point, gradient descent comes into the picture.

Gradient descent is an optimization algorithm that finds the values of the parameters (coefficients) of a function f that minimize a cost function. The learning rate controls how far we adjust the weights at each step in the direction opposite to the loss gradient. The lower the value, the slower we travel down the slope; too high a value can overshoot the minimum.
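The effect of the learning rate is easiest to see on a one-parameter problem. The sketch below is an assumption-laden toy, not part of the model built later: it minimizes the hypothetical function f(w) = (w - 3)^2, whose minimum is at w = 3, with three different learning rates.

```python
def descend(learning_rate, steps=25, w=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)        # derivative of (w - 3)^2
        w = w - learning_rate * gradient  # step against the gradient
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, descend(lr))
```

With 0.01 the parameter is still far from 3 after 25 steps, with 0.1 it is very close, and with 1.1 each step overshoots so the value moves away from the minimum.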

Both m and c are updated as follows, where α is the learning rate:

<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>m</mi><mo>&#xA0;</mo><mo>=</mo><mo>&#xA0;</mo><mi>m</mi><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><mi>&#x3B1;</mi><mfrac><mi>d</mi><mrow><mi>d</mi><mi>m</mi></mrow></mfrac><mfenced><mrow><mi>E</mi><mi>r</mi><mi>r</mi><mi>o</mi><mi>r</mi></mrow></mfenced></math>

<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>c</mi><mo>&#xA0;</mo><mo>=</mo><mo>&#xA0;</mo><mi>c</mi><mo>&#xA0;</mo><mo>-</mo><mo>&#xA0;</mo><mi>&#x3B1;</mi><mfrac><mi>d</mi><mrow><mi>d</mi><mi>c</mi></mrow></mfrac><mfenced><mrow><mi>E</mi><mi>r</mi><mi>r</mi><mi>o</mi><mi>r</mi></mrow></mfenced></math>
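The update rules can be sketched in plain Python before moving to TensorFlow. This version assumes the error is the mean of the squared differences, so each derivative carries a 2/n factor, and it starts from m = c = 0 rather than random values to keep the run repeatable. The function name, the value of alpha, the step count, and the sample points are illustrative assumptions.

```python
def gradient_descent(xs, ys, alpha=0.05, steps=5000):
    """Fit y = m*x + c by repeatedly applying the two update rules."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - (m * x + c) for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        d_m = -2.0 / n * sum(x * e for x, e in zip(xs, errors))
        d_c = -2.0 / n * sum(errors)
        # m = m - alpha * d(Error)/dm, and likewise for c.
        m -= alpha * d_m
        c -= alpha * d_c
    return m, c

# Hypothetical points lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, c = gradient_descent(xs, ys)
print(round(m, 4), round(c, 4))  # 2.0 1.0
```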

Once the model is trained and achieves a minimum error, we can fix the values of the slope and intercept. Plotted against the data points, the resulting best fit line passes as close to them as a straight line can.

Building a Linear Regression model with TensorFlow 2.0

So far, we’ve seen the fundamentals of linear regression, and now it’s time to implement one. We could use several data science and machine learning libraries to directly import linear regression functions or APIs and apply them to the data. In this section, we will instead build a model with TensorFlow that’s based on the math that we talked about in the previous sections. The code is organized as a sequence of steps. You can run these chunks of code as you read, either on your local machine or on a cloud platform like Paperspace or Google Colab. On your local machine, make sure to install Python and TensorFlow first. In Google Colab notebooks, TensorFlow is preinstalled. To install any other modules like sklearn or matplotlib, you can use pip. In a notebook, prefix the pip command with an exclamation mark (!), which runs it as a shell command.

Step 1: Importing the Necessary Modules

First, we need to import all the necessary modules and packages. In Python, we use the import keyword to do this. We can also alias them using the keyword as. For example, to create a TensorFlow variable without an alias, we would import tensorflow and call tensorflow.Variable(). If we alias TensorFlow as tf, we can create the variable as tf.Variable(). This saves time and makes the code look clean. We also import a few features from the __future__ module so that the code behaves the same way in Python 2 and Python 3. We import numpy to create a few samples of data. Finally, we bind np.random to the name rng, which we later use to initialize a random weight and bias.

from __future__ import absolute_import, division, print_function

import tensorflow as tf

import numpy as np

rng = np.random

Step 2: Creating a Random Data Set

The second step is to prepare the data. Here, we use numpy to initialize both the input and output arrays. We also need to make sure that both arrays have the same shape so that each element in the input array corresponds to one element in the output array. Our goal is to identify the relationship between each corresponding element in the input array and the output array using linear regression. Below is the code snippet that loads the input values into the variable X and the output values into the variable Y.

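X = np.array([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,

              7.042,10.791,5.313,7.997,5.654,9.27,3.1])

# Assumed values: the first ten Y entries follow the public TensorFlow-Examples
# linear regression data, whose X values match the ones above.

Y = np.array([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,

              2.827,3.465,1.65,2.904,2.42,2.94,1.3])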

n_samples = X.shape[0]

Step 3: Setting up the Hyperparameters

Hyperparameters are settings that we choose before training rather than learn from the data, and they strongly affect how well and how quickly a model trains. In the code snippet below, we define the learning rate, the number of epochs, and the display step (how often progress is printed). You can experiment by tweaking the hyperparameters to reduce the error further.

learning_rate = 0.01

epochs = 1000

display_step = 50

Step 4: Initializing Weights and Biases

Now that we have our hyperparameters in place, let’s initialize the weight and bias with random numbers. We do this using the rng variable that was previously declared. We define two TensorFlow variables, W and b, and set them to a random weight and a random bias, respectively, using the tf.Variable class.

# Weight and Bias initialized randomly.

W = tf.Variable(rng.randn(), name="weight")

b = tf.Variable(rng.randn(), name="bias")

Step 5: Defining Linear Regression and Cost Function

Here comes the essential component of our code! We now define linear regression as a simple function, linear_regression. The function takes the input x as a parameter and returns the weighted sum, weights * inputs + bias. This function is later called in the training loop while training the model with data. Further, we define the loss as a function called mean_square. This function takes a predicted value that is returned by the linear_regression function and a true value that is picked from the data set. We then use tf to replicate the squared-error equation discussed above, dividing the sum by 2 * n_samples. The extra factor of 2 only rescales the loss and does not change where its minimum lies.

# Linear regression (Wx + b).

def linear_regression(x):

    return W * x + b

# Mean square error.

def mean_square(y_pred, y_true):

    return tf.reduce_sum(tf.pow(y_pred-y_true, 2)) / (2 * n_samples)

Step 6: Building Optimizers and Gradients

We now define our optimizer as stochastic gradient descent and plug in learning rate as a parameter to it. Next, we define the optimization process as a function, run_optimization, where we calculate the predicted values and the loss that they generate using our linear_regression() and mean_square() functions as defined in the previous step. Thereafter, we compute the gradients and update the weights in the optimization process. This function is invoked in the training loop that we’ll discuss in the upcoming section.

# Stochastic Gradient Descent Optimizer

optimizer = tf.optimizers.SGD(learning_rate)

# Optimization process. 

def run_optimization():

    # Wrap computation inside a GradientTape for automatic differentiation.

    with tf.GradientTape() as g:

        pred = linear_regression(X)

        loss = mean_square(pred, Y)

    # Compute gradients.

    gradients = g.gradient(loss, [W, b])


    # Update W and b following gradients.

    optimizer.apply_gradients(zip(gradients, [W, b]))
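To see what GradientTape does in isolation, here is a minimal standalone sketch, separate from the model above. The variable name w_demo and the function w² are assumptions chosen only so that the result is easy to check by hand: the derivative of w² at w = 3 is 6.

```python
import tensorflow as tf

# Hypothetical one-parameter example, independent of W and b above.
w_demo = tf.Variable(3.0)

# Operations on the variable are recorded while the tape is active.
with tf.GradientTape() as tape:
    loss_demo = w_demo * w_demo

# Ask the tape for d(loss_demo)/d(w_demo).
grad_demo = tape.gradient(loss_demo, w_demo)
print(grad_demo.numpy())  # 6.0
```

In run_optimization the same mechanism yields the two partial derivatives of the loss with respect to W and b.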

Step 7: Constructing the Training Loop

This marks the end of our setup. We have set all the hyperparameters and declared our model, loss function, and optimization function. In the training loop, we bring all of these together and iterate over the data for a certain number of epochs. The model gets trained, and with every iteration the weights get updated. Once the total number of iterations is complete, we get the fitted values of W and b.

Let’s work through the code chunk below. We write a simple for loop in Python and iterate the data until the total number of epochs is complete. We then run our optimization function by invoking the run_optimization method where the weights get updated using the previously defined SGD rule. We then display the loss and the step number using the print function, along with the metrics.

# Run training for the given number of steps.

for step in range(1, epochs + 1):

    # Run the optimization to update W and b values.

    run_optimization()

    if step % display_step == 0:

        pred = linear_regression(X)

        loss = mean_square(pred, Y)

        print("step: %i, loss: %f, W: %f, b: %f" % (step, loss, W.numpy(), b.numpy()))

Step 8: Visualizing Linear Regression

To conclude, we visualize the best fit line using the matplotlib library.

import matplotlib.pyplot as plt

# Graphic display

plt.plot(X, Y, 'ro', label='Original data')

plt.plot(X, np.array(W * X + b), label='Fitted line')

# Show the labels defined above and render the figure.

plt.legend()

plt.show()

Applications of Linear Regression

Linear regression is a powerful statistical technique that can generate insights on consumer behavior, help to understand business better, and comprehend factors influencing profitability. It can also be put to service evaluating trends and forecasting data in a variety of fields. We can use linear regression to solve a few of our day-to-day problems related to supporting decision making, minimizing errors, increasing operational efficiency, discovering new insights, and creating predictive analytics.

In this article, we have reviewed how linear regression works, along with its implementation in TensorFlow 2.0. This method sets the baseline for exploring other machine learning algorithms. Now that you have a handle on linear regression and TensorFlow 2.0, you can experiment further with other frameworks and data sets to see how each one fares.
