Basic Linear Algebra for Deep Learning

October 8, 2019
Updated: October 11, 2019
Written by Niklas Donges

The concepts of Linear Algebra are crucial for understanding the theory behind Machine Learning, especially for Deep Learning. They give you better intuition for how algorithms really work under the hood, which enables you to make better decisions. So if you really want to be a professional in this field, you cannot escape mastering some of its concepts. This post will give you an introduction to the most important concepts of Linear Algebra that are used in Machine Learning.


Table of Contents:

  • Introduction
  • Mathematical Objects
  • Computational Rules
  • Matrix Multiplication Properties
  • Inverse and Transpose
  • Summary
  • Resources

Introduction

Linear Algebra is a continuous form of mathematics that is applied throughout science and engineering because it allows you to model natural phenomena and to compute them efficiently. Because it is continuous rather than discrete mathematics, many computer scientists have little experience with it. Linear Algebra is also central to almost all areas of mathematics, such as geometry and functional analysis.

Its concepts are a crucial prerequisite for understanding the theory behind Machine Learning, especially if you are working with Deep Learning algorithms. You don’t need to understand Linear Algebra before getting started with Machine Learning, but at some point you may want to gain a better understanding of how the different Machine Learning algorithms really work under the hood. This will help you make better decisions during a Machine Learning system’s development. So if you really want to be a professional in this field, you will have to master the parts of Linear Algebra that are important for Machine Learning.

In Linear Algebra, data is represented by linear equations, which are presented in the form of matrices and vectors. Therefore, you are mostly dealing with matrices and vectors rather than with scalars (we will cover these terms in the following section). When you have the right libraries, like NumPy, at your disposal, you can compute complex matrix multiplications with just a few lines of code. (Note: this blog post ignores concepts of Linear Algebra that are not important for Machine Learning.)

 

Mathematical Objects

mathematical objects

Scalar

A Scalar is simply a single number, for example 24.

Vector

A Vector is an ordered array of numbers and can be in a row or a column. A Vector has just a single index, which can point to a specific value within the Vector. For example, V2 refers to the second value within the Vector, which is -8 in the graphic above.

basic linear algebra deep learning matrix

Matrix

A Matrix is an ordered 2D array of numbers and it has two indices. The first one points to the row and the second one to the column. For example, M23 refers to the value in the second row and the third column, which is 8 in the yellow graphic above. A Matrix can have any number of rows and columns. Note that a Vector is also a Matrix, but with only one row or one column.

The Matrix in the yellow graphic is a 2-by-3 Matrix (rows x columns). Below you can see another example of a Matrix along with its notation:
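As a minimal sketch of how these indices look in code, the NumPy snippet below builds a Vector and a 2-by-3 Matrix. The values are illustrative assumptions chosen to match the two numbers mentioned in the text (-8 and 8); they are not a copy of the graphics. Keep in mind that mathematical notation counts from 1 while NumPy counts from 0.

```python
import numpy as np

# Illustrative values only; they are not taken from the article's graphics.
v = np.array([3, -8, 5])          # a Vector with three entries
M = np.array([[1, 2, 3],
              [4, 5, 8]])         # a 2-by-3 Matrix (2 rows, 3 columns)

# Math notation is 1-based, NumPy indexing is 0-based.
v_2 = v[1]        # V2: the second value of the Vector
m_23 = M[1, 2]    # M23: second row, third column

print(v_2)        # -8
print(m_23)       # 8
print(M.shape)    # (2, 3)
```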

basic linear algebra deep learning tensor

Tensor

You can think of a Tensor as an array of numbers, arranged on a regular grid, with a variable number of axes. A third-order Tensor, like the one in the graphic below, has three indices, where the first one points to the row, the second to the column and the third one to the position along the third axis. For example, T232 points to the second row, the third column, and the second position along the third axis. This refers to the value 0 in the right Tensor in the graphic below:

tensor

 

Tensor is the most general term for all of the concepts above, because a Tensor is a multidimensional array that can be a Scalar, a Vector, or a Matrix, depending on the number of indices it has. For example, a zeroth-order Tensor is a Scalar (no index), a first-order Tensor is a Vector (1 index), and a second-order Tensor is a Matrix (2 indices). Tensors of third order and above (3 or more indices) are called Higher-Order Tensors.
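The relationship between these objects can be sketched in NumPy, where the attribute `ndim` reports the number of indices an array needs. The shapes and values below are assumptions made for illustration and do not reproduce the graphic.

```python
import numpy as np

scalar = np.array(24)                   # no index
vector = np.array([1, 2, 3])            # 1 index
matrix = np.array([[1, 2], [3, 4]])     # 2 indices
tensor = np.zeros((3, 3, 2))            # 3 indices: row, column, third axis

# ndim is the number of indices, i.e. the order of the Tensor.
orders = [a.ndim for a in (scalar, vector, matrix, tensor)]
print(orders)    # [0, 1, 2, 3]

# T232 in 1-based notation becomes [1, 2, 1] in 0-based NumPy indexing.
t_232 = tensor[1, 2, 1]
```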

 

Computational Rules

1. Matrix-Scalar Operations

If you multiply, divide, subtract, or add a Scalar to a Matrix, you do so with every element of the Matrix. The image below illustrates this perfectly for multiplication:

MATRIX-SCALAR OPERATIONS
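A short NumPy sketch of the same rule, using made-up values: each arithmetic operator is applied to every element of the Matrix.

```python
import numpy as np

# Illustrative values, not the ones from the image.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
s = 2.0

product = A * s       # [[2, 4], [6, 8]]
quotient = A / s      # [[0.5, 1], [1.5, 2]]
total = A + s         # [[3, 4], [5, 6]]
difference = A - s    # [[-1, 0], [1, 2]]
```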

2. Matrix-Vector Multiplication

Multiplying a Matrix by a Vector can be thought of as multiplying each row of the Matrix by the column of the Vector. This only works if the Matrix has as many columns as the Vector has entries. The output will be a Vector that has the same number of rows as the Matrix. The image below shows how this works:

MATRIX-VECTOR MULTIPLICATION
MATRIX-VECTOR MULTIPLICATION 2

To better understand the concept, we will go through the calculation of the second image. To get the first value of the resulting Vector (16), we take the numbers of the Vector we want to multiply with the Matrix (1 and 5), and multiply them with the numbers of the first row of the Matrix (1 and 3). This looks like this:

1*1 + 3*5 = 16

We do the same for the values within the second row of the Matrix:

4*1 + 0*5 = 4

And again for the third row of the Matrix:

2*1 + 1*5 = 7
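The three calculations above can be reproduced in NumPy. This is a minimal sketch that uses the numbers from the worked example; the variable names are chosen here for illustration.

```python
import numpy as np

M = np.array([[1, 3],
              [4, 0],
              [2, 1]])      # 3-by-2 Matrix from the worked example
v = np.array([1, 5])        # Vector with 2 entries

# Row by row, exactly as in the hand calculation.
by_hand = [int(row[0] * v[0] + row[1] * v[1]) for row in M]

# The @ operator computes the same Matrix-Vector product in one step.
result = M @ v

print(by_hand)            # [16, 4, 7]
print(result.tolist())    # [16, 4, 7]
```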

Here is another example:

MATRIX-VECTOR MULTIPLICATION 3

And here is a kind of cheat sheet:

MATRIX-VECTOR MULTIPLICATION 4

3. Matrix-Matrix Addition and Subtraction

Matrix-Matrix Addition and Subtraction is fairly easy and straightforward. The requirement is that the matrices have the same dimensions, and the result is a Matrix that also has the same dimensions. You just add or subtract each value of the first Matrix with its corresponding value in the second Matrix. See below:

MATRIX-MATRIX ADDITION AND SUBTRACTION
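In NumPy this is a one-liner per operation. The sketch below uses made-up values and checks the shapes explicitly, because NumPy's broadcasting can accept some mismatched shapes that the mathematical rule would not.

```python
import numpy as np

# Illustrative values, not the ones from the image.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# The mathematical rule requires identical dimensions.
assert A.shape == B.shape

total = A + B         # [[6, 8], [10, 12]]
difference = A - B    # [[-4, -4], [-4, -4]]
```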

4. Matrix-Matrix Multiplication

Multiplying two Matrices together isn’t that hard either if you know how to multiply a Matrix by a Vector. Note that you can only multiply Matrices together if the number of the first Matrix’s columns matches the number of the second Matrix’s rows. The result will be a Matrix with the same number of rows as the first Matrix and the same number of columns as the second Matrix. It works as follows:

You simply split the second Matrix into column-Vectors and multiply the first Matrix separately by each of these Vectors. Then you put the results in a new Matrix (without adding them up!). The image below explains this step by step:

MATRIX-MATRIX MULTIPLICATION

And here again is a kind of cheat sheet:

MATRIX-MATRIX MULTIPLICATION 2
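The column-splitting procedure described above can be sketched as follows. The first Matrix is reused from the Matrix-Vector example; the second Matrix is an assumption made for illustration. The result is compared against NumPy's built-in `@` operator.

```python
import numpy as np

A = np.array([[1, 3],
              [4, 0],
              [2, 1]])      # 3-by-2
B = np.array([[1, 2],
              [5, 0]])      # 2-by-2 (illustrative values)

# Columns of A must match rows of B.
assert A.shape[1] == B.shape[0]

# Split B into column Vectors, multiply A by each one,
# and place the resulting Vectors side by side.
columns = [A @ B[:, j] for j in range(B.shape[1])]
by_columns = np.column_stack(columns)

# The built-in Matrix product gives the same result.
direct = A @ B

print(by_columns.tolist())    # [[16, 2], [4, 8], [7, 4]]
```

The result has 3 rows (like the first Matrix) and 2 columns (like the second Matrix).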

 

Matrix Multiplication Properties

Matrix Multiplication has several properties that allow us to bundle a lot of computation into one Matrix multiplication. We will discuss them one by one below. We will start by explaining these concepts with Scalars and then with Matrices because this will give you a better understanding of the process.

1. Not Commutative

Scalar Multiplication is commutative but Matrix Multiplication is not. This means that when we are multiplying Scalars, 7*3 is the same as 3*7. But when we multiply Matrices by each other, A*B is in general not the same as B*A.
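A quick check in NumPy with two made-up 2-by-2 Matrices shows that swapping the order changes the result:

```python
import numpy as np

# Illustrative values.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B    # [[2, 1], [4, 3]]: the columns of A are swapped
BA = B @ A    # [[3, 4], [1, 2]]: the rows of A are swapped

print(np.array_equal(AB, BA))    # False
```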

2. Associative

Scalar and Matrix Multiplication are both associative. This means that the Scalar multiplication 3(5*3) is the same as (3*5)3 and that the Matrix multiplication A(B*C) is the same as (A*B)C.
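The same property can be checked numerically. This sketch uses three made-up integer Matrices, so the comparison is exact:

```python
import numpy as np

# Illustrative values.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 3]])

left = A @ (B @ C)     # multiply B and C first
right = (A @ B) @ C    # multiply A and B first

print(left.tolist())   # [[5, 3], [11, 9]]
print(right.tolist())  # [[5, 3], [11, 9]]
```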

3. Distributive

Scalar and Matrix Multiplication are also both distributive. This means that 3(5 + 3) is the same as 3*5 + 3*3 and that A(B+C) is the same as A*B + A*C.
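A small numerical check of the distributive property, again with made-up integer Matrices:

```python
import numpy as np

# Illustrative values.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 3]])

left = A @ (B + C)       # add first, then multiply
right = A @ B + A @ C    # multiply first, then add

print(left.tolist())     # [[6, 7], [14, 15]]
print(right.tolist())    # [[6, 7], [14, 15]]
```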

4. Identity Matrix

The Identity Matrix is a special kind of Matrix, but first we need to define what an Identity is. The number 1 is an Identity because everything you multiply by 1 is equal to itself. In the same way, every Matrix that is multiplied by an Identity Matrix is equal to itself. For example, Matrix A times the Identity Matrix is equal to A.

You can spot an Identity Matrix by the fact that it has ones along its main diagonal and that every other value is zero. It is also a “square Matrix,” meaning that its number of rows matches its number of columns.

IDENTITY MATRIX

We previously discussed that Matrix multiplication is not commutative, but multiplying by an Identity Matrix is an important exception. For a square Matrix A and an Identity Matrix I of the same size, the following equation is true: A*I = I*A = A
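In NumPy, `np.eye(n)` creates an n-by-n Identity Matrix. The sketch below uses a made-up Matrix A to confirm the equation above:

```python
import numpy as np

# Illustrative values.
A = np.array([[1, 2],
              [3, 4]])
I = np.eye(2, dtype=int)    # ones on the main diagonal, zeros elsewhere

right_product = A @ I
left_product = I @ A

print(I.tolist())                          # [[1, 0], [0, 1]]
print(np.array_equal(right_product, A))    # True
print(np.array_equal(left_product, A))     # True
```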

 

Inverse and Transpose

The Matrix inverse and the Matrix transpose are two special kinds of Matrix properties. Again, we will start by discussing how these properties relate to real numbers and then how they relate to Matrices.

1. Inverse

First of all, what is an inverse? A number that is multiplied by its inverse is equal to 1. Note that every number except 0 has an inverse. In the same way, if you multiply a Matrix by its inverse, the result is the Identity Matrix. The example below shows what the inverse of Scalars looks like:

INVERSE

But not every Matrix has an inverse. Only a “square Matrix” can have one, and not even every square Matrix does. Discussing which Matrices have an inverse is unfortunately out of the scope of this post.

Why do we need an inverse? Because we can’t divide Matrices. There is no concept of dividing by a Matrix but we can multiply a Matrix by an inverse, which results essentially in the same thing.

The image below shows a Matrix multiplied by its inverse, which results in a 2-by-2 identity Matrix.

2-by-2 identity Matrix

You can easily compute the inverse of a Matrix (if it has one) using NumPy. Here’s the link to the documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.linalg.inv.html.
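A minimal sketch of `numpy.linalg.inv`, using made-up values. Because the inverse is computed in floating point, the product with the original Matrix is compared to the Identity Matrix with `np.allclose` rather than exact equality. The last part shows one example of a Matrix without an inverse, for which NumPy raises `LinAlgError`.

```python
import numpy as np

# Illustrative values.
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

A_inv = np.linalg.inv(A)

# A times its inverse gives the Identity Matrix, up to rounding.
is_identity = np.allclose(A @ A_inv, np.eye(2))

# "Dividing" by A: solve A x = b by multiplying b with the inverse.
b = np.array([1.0, 2.0])
x = A_inv @ b

# This Matrix has no inverse: its second row is twice its first row.
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
try:
    np.linalg.inv(singular)
    singular_has_inverse = True
except np.linalg.LinAlgError:
    singular_has_inverse = False

print(is_identity)             # True
print(singular_has_inverse)    # False
```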

2. Transpose

And lastly, we will discuss the Matrix Transpose. This is basically the mirror image of a Matrix along its main diagonal (the line running from the top left to the bottom right). It is fairly simple to get the Transpose of a Matrix: its first column becomes the first row of the Transpose, its second column becomes the second row, and so on. An m-by-n Matrix is transformed into an n-by-m Matrix. Also, the element Aij of A is equal to the element in row j and column i of the Transpose. The image below illustrates that:

TRANSPOSE
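In NumPy the Transpose is available as the `.T` attribute. The sketch below uses a made-up 2-by-3 Matrix and checks both the shape change and the index swap described above:

```python
import numpy as np

# Illustrative values.
A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2-by-3

A_t = A.T                    # 3-by-2

print(A_t.tolist())          # [[1, 4], [2, 5], [3, 6]]
print(A.shape, A_t.shape)    # (2, 3) (3, 2)

# Element (i, j) of A equals element (j, i) of the Transpose.
indices_match = all(
    A[i, j] == A_t[j, i]
    for i in range(A.shape[0])
    for j in range(A.shape[1])
)
```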

Summary

In this post, you learned about the mathematical objects of Linear Algebra that are used in Machine Learning. You learned how to multiply, divide, add and subtract these mathematical objects. Furthermore, you have learned about the most important properties of Matrices and why they enable us to make more efficient computations. On top of that, you have learned what inverse and transpose Matrices are and what you can do with them. Although there are also other parts of Linear Algebra used in Machine Learning, this post gave you a proper introduction to the most important concepts.

Resources

Deep Learning (book), by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

https://machinelearningmastery.com/linear-algebra-machine-learning/

Andrew Ng’s Machine Learning course on Coursera

https://en.wikipedia.org/wiki/Linear_algebra

https://www.mathsisfun.com/algebra/scalar-vector-matrix.html

https://www.quantstart.com/articles/scalars-vectors-matrices-and-tensors-linear-algebra-for-deep-learning-part-1

https://www.aplustopper.com/understanding-scalar-vector-quantities/

 


Niklas Donges is an entrepreneur, technical writer and AI expert. He worked on an AI team of SAP for 1.5 years, after which he founded Markov Solutions. The Berlin-based company specializes in artificial intelligence, machine learning and deep learning, offering customized AI-powered software solutions and consulting programs to various companies.
