Symmetric matrices are matrices that are symmetric along the diagonal, which means Aᵀ = A, or in other words, the matrices are equal to each other. It’s an operator with the self-adjoint property. So, it’s important to think about a matrix as an operator and study its properties. While we can’t directly read off the geometric properties from the symmetry, we can find the most intuitive explanation of a symmetric matrix in its eigenvectors, which will give us a deeper understanding of the symmetric matrices.

What Is a Symmetric Matrix?

A symmetric matrix is a matrix that is equal to its transpose. They contain three properties, including: Real eigenvalues, eigenvectors corresponding to the eigenvalues that are orthogonal and the matrix must be diagonalizable. 

A trivial example is the identity matrix. A non-trivial example can be something like:

example of a symmetric matrix
An example of a symmetric matrix. | Image: Xichu Zhang

While the definition may be as simple as that, they contain a lot of probabilities and can mean different things. I’ll walk you through the important properties, explain them and then introduce the applications for a symmetric matrix.

The Hermitian matrix is a complex extension of the symmetric matrix, which means in a Hermitian matrix, all the entries satisfy the following:

equation definition where aij equals transpose aij
Definition 0.1. | Image: Xichu Zhang
​​​​

The symmetric matrices are simply the Hermitian matrices but with the conjugate transpose being the same as themselves. Therefore, it has all the properties that a symmetric matrix has.

example of a hermitian matrix
Example of the Hermitian matrix. | Image: Xichu Zhang

In data science, we mostly encounter matrices with real entries, since we are dealing with real-world problems. So, we’ll be exploring real cases of symmetric matrices.

 

How to Tell If a Matrix Is Symmetric

Three properties of symmetric matrices are introduced in this section. They are considered to be the most important because they concern the behavior of eigenvalues and eigenvectors of those matrices. Those are the fundamental characteristics, which distinguishes symmetric matrices from non-symmetric ones.

 

Property 1: Symmetric Matrices Have Real Eigenvalues.

This can be proved algebraically through a formal, direct proof, as opposed to induction, contradiction, etc. First, a quick explanation of eigenvalues and eigenvectors. The eigenvectors of matrix A are the vectors whose directions don’t change after A is applied to it. The direction is not changed, but the vectors can be scaled. This shows the non-triviality of this property. Real eigenvalues indicate stretching or scaling in the linear transformation, unlike complex eigenvalues, which don’t have a “size.”

The proportions that the vectors are scaled are called eigenvalues. We denote them by λ. Therefore, we have the relation Ax = λx. The proof is fairly easy, but it requires some knowledge of linear algebra. So, we will still go through it step by step.

We’ll start with Ax = λx by the conjugate transpose of x, xᴴ, and we arrive at this equation:

symmetric matrix equation 1.2
Equation 1.2. | Image: Xichu Zhang

It’s important to note that λ is a scalar, which means the multiplication involving λ is commutative. Therefore, we can move it to the left of xᴴ.

Equation 1.3 for a symmetric matrix
Equation 1.3. | Image: Xichu Zhang

xᴴx is a Euclidean norm (or two-norm), which is defined as follows:

equation 1.4 symmetric matrix
Equation 1.4: A Euclidean norm. | Image: Xichu Zhang

In a two-dimensional Euclidean space, it’s the length of a vector with coordinates (x₁, …, xₙ). We can then write equation1.3 as:

equation 1.5 for a symmetric matrix
Equation 1.5. | Image: Xichu Zhang

Since the conjugate transpose (operator H) works just like an ordinary transpose (operator T), we can use the property that xᴴA = (Ax)ᴴ.

equation 1.6 symmetric matrix
Equation 1.6. | Image: Xichu Zhang

What does (Ax)ᴴ equal to? Here we will use the relation Ax = λx again, but this time (Ax)ᴴ will leave us the complex conjugate of λ, which we will denote it as λ with a bar.

Equation 1.7 for symmetric matrix
Equation 1.7. | Image: Xichu Zhang

We have seen xᴴx before in equation 1.3. After substituting in the two-norm, which we can do because both λ_bar and xᴴx are scalars, we get this new equation:

equation 1.8 symmetric matrix
Equation 1.8. | Image: Xichu Zhang

This leads to λ, and its complex conjugate being equal.

equation 1.9 symmetric matrix
Equation 1.9 | Image: Xichu Zhang

Only under one circumstance is equation 1.9 valid, which is when λ is real. And that finishes the proof.

More on Data ScienceRecursive SQL Expression Visually Explained

 

Property 2: Eigenvectors Corresponding to the Eigenvalues Are Orthogonal.

This is also a direct formal proof, but it’s simple. First, we need to recognize our target, which is the following:

equation 1.1 symmetric matrix
Equation 1.10. | Image: Xichu Zhang

Consider a symmetric matrix A, where x₁ and x₂ are eigenvectors of A corresponding to different eigenvectors. Why we need this condition will be explained a bit later). Based on the definition of eigenvalues and symmetric matrices, we can get the following equations:

Equation 1.11 and 1.12 for symmetric matrix
Equation 1.11 (top); Equation 1.12 (bottom). | Image: Xichu Zhang

Now we need to approach equation 1.10. Let’s try to put x₁ and x₂ together. To do so, we multiply (Ax₁)ᵀ by x₁ᵀ on the left side:

equation 1.13 symmetric matrix
Equation 1.13. | Image: Xichu Zhang

In equation 1.13 apart from the property of symmetric matrix, two other facts are used: 

  1. The matrix multiplication is associative (vectors are n by 1 matrix).
  2. Matrix-scalar multiplication is commutative meaning we can move the scalar freely. 

 Since dot production is commutative, this means x₁ᵀx₂ and x₂ᵀx₁ are the same things. We now have:

Equation 1.14 symmetric matrix
Equation 1.14. | Image: Xichu Zhang

In the above equation, x₁∙x₂ denotes the dot product. If λ₁ ≠ λ₂, it must be the case that x₁∙x₂ = 0, which means those two eigenvectors are orthogonal. If λ₁ = λ₂, there are two different eigenvectors corresponding to the same eigenvalue. This can happen in an identity matrix. Since the eigenvectors are in the null space of (A-λI), which is denoted as N(A-λI), when one eigenvector corresponds to multiple eigenvectors, N(A-λI) has a dimension larger than one. In this case, we have infinite choices for those eigenvectors, and we can always choose them to be orthogonal.

There are cases where a real matrix has complex eigenvalues. This happens in rotation matrices. Why is that? Let Q be a rotation matrix. We know that the eigenvectors don’t change direction after being applied to Q. But if Q is a rotation matrix, how can x not change direction if x is a non-zero vector? The conclusion is that the eigenvector must be complex.

A rotation matrix R(θ) in the two-dimensional space is shown as follows:

rotation matrix
Rotation matrix. | Image: Xichu Zhang

R(θ) rotates a vector counterclockwise by an angle θ. It is a real matrix with complex eigenvalues and eigenvectors.

 

Property 3: Symmetric Matrices Are Always Diagonalizable. 

This is known as the spectral theorem. It is also related to the other two properties of symmetric matrices. The name of this theorem might be confusing. In fact, the set of all the eigenvalues of a matrix is called a spectrum. Also, we can think about it like this: the eigenvalue-eigenvector pairs tell us in which direction a vector is distorted after the given linear transformation.

This is shown in the following figure. After transformation, in the direction of v₁, the figure is stretched a lot. But in the direction of v₂, it’s not stretched very much.

figure 1.15 a stretched vector compared to a non-stretched vector
Figure 1.15 comparing an un-stretched vector (left) to a stretched vector (right). | Image: Xichu Zhang

A matrix that is diagonalizable means there exists a diagonal matrix D (all the entries outside of the diagonal are zeros), such that P⁻¹AP = D, where P is an invertible matrix. We can also say that a matrix is diagonalizable if the matrix can be written in the form A = PDP⁻¹.

The decomposition is unique up to the permutation of the entries on the diagonal in D and the scalar multiplication of the eigenvectors in P. We also need to note that the diagonalization is equivalent to finding eigenvectors and eigenvalues, no matter if the matrix is symmetric or not. However, for a non-symmetric matrix, D doesn’t have to be an orthogonal matrix,

Those two definitions are equivalent, but they can be interpreted differently. This decomposition makes raising the matrix to power very handy. The second one, A = PDP⁻¹, tells us how A can be decomposed. Meanwhile, the first one, P⁻¹AP = D, is the one that tells us A can be diagonalized. It tells us that it is possible to align the standard basis given by the identity matrix with the eigenvectors. The orthogonality of the eigenvectors, which is shown in property two, enables this. 

The phrase “align the standard basis with the eigenvectors” sounds abstract. We need to think about this: What does matrix transformation do to the basis? A matrix consisting of basis α = {v₁, …, vₙ}, with those vectors in the column, transforms a vector x from the standard basis to the coordinate system formed by basis α.We denote this matrix by . Therefore, in the process of the diagonalization P⁻¹AP = D, P sends a vector from the standard basis to the eigenvectors, A scales it, and then P⁻¹ sends the vector back to the standard basis. From the perspective of the vector, the coordinate system is aligned with the standard basis with the eigenvectors.

figure 1.16 alignment of the basis for a symmetric matrix
Figure 1.16 alignment of the basis. | Image: Xichu Zhang

This alignment is shown in the figure above. The matrix used in this example is:

matrix equation for figure 1.16
Equation 1.17 for figure 1.16. | Image: Xichu Zhang

Where V is the matrix with eigenvectors of length one as the columns, each of them corresponds to the eigenvalues in the diagonal matrix. As for the calculation, we can let eig in MATLAB do the work.

This property follows the spectral theorem directly, which dictates that if A is Hermitian, there exists an orthonormal basis of V consisting of eigenvectors of A. Each eigenvector is real.

The theorem directly points out a way to diagonalize a symmetric matrix. To prove the property directly, we can use induction on the size or dimension of the matrix. The basic idea of the proof is that the base case, where A is a one-by-one matrix, is trivial. Assume that the n-1-by-n-1 matrix is diagonalizable, meaning it has n-1 independent eigenvectors). We can find another eigenvector in n-dimensional space, which is orthogonal to those n-1 dimensional eigenvectors. Thus, the n-by-n matrix is also diagonalizable.

 

How to Identify the Definiteness of a Symmetric Matrix

When are those properties useful? Even before the formal study of matrices, they have been used for solving systems of linear equations for a long time. Think about the matrices as operators. The information of the linear equations is stored in those operators,  meaning matrices can be used to study the behavior of functions.

Beyond symmetry, an even better property matrix can have is positive-definiteness. If a symmetric (or Hermitian) matrix is positive-definite, all of its eigenvalues are positive. If all of its eigenvalues are non-negative, then it is a semi-definite matrix. For a matrix to be positive-definite, it’s required to be symmetrical because of property one. Since it only makes sense to ask whether a number is positive or negative, how large it is or when it is real.

Video explaining the basics of a symmetric matrix. | Video: Don't Memorise

 

How to Apply a Symmetric Matrix

A good application of this is the Hessian matrix. We will use this as an example to demonstrate using matrices for analyzing function behavior. When we try to find a local extreme, it’s always good to find out that the Hessian is positive definite. The Hessian is a matrix consisting of the second partial derivatives of a real function. Formally, let f: ℝⁿ ➝ℝ be a function, the Hessian is defined as

Definition 2.1 Hessian compact representation
Defintion 2.1, a Hessian matrix (compact representation). | Image: Xichu Zhang

And we call H(x) the Hessian of f, which is an n-by-n matrix. In definition 1.18, the Hessian is written very compactly. It is the same thing as

Definition 2.2 Hessian Matrix
Definition 2.2 Hessian matrix. | Image: Xichu Zhang

How does this affect function behavior? We will look at one super simple example. Consider the function f(x, y) = y²-x². The Hessian is computed as follows:

Equation 2.4 symmetric matrix
Equation 2.3 for a Hessian matrix. | Image: Xichu Zhang

It can also be computed using the function hessian in MATLAB. Since it’s a diagonal matrix and the trace (sum of the entries on the diagonal) equals the sum of eigenvectors, we can immediately see that one of the eigenvalues is two and another one is negative two. They correspond to the eigenvectors v₁ = [1, 0]ᵀ and v₂ = [0, 1]ᵀ. This matrix is symmetric but not positive definite. Therefore, there’s no local extreme in the whole ℝ². We can only find a saddle point on point x=0, y=0. This means in the direction of v₁, where the eigenvalue is positive, the function increases. But in the direction of v₂, where the eigenvalue is negative, the function decreases. The image of the function is shown below:

Figure 2.4 figure of a function
Figure 2.4 image of a function generated with MATLAB. | Image: Xichu Zhang

The image is generated using MATLAB with the following code.

clc; clear; close all

xx = -5: 0.2: 5;
yy = -5: 0.2: 5;

[x, y] = meshgrid(xx, yy);

zz = x.^2 - y.^2;

figure
mesh(x, y, zz)
view(-10, 10)

Now, we flip the sign and change the function into f(x, y) = x² + y². The eigenvectors remain the same, but all the eigenvectors become positive. This means, in both the direction of v₁ and the direction of v₂, the function grows. Therefore, the local minimum can be found at x=0, y=0, f(x,y) = 0. This is also the global minimum. The graph looks like this:

Symmetric matrix figure 2.5
Figure 2.5. | Image: Xichu Zhang

More on Data ScienceUnderstanding and Building Neural Network (NN) Models

 

Symmetric Matrix Advantages

Matrices have wide-usage in numerous fields. When dealing with matrices, the concepts of positive-definiteness, eigenvectors, eigenvalues and symmetric matrices are often encountered. 

Expert Contributors

Built In’s expert contributor network publishes thoughtful, solutions-oriented stories written by innovative tech professionals. It is the tech industry’s definitive destination for sharing compelling, first-person accounts of problem-solving on the road to innovation.

Learn More

Great Companies Need Great People. That's Where We Come In.

Recruit With Us