The expectation-maximization (EM) algorithm is an iterative optimization algorithm commonly used in machine learning and statistics to estimate the parameters of probabilistic models, where some of the variables in the model are hidden or unobserved.
What Is an Expectation-Maximization Algorithm Used For?
The expectation-maximization algorithm is used in statistical applications across fields such as neuroscience, for example when inferring the hierarchical parameters of a Bayesian model. It is also useful for statistical modeling in computer vision, genetics, finance and a number of other fields.
The EM algorithm consists of two main steps: the expectation (E) step and the maximization (M) step.
What Is the Expectation-Maximization Algorithm?
The expectation-maximization (EM) algorithm is a widely used optimization algorithm in machine learning and statistics. Its goal is to maximize the expected complete-data log-likelihood, which helps estimate the parameters of probabilistic models where some variables are hidden or unobserved. This approach improves model accuracy by leveraging both observed data and inferred latent structure, even when direct likelihood maximization is intractable.
The EM algorithm is frequently used in clustering, where the assignment of data points to clusters is uncertain or ambiguous. The Gaussian mixture model (GMM) is one of the more popular clustering algorithms that uses the EM algorithm to estimate cluster parameters and assignments.
The EM algorithm is also useful for missing data estimation: the observed data is used to estimate the probability distribution of the missing data, and the algorithm iteratively refines its estimates of the missing values. It is likewise useful for fitting latent variable models such as hidden Markov models and hierarchical Bayesian models.
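As a concrete illustration of the clustering use case, here is a minimal sketch using scikit-learn’s GaussianMixture, which fits a GMM with the EM algorithm under the hood. The data here is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data drawn from two overlapping clusters (illustrative only)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 100),
                       rng.normal(3.0, 1.5, 100)]).reshape(-1, 1)

# Fit a two-component Gaussian mixture; scikit-learn runs EM internally
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

print(gmm.means_.ravel())           # estimated cluster means
print(gmm.predict_proba(data[:3]))  # soft (probabilistic) cluster assignments
```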
Expectation-Maximization Algorithm Steps
The expectation-maximization algorithm has two main steps: expectation and maximization. It can also include a third step for initialization.
We’ll go over all three steps in the context of a Gaussian mixture model. Specifically, we assume that each of our data points was generated by one of two Gaussian distributions.
1. Initialization
First is the initialization step. We randomly initialize the parameters of the Gaussian distributions: µ1, σ1, µ2, σ2.
Note: The choice of initialization (e.g. random values, k-means++ seeding) can affect which solution the algorithm converges to.
In our example, we have nine data points (z) in one-dimensional space whose cluster labels are “missing”: we believe each point was generated by one of two distributions (red or green), but we don’t know which. The probability density functions of the two Gaussians will let us reason about which distribution each point came from.
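A minimal sketch of the initialization step in NumPy. The nine point values, the random seed and the choice to seed the means at two random data points are illustrative assumptions, not prescribed by the algorithm:

```python
import numpy as np

rng = np.random.default_rng(42)

# Nine unlabeled 1-D points (illustrative values)
z = np.array([-2.1, -1.7, -0.9, -0.3, 0.4, 1.2, 2.0, 2.6, 3.1])

# Randomly initialize the parameters of the two Gaussians
mu1, mu2 = rng.choice(z, size=2, replace=False)  # start the means at two random points
sigma1 = sigma2 = z.std()                        # start both spreads at the overall spread
```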
2. Expectation
Next is the expectation step. Here, we compute the posterior probability that each point belongs to each distribution. This is straightforward using the probability density function:
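$$
N(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$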
In our case, we plug one of our points, z1, in for x and use each distribution’s parameters (µ1, σ1 or µ2, σ2) to get the relative likelihoods that z1 came from each of the two distributions. Then, we compute the posterior responsibilities (or soft assignments) using Bayes’ Theorem for the class probability.
For example, our updated probability that z1 belongs to the first distribution is:
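$$
P(\text{red} \mid z_1) = \frac{N(z_1 \mid \mu_1, \sigma_1)\,P(\text{red})}{N(z_1 \mid \mu_1, \sigma_1)\,P(\text{red}) + N(z_1 \mid \mu_2, \sigma_2)\,P(\text{green})}
$$

Here, P(red) and P(green) are the prior probabilities of the two clusters; if we assume the clusters are equally likely, both are simply 0.5.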
We repeat this process for each data point, resulting in a probability distribution over the two clusters for each point — these are the responsibilities used in the next step.
After this step, each point has a probability of belonging to the first (red) distribution, with the complementary probability assigned to the second (green) distribution.
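A sketch of the expectation step in NumPy/SciPy, continuing the variables (z, mu1, sigma1, mu2, sigma2) from the initialization sketch above and assuming equal priors for the two clusters:

```python
from scipy.stats import norm

# Likelihood of each point under each Gaussian
p1 = norm.pdf(z, mu1, sigma1)  # density under the first (red) distribution
p2 = norm.pdf(z, mu2, sigma2)  # density under the second (green) distribution

# Posterior responsibilities via Bayes' theorem with equal priors
r1 = p1 / (p1 + p2)  # probability each point belongs to the red distribution
r2 = 1.0 - r1        # probability each point belongs to the green distribution
```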
3. Maximization
Next is the maximization step. Here we update the parameters of the model to maximize the expected log-likelihood, given the responsibilities we just computed for our data points.
For the new µ1, we compute the weighted mean of the data points, where the weights are the posterior probabilities of each point belonging to the given Gaussian distribution.
So our new µ1 is:
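$$
\mu_1 = \frac{r_1 z_1 + r_2 z_2 + \cdots + r_9 z_9}{r_1 + r_2 + \cdots + r_9}
$$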
Here, r1 is the probability that z1 belongs to the first (red) distribution; remember, this was computed in Step 2. The new σ1 is found in a similar way: we take the weighted average of the squared distances of the points from our new mean, which gives the variance, and σ1 is its square root.
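Written out, the corresponding update is:

$$
\sigma_1^2 = \frac{r_1 (z_1 - \mu_1)^2 + r_2 (z_2 - \mu_1)^2 + \cdots + r_9 (z_9 - \mu_1)^2}{r_1 + r_2 + \cdots + r_9}
$$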
We do the same thing for µ2 and σ2, using the responsibilities for the second (green) distribution. The updated distributions now sit closer to the points they are most responsible for.
Now, all we do is repeat the expectation and maximization steps until convergence.
Convergence refers to the point at which the algorithm stops iterating because further updates no longer meaningfully improve the model. In other words, the algorithm has reached a stable solution: the estimated parameters or the log-likelihood of the data change very little between iterations. In practice, convergence is determined by monitoring how much the parameters or the log-likelihood improve from one iteration to the next and stopping once that improvement falls below a small threshold.
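Putting the pieces together, here is a compact, self-contained sketch of the whole procedure for our two-Gaussian example in NumPy/SciPy. The data values, the random seed, the equal-prior assumption and the 1e-6 log-likelihood tolerance are illustrative choices, not part of the algorithm itself:

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D data points standing in for the nine unlabeled points z
z = np.array([-2.1, -1.7, -0.9, -0.3, 0.4, 1.2, 2.0, 2.6, 3.1])

# Initialization: seed the means at two random points, spreads at the overall spread
rng = np.random.default_rng(0)
mu1, mu2 = rng.choice(z, size=2, replace=False)
sigma1 = sigma2 = z.std()

prev_ll = -np.inf
for _ in range(200):
    # E-step: responsibilities under equal priors
    p1 = norm.pdf(z, mu1, sigma1)
    p2 = norm.pdf(z, mu2, sigma2)
    r1 = p1 / (p1 + p2)
    r2 = 1.0 - r1

    # M-step: weighted means and standard deviations
    mu1 = np.sum(r1 * z) / np.sum(r1)
    mu2 = np.sum(r2 * z) / np.sum(r2)
    sigma1 = np.sqrt(np.sum(r1 * (z - mu1) ** 2) / np.sum(r1))
    sigma2 = np.sqrt(np.sum(r2 * (z - mu2) ** 2) / np.sum(r2))

    # Convergence check: observed-data log-likelihood under the pre-update parameters
    ll = np.sum(np.log(0.5 * p1 + 0.5 * p2))
    if abs(ll - prev_ll) < 1e-6:
        break
    prev_ll = ll

print(f"mu1={mu1:.2f}, sigma1={sigma1:.2f}, mu2={mu2:.2f}, sigma2={sigma2:.2f}")
```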
And that’s the EM algorithm!
Limitations of the Expectation-Maximization Algorithm
Although the EM algorithm is a powerful statistical tool, it has some limitations.
Doesn’t Always Converge to Global Maximum Likelihood Estimates
One of the main limitations is that it can get stuck in local optima. This means that the algorithm may not always converge to the global maximum likelihood estimates of the parameters.
Assumes Data Is From a Specific Statistical Model
Another limitation is that the algorithm assumes the data is generated from a specific statistical model, which may not always be the case. For example, our data may not actually come from two Gaussian distributions, but the EM algorithm relies on that assumption regardless.
Slow for Large or Complex Data Sets and Models
Finally, the algorithm can be slow for larger data sets and more complicated models.
Frequently Asked Questions
What is the expectation-maximization (EM) algorithm?
The expectation-maximization (EM) algorithm is an iterative method used in machine learning and statistics to estimate the parameters of probabilistic models with hidden or unobserved variables.
What are the main steps of the expectation-maximization algorithm?
The expectation-maximization (EM) algorithm consists of two main steps:
- Expectation (E): Computes the probabilities of the hidden variables, given the observed data and the current parameter estimates
- Maximization (M): Updates the model parameters to maximize the expected log-likelihood from the E step
How is the expectation-maximization algorithm used in clustering?
The expectation-maximization algorithm is commonly used in clustering algorithms like Gaussian mixture models, where it estimates the probability that each data point belongs to a given cluster.
