Hidden Markov Model Explained

Q: What’s the difference between a hidden Markov model vs. a Markov model?

A hidden Markov model is a probabilistic model used when the results are the product of one of several hidden internal states. A Markov model is a probabilistic model used to predict a sequence of events when the internal states can be observed.

Hidden Markov models are probabilistic frameworks where the observed data are modeled as a series of outputs generated by one of several (hidden) internal states. Both Markov and hidden Markov models are engineered to handle data that can be represented as a sequence of observations over time.

Hidden Markov Model Definition

A hidden Markov model is a probabilistic framework used to predict the results of an event based on a series of observations with one or several hidden internal states.

While a Markov model or Markov chain concerns stochastic (random) process states that are visible to the observer, a hidden Markov model pertains to stochastic processes where states can be hidden or not directly visible to the observer.

What Is a Hidden Markov Model (HMM)?

A hidden Markov model (HMM) is utilized when we can’t observe the states of a stochastic process themselves, but only the result of some probability function (observation) of the states. HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

Hidden Markov models can be used to identify underlying patterns or structures in sequential data. This makes it applicable for research and tasks in machine learning (including natural language processing and speech recognition), bioinformatics and gene analysis as well as time-series forecasting.

For example, in speech recognition tasks, a hidden Markov model algorithm may be implemented to measure the probability of a certain word or lack of words occurring in a given audio recording. In this case, the occurrence of specific words or silence in the recording can represent states, and volume of speech throughout the recording can represent observations. By knowing observations (volume), this information can be used by the algorithm to determine the likelihood of hidden states — or words and lack of words in this example — and predict the most probable word being spoken.

Mathematically, here is how Markov models and hidden Markov models differ:

Markov model: Series of (hidden) states z={z_1,z_2………….} drawn from state alphabet S ={s_1,s_2,…….𝑠_|𝑆|} where z_i belongs to S.
Hidden Markov model: Series of observed output x = {x_1,x_2,………} drawn from an output alphabet V= {𝑣1, 𝑣2, . . , 𝑣_|𝑣|} where x_i belongs to V.

Hidden Markov Model Assumptions

A hidden Markov model is built on several assumptions, including:

1. Output Independence Assumption

Output observation is conditionally independent of all other hidden states and all other observations when given the current hidden state.

2. Emission Probability Matrix

Probability of hidden state generating output v_i given that state at the corresponding time was s_j.

More on Machine LearningHistogram of Oriented Gradients: An Overview

Markov Model Assumptions

Markov models are developed based on two assumptions:

1. Limited Horizon Assumption

Probability of being in a state at a time t depend only on the state at the time (t-1).

Limited horizon assumption equation — Equation 1: Limited horizon assumption. | Image: Vivek Vinushanth Christopher

That means state at time t represents enough summary of the past to reasonably predict the future. This assumption is an order-1 Markov process. An order-k Markov process assumes conditional independence of state z_t from the states that are k + 1-time steps before it.

2. Stationary Process Assumption

Conditional (probability) distribution over the next state, given the current state, doesn’t change over time.

Stationary process assumption equation — Equation 2: Stationary process assumption. | Image: Vivek Vinushanth Christopher

That means states keep on changing over time but the underlying process is stationary.

Notation Convention

There is an initial state and an initial observation z_0 = s_0
s_0: Initial probability distribution over states at time 0.
Initial state probability: (π)
At t=1, probability of seeing first real state z_1 is p(z_1/z_0).
Since z0 = s0:

State Transition Matrix

𝐀𝐢,𝐣: probability of transitioning from state i to state j at any time t.

The following chart is a state transition matrix of four states, including the initial state:

Fig. 2: State transition matrix chart. | Image: Vivek Vinushanth Christopher

2 Questions Answered in a Markov Model

What is the probability of particular sequences of state z?
How do we estimate the parameter of state transition matrix A to maximize the likelihood of the observed sequence?

Probability of Particular Sequences in a Markov Model

Eq.4. Finding probability of particular sequence — Eq.4: Finding probability of particular sequence. | Image: Vivek Vinushanth Christopher

Consider the state transition matrix above. Let’s determine the probability of sequence:

{z1 = s_hot , z2 = s_cold , z3 = s_rain , z4 = s_rain , z5 = s_cold}

P(z) = P(s_hot|s_0 ) P(s_cold|s_hot) P(s_rain|s_cold) P(s_rain|s_rain) P(s_cold|s_rain)

= 0.33 x 0.1 x 0.2 x 0.7 x 0.2 = 0.000924

Hidden Markov Model Example

Consider the example given below, which elaborates how a person feels in different climates.

Fig.3. Markov Model as Finite State Machine — Image by Author — Fig.3: Markov model as finite state machine. | Image: Vivek Vinushanth Christopher

Set of states: (S) = {Happy, Grumpy}
Set of hidden states: (Q) = {Sunny , Rainy}
State series over time: = z∈ S_T
Observed states for four day: = {z1=Happy, z2= Grumpy, z3=Grumpy, z4=Happy}

The feeling that you understand from a person emoting is called the observations, since you observe them. The weather that influences the feeling of a person is called the hidden state, since you can’t observe it.

Emission Probabilities

In the above example, feelings (“Happy” or “Grumpy”) can be only observed. A person can observe that a person has an 80 percent chance to be “happy” given that the climate at the particular point of observation is sunny. Similarly there’s a 60 percent chance of a person being “grumpy” given that the climate is rainy. The 80 percent and 60 percent are emission probabilities since they deal with observations.

Transition Probabilities

When we consider the climates (hidden states) that influence the observations, there are correlations between consecutive days being sunny or alternate days being rainy. There is an 80 percent chance for the Sunny climate to be in successive days, whereas there’s a 60 percent chance for it to be rainy on consecutive days. The probabilities that explain the transition to/from hidden states are transition probabilities.

How Does a Hidden Markov Model Work?

A hidden Markov model answers three primary questions:

What is the probability of an observed sequence?
What is the most likely series of states to generate an observed sequence?
How can we learn the values for the HMMs parameters A and B given some data?

Probability of Observed Sequence

We have to add up the likelihood of the data x given every possible series of hidden states. This will lead to a complexity of O(|S|)^T. Hence, two alternate procedures were introduced to find the probability of an observed sequence.

Forward Procedure

Calculate the total probability of all the observations (from t_1) up to time t:

𝛼_𝑖 (𝑡) = 𝑃(𝑥_1 , 𝑥_2 , … , 𝑥_𝑡, 𝑧_𝑡 = 𝑠_𝑖; 𝐴, 𝐵)

Backward Procedure

Similarly calculate total probability of all the observations from final time (T) to t:

𝛽_i (t) = P(x_T , x_T-1 , …, x_t+1 , z_t= s_i ; A, B)

A tutorial on how the hidden markov model works. | Video: ritvikmath

More on Machine LearningEuclidean Distance Explained

Hidden Markov Model Using Forward Procedure

Below is an example of a hidden Markov model using forward procedure.

S = {hot,cold}
v = {v1=1 ice cream ,v2=2 ice cream, v3=3 ice cream}, where V is the Number of ice creams consumed in a day.
Example Sequence: = {x1=v2,x2=v3,x3=v1,x4=v2}

Fig.4: Given data as matrices. | Image: Vivek Vinushanth Christopher

Generated finite state machines for HMM. | Image: Vivek Vinushanth Christopher

We first need to calculate the prior probabilities, that is, the probability of being hot or cold previous to any actual observation. This can be obtained from S_0 or π. From Fig.4, S_0 is provided as 0.6 and 0.4, which are the prior probabilities. Then based on Markov and HMM assumptions, we follow the steps in the figures below to calculate the probability of a given sequence.

1. First Observed Output x1=v2

Step one of hidden markov model illustrated — Fig. 6: Step 1 of the HMM. | Image: Vivek Vinushanth Christopher

2. Observed Output x2=v3

Fig. 7: Step 2 of HMM illustrated. | Image: Vivek Vinushanth Christopher

3. Observed Output x3 and x4

Similarly for x3=v1 and x4=v2, we have to simply multiply the paths that lead to v1 and v2.

Fig. 8: Step 3 and 4 of HMM. | Image: Vivek Vinushanth Christopher

4. Maximum Likelihood Assignment

For a given observed sequence of outputs𝑥 𝜖 𝑉_𝑇, we intend to find the most likely series of states𝑧 𝜖 𝑆_𝑇. We can understand this with an example found below.

Fig.9. Data for example two — Fig.9: Data for example two. | Image: Vivek Vinushanth Christopher

Fig.10. Markov Model as a Finite State Machine from Fig.9. data — Fig.10: Markov model as a finite state machine from Fig.9. data. | Image: Vivek Vinushanth Christopher

The Viterbi algorithm is a dynamic programming algorithm similar to the forward procedure which is often used to find maximum likelihood. Instead of tracking the total probability of generating the observations, it tracks the maximum probability and the corresponding state sequence.

Consider the sequence of emotions: H,H,G,G,G,H for six consecutive days. Using the Viterbi algorithm we will find out the more likelihood of the series.

Fig.11. The Viterbi algorithm requires to choose the best path — Fig.11: The Viterbi algorithm requires to choose the best path. | Image: Vivek Vinushanth Christopher

There will be several paths that will lead to sunny on Saturday, and many paths that lead to rainy on Saturday. Here, we intend to identify the best path to sunny or rainy Saturday and multiply with the transition emission probability of “Happy,” since Saturday makes the person feel “Happy.”

Let’s consider a sunny Saturday. The previous day, Friday, can be sunny or rainy. Then we need to know the best path up to Friday, and then multiply with emission probabilities that lead to a grumpy feeling. Iteratively, we need to figure out the best path at each day ending up in more likelihood of the series of days.

Step 1 for final model — Fig.12: Step 1. | Image: Vivek Vinushanth Christopher

Fig. 13: Step 2. | Image: Vivek Vinushanth Christopher

Fig.14. Iterate the algorithm to choose the best path — Image by Author — Fig.14. Iterate the algorithm to choose the best path. | Image: Vivek Vinushanth Christopher

The algorithm leaves you with maximum likelihood values and we now can produce the sequence with a maximum likelihood for a given output sequence.

5. Learn the Values for the HMMs Parameters A and B

Learning in HMMs involves estimating the state transition probabilities A and the output emission probabilities B that make an observed sequence most likely. Expectation-Maximization algorithms are used for this purpose. A commonly-used algorithm known as the Baum-Welch algorithm falls under this category and uses the forward algorithm.

Frequently Asked Questions

What is a hidden Markov model?

A hidden Markov model is a statistical model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. It’s used when you can’t observe the states themselves but only the result of a probability function of the states

What’s the difference between a hidden Markov model vs. a Markov model?

A hidden Markov model is a probabilistic model used when the results are the product of one of several hidden internal states.
A Markov model is a probabilistic model used to predict a sequence of events when the internal states can be observed.

What are the basic problems of hidden Markov model?

The three basic problems of a hidden Markov model (HMM) include:

Evaluation or Likelihood Problem: Given an HMM λ = (A, B) and an observation sequence of O = O_1 O_2, … O_T, determine the likelihood P(O'λ).
Decoding Problem: Given an HMM λ = (A, B) and an observation sequence of O = O_1 O_2, … O_T, determine the optimal hidden state sequence of Q (set of finite states).
Learning Problem: Given an observation sequence of O = O_1 O_2, … O_T, learn and adjust the HMM parameters A and B to maximize the probability P(O'λ).

Understanding the Hidden Markov Model