What Is a Monte Carlo Simulation?

Monte Carlo simulations are a tool we use to predict the probability of various outcomes in a process that’s difficult to assess due to random variables. Here’s how to perform one yourself.

Written by Peter Grant
Published on Mar. 06, 2023
Image: Shutterstock / Built In

Monte Carlo simulations, also called multiple probability simulations, are a modeling technique commonly used in the financial and engineering industries to evaluate the impact of risk and uncertainty on a process. For example, we can use Monte Carlo simulations when estimating the risk associated with an investment. 

These simulations start by drawing many values for each uncertain input, so that the set of values represents the uncertainty in that input. The process then produces a corresponding set of outputs, and the spread of those outputs shows how the uncertainty in the inputs carries through to the results.

Monte Carlo Simulation Steps

  1. Generate an average input value.
  2. Calculate the standard deviation.
  3. Generate a random value using the average and the standard deviation.
  4. Perform the simulation using the randomized input value.
  5. Interpret the results.


 

How Does the Monte Carlo Simulation Work? 

Monte Carlo methods were first developed for physics research in the 1940s, and they’re still used in physics and engineering fields. Today we also commonly use them to evaluate the risk of financial investments or new business practices.

The fundamental problem Monte Carlo simulations address is that we cannot know all the inputs with certainty. For instance, the volatility in the stock market is an unknown input when estimating the rate of return from stock investments. 

Traditionally, models estimating the return of stocks assume a single value for the volatility and treat the output of the simulation as the correct answer. Because future volatility can’t be known in advance, it’s impossible to know the correct value to enter into the simulation, so the final result will likely be inaccurate.

Monte Carlo simulations address this problem. Instead of using one assumed value for each uncertain input, they run the model many times with randomly generated input values (within reason) to estimate the impact the uncertainty could have. The process generates many different outputs, enabling the user to identify the most likely result, the highest and lowest feasible results, and the likely range of results.
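
As a minimal sketch of that contrast, the code below compares a single-value estimate with a Monte Carlo estimate for a one-year investment. The starting amount, average return and standard deviation are hypothetical numbers chosen only for illustration, and the normal distribution is an assumption rather than a claim about real markets.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the sketch is repeatable

principal = 1000.0   # hypothetical starting investment
mean_return = 0.07   # hypothetical average annual return
std_return = 0.15    # hypothetical standard deviation of the annual return

# Traditional approach: one assumed input, one output.
single_estimate = principal * (1 + mean_return)

# Monte Carlo approach: many randomly drawn inputs, many outputs.
returns = rng.normal(mean_return, std_return, size=10_000)
outcomes = principal * (1 + returns)

low, typical, high = np.percentile(outcomes, [5, 50, 95])
print(f"Single estimate: {single_estimate:.2f}")
print(f"Typical result (median): {typical:.2f}")
print(f"5th to 95th percentile: {low:.2f} to {high:.2f}")
```

The single estimate gives one number with no indication of how wrong it might be, while the simulated outcomes also show a plausible range around the typical result.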


 

How to Perform Monte Carlo Simulations 

You can perform Monte Carlo simulations to evaluate the potential impact of uncertainty in one input using the following five steps. To evaluate the potential impact of uncertainty in several different inputs, perform the same process while applying steps one through three to each input.

  1. Use the available data to generate an average input value. In the case of stock market predictions, this might be a historical average return of the stock. In the case of a building’s energy consumption, it could be the historical average heating energy for a day with similar weather. 
  2. In addition to generating the average value, calculate the standard deviation for values in the entire data set. 
  3. Generate a random value from a distribution defined by the average and the standard deviation. The Python package NumPy provides several options, such as numpy.random.normal, and in Excel you can combine the RAND function with NORM.INV.
  4. Perform the simulation using the randomized input value. In the case of financial estimates this could mean calculating the return of a stock using different random numbers representing the volatility in the market. If simulating the energy consumption of water heaters, this could mean assuming different hot water use behavior of a building’s occupants.
  5. Interpret the results. After performing many simulations using random numbers generated from the average and standard deviation, the results will often form an approximate bell curve, although the exact shape depends on the model. The middle of the curve will be the most likely result. The spread of results shows the possible outcomes, with more extreme results being less likely than those closer to the middle.
Monte Carlo Simulation. | Video: MarbleScience
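
The five steps above can be sketched for a single uncertain input as follows. The function name monte_carlo, the sample history and the price per kWh are all hypothetical, and the sketch assumes the input is normally distributed and that the model accepts a NumPy array.

```python
import numpy as np

def monte_carlo(data, model, n_runs=10_000, seed=None):
    """Run `model` on inputs drawn from a normal distribution fitted to `data`."""
    rng = np.random.default_rng(seed)
    avg = np.mean(data)                          # step 1: average input value
    std = np.std(data, ddof=1)                   # step 2: sample standard deviation
    inputs = rng.normal(avg, std, size=n_runs)   # step 3: randomized inputs
    return model(inputs)                         # step 4: one output per input

# Hypothetical data: daily heating energy (kWh) on days with similar weather.
history = np.array([21.0, 19.5, 23.2, 20.1, 22.4, 18.9, 21.7])

# Hypothetical model: daily cost at an assumed price of 0.15 per kWh.
results = monte_carlo(history, lambda kwh: 0.15 * kwh, seed=1)

# Step 5: interpret the results.
print(f"mean = {results.mean():.3f}, standard deviation = {results.std():.3f}")
```

Passing the model in as a function keeps the sampling logic separate from the calculation being studied, so the same helper could be reused with a different model.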

 

Monte Carlo Simulation Example

Let’s say you’re trying to estimate the energy consumption of residential water heaters, which is determined by the home’s hot water consumption and the efficiency of the water heater. Different people use different amounts of water and different water heaters operate with different efficiency, so both of these inputs come with a high degree of uncertainty. We can use a Monte Carlo simulation to estimate both a typical value and the spread.

First we need a data set. In a real-world use case you’d need to find data representing actual products and behaviors but, for the sake of an example, we can create some assumed values using NumPy and pandas.

Let’s assume that a home’s hot water consumption ranges between 40 and 75 liters per day, and the efficiency of water heaters ranges between 64 and 84 percent. We can create a sample data set representing 1,000 homes with the following code:

import pandas as pd
import numpy as np

df = pd.DataFrame(index = range(0, 1000), columns = ['Draw Volume (L)', 'Efficiency (%)'])
df['Draw Volume (L)'] = np.random.uniform(low = 40, high = 75, size = (1000,))
df['Efficiency (%)'] = np.random.uniform(low = 64, high = 84, size = (1000,))

Note that this example has two uncertain values, so we must use the Monte Carlo technique on both hot water consumption and efficiency.

The first step is to calculate the typical values for both using the mean() function.

avg_vol = df['Draw Volume (L)'].mean()
avg_eff = df['Efficiency (%)'].mean()

The second step is to calculate the standard deviation in both values using the std() function.

std_vol = df['Draw Volume (L)'].std()
std_eff = df['Efficiency (%)'].std()

In this example we’ll evaluate the uncertainty by considering 10,000 samples, which means that we’ll perform steps three and four ten thousand times. To do that we’ll create a Pandas DataFrame with 10,000 rows, iterate through each row, calculate the randomized values to enter and then calculate the energy consumption of the water heater using the following code.

df = pd.DataFrame(index = range(0, 10000), columns = ['Energy Consumption (kWh)'], dtype = float)

density_water = 1000 #g/L
specificheat_water = 4.186 #J/g-C
temperature_in = 15 #deg C
temperature_out = 51.7 #deg C
J_in_kWh = 3600000

for ix in df.index:
    # Draw one random volume (L) and one random efficiency (%) for this simulation
    vol = np.random.normal(avg_vol, std_vol)
    eff = np.random.normal(avg_eff, std_eff)
    df.loc[ix, 'Energy Consumption (kWh)'] = vol * density_water * specificheat_water * (temperature_out - temperature_in) / (J_in_kWh * eff/100)

The first line creates the data frame with 10,000 rows and a single column, which will store the energy consumption of the water heater.

The following five lines store the density of water, the specific heat of water, assumptions about the temperature of water entering and leaving the water heater, and the number of joules in a kilowatt-hour, which is used to convert the result.

After providing the inputs, the for loop iterates through each row in the data frame to perform the 10,000 simulations. In each simulation it calculates the randomized volume of hot water consumed (vol), the randomized efficiency of the water heater (eff) and the energy consumption of the water heater given those randomized values. The energy consumption is then stored in the Energy Consumption (kWh) column for that row of the data frame.
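
As an alternative to iterating row by row, the same calculation can be sketched with NumPy arrays, which draws all 10,000 random inputs at once. This is not the code used above. The averages and standard deviations below are stand-in values, roughly what uniform data over 40 to 75 liters and 64 to 84 percent would give, and the name results is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seeded so the sketch is repeatable

# Stand-in statistics (assumed, not computed from the article's data set).
avg_vol, std_vol = 57.5, 10.1   # liters per day
avg_eff, std_eff = 74.0, 5.8    # percent

density_water = 1000        # g/L
specificheat_water = 4.186  # J/g-C
temperature_in = 15         # deg C
temperature_out = 51.7      # deg C
J_in_kWh = 3600000

n_runs = 10_000
vol = rng.normal(avg_vol, std_vol, size=n_runs)   # one volume per simulation
eff = rng.normal(avg_eff, std_eff, size=n_runs)   # one efficiency per simulation

# Same formula as the loop, applied element-wise to the whole arrays.
energy = (vol * density_water * specificheat_water
          * (temperature_out - temperature_in) / (J_in_kWh * eff / 100))

results = pd.DataFrame({'Energy Consumption (kWh)': energy})
print(results['Energy Consumption (kWh)'].describe())
```

The element-wise version produces one result per simulation just as the loop does, and it typically runs much faster because the arithmetic happens inside NumPy rather than in a Python loop.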

Once the simulations are complete, the simulation’s final step is to interpret the results. In this case we want to know the average, minimum, maximum and standard deviation in energy consumption. We can calculate all four results using the following four lines:

print(df['Energy Consumption (kWh)'].mean())
print(df['Energy Consumption (kWh)'].min())
print(df['Energy Consumption (kWh)'].max())
print(df['Energy Consumption (kWh)'].std())

Because the inputs are randomly generated, the exact values change from run to run. In this run the output is:

3.374547974667905
1.224967913710901
6.244467468834779
0.6402038850013316

This means that the average water heater will consume 3.37 kWh each day, with a standard deviation of 0.64 kWh. The lowest simulated result is 1.22 kWh, which is less than half the average, and the highest is 6.24 kWh, which is almost double the average.
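
The minimum and maximum are the most extreme of 10,000 random draws, so they shift noticeably from run to run. Percentiles are one way to describe the likely range more stably. The sketch below uses a hypothetical helper named summarize and, to stay self-contained, a stand-in array drawn to resemble the results above rather than the actual simulated column.

```python
import numpy as np

def summarize(values):
    """Return the median and the bounds of the central 90 percent of results."""
    p5, p50, p95 = np.percentile(values, [5, 50, 95])
    return {'p5': p5, 'median': p50, 'p95': p95}

# Stand-in for the simulated energy column (assumed normal, mean 3.37, std 0.64).
rng = np.random.default_rng(seed=7)
energy = rng.normal(3.37, 0.64, size=10_000)

summary = summarize(energy)
for name, value in summary.items():
    print(f"{name}: {value:.2f} kWh")
```

With the real results, the same helper could be called on the Energy Consumption (kWh) column to report the range that contains 90 percent of the simulated water heaters.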

Advantages of a Monte Carlo Simulation

Monte Carlo simulations help us overcome our own inability to approximate the impacts of uncertain inputs. Historically, modeling approaches would create a single deterministic output based on a single assumed input value and provide no estimate of how the results will change if the inputs are wrong. By performing the simulation with many different inputs, Monte Carlo simulations provide results for different realistic scenarios and give us a clearer picture of the possible results given our uncertain inputs.


 

Limitations of a Monte Carlo Simulation

Monte Carlo simulations have several limitations, and how much each one matters depends on the situation.

  • Monte Carlo simulations provide the ability to estimate the impact of error in inputs but can’t evaluate the impact of imperfections in the model itself. Incorrect model structure will still yield incorrect results, but the Monte Carlo simulation will not provide any insight into those impacts.
  • Generating a clear statistical picture of the possible results requires performing many simulations. Depending on the complexity of the simulation, this process may require either significant amounts of time or computing resources.
  • Monte Carlo simulations do not automatically overcome the inherent problem of uncertainty. They can provide statistical evaluations of how the results may change as a result of the uncertainty, but they cannot remove the uncertainty itself.
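
To illustrate the second limitation, the sketch below shows how the standard error of the simulated mean shrinks as the number of runs grows. Each simulated output is replaced by a stand-in draw from an assumed normal distribution, so the numbers are illustrative only, and the name errors is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # seeded so the sketch is repeatable

errors = {}
for n_runs in (100, 1_000, 10_000, 100_000):
    # Stand-in for n_runs simulation outputs (assumed mean 3.37, std 0.64).
    results = rng.normal(3.37, 0.64, size=n_runs)
    # Standard error of the mean: sample standard deviation / sqrt(number of runs).
    errors[n_runs] = results.std(ddof=1) / np.sqrt(n_runs)
    print(f"{n_runs:>7} runs: mean = {results.mean():.3f}, "
          f"standard error = {errors[n_runs]:.4f}")
```

Because the standard error falls with the square root of the number of runs, cutting it by a factor of 10 takes roughly 100 times as many simulations, which is why complex models can become expensive to evaluate this way.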