Data analysis is the part of data science concerned with inspecting, cleaning, transforming and modeling data to draw useful insights from it.

With its many facets, methodologies and techniques, data analysis is used in a variety of fields, including business, science and social science. As businesses adopt more technology, data analysis plays a large role in decision-making: it supports better, faster and more effective decisions that minimize risk and reduce human bias.

That said, there are different kinds of analysis tailored to different goals. We’ll examine each one below.

## Two Camps of Data Analysis

Data analysis can be divided into two camps, according to the book *R for Data Science*:

- **Hypothesis Generation:** This involves looking deeply at the data and combining your domain knowledge to generate hypotheses about why the data behaves the way it does.
- **Hypothesis Confirmation:** This involves using a precise mathematical model to generate falsifiable predictions with statistical sophistication to confirm your prior hypotheses.

## Types of Data Analysis

Data analysis can be organized into six types, arranged in increasing order of complexity.

- Descriptive analysis
- Exploratory analysis
- Inferential analysis
- Predictive analysis
- Causal analysis
- Mechanistic analysis

### 1. Descriptive Analysis

The goal of descriptive analysis is to describe or summarize a set of data. Here’s what you need to know:

- Descriptive analysis is the very first analysis performed.
- It generates simple summaries about samples and measurements.
- It involves common, descriptive statistics like measures of central tendency, variability, frequency, and position.
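
As a minimal sketch of the statistics listed above, the following uses Python’s standard `statistics` module to summarize a small dataset. The `describe` helper and the `daily_cases` numbers are hypothetical and made up for illustration; they are not taken from any real source.

```python
import statistics

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    # Quartiles describe position within the data.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency
        "stdev": statistics.stdev(values),    # variability (sample standard deviation)
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical daily case counts, made up for illustration.
daily_cases = [12, 15, 11, 20, 18, 25, 22]

summary = describe(daily_cases)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here explains or predicts anything; the output is only a compact description of the data at hand.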

#### Descriptive Analysis Example

Take the COVID-19 statistics page on Google, for example. The line graph is a pure summary of cases and deaths: it presents and describes how much of a particular country’s population has been infected by the virus.

Descriptive analysis is the first step in analysis where you summarize and describe the data you have using descriptive statistics, and the result is a simple presentation of your data.

### 2. Exploratory Analysis (EDA)

Exploratory analysis involves examining or exploring data and finding relationships between variables that were previously unknown. Here’s what you need to know:

- EDA helps you discover relationships between measures in your data, but those relationships are not evidence of causation, as captured by the phrase “Correlation doesn’t imply causation.”
- It’s useful for discovering new connections and forming hypotheses. It drives design planning and data collection.
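
One common way to look for such relationships is to compute pairwise correlations. The sketch below implements the Pearson correlation coefficient by hand so it runs with the standard library only. The `data` values are hypothetical and exist only to show the mechanics.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical yearly values, made up for illustration only.
data = {
    "factories": [10, 12, 15, 19, 24, 30],
    "flights": [5, 7, 8, 12, 15, 21],
    "temperature_anomaly": [0.1, 0.15, 0.2, 0.35, 0.5, 0.7],
}

# Print the correlation for every pair of variables.
names = list(data)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: r = {pearson_r(data[a], data[b]):.2f}")
```

A high `r` only flags a relationship worth a closer look. It says nothing about which variable, if either, drives the other.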

#### Exploratory Analysis Example

Climate change is an increasingly important topic as global temperatures gradually rise. One example of an exploratory data analysis on climate change involves taking the rise in temperature from 1950 to 2020 alongside the increase in human activity and industrialization to find relationships in the data. For example, you may look at the growth in the number of factories, cars on the road and airplane flights to see how it correlates with the rise in temperature.

Exploratory analysis explores data to find relationships between measures without identifying the cause. It’s most useful when formulating hypotheses.

### 3. Inferential Analysis

Inferential analysis involves using a small sample of data to infer information about a larger population of data.

Statistical modeling itself is about using a small amount of information to extrapolate and generalize to a larger group. Here’s what you need to know:

- Inferential analysis estimates a quantity from data that is representative of a population and attaches a measure of uncertainty, such as a standard error, to the estimate.
- The accuracy of inference depends heavily on your sampling scheme. If the sample isn’t representative of the population, the generalization will be inaccurate. This problem is known as sampling bias.
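
To make the idea of an estimate with a measure of uncertainty concrete, here is a rough sketch that draws a random sample from a synthetic population and computes a normal-approximation confidence interval for the mean. The population (nightly sleep hours), its parameters and the `mean_confidence_interval` helper are all assumptions made for illustration.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

random.seed(0)

# Synthetic population of nightly sleep hours (made-up parameters).
population = [random.gauss(7.0, 1.2) for _ in range(100_000)]

# A simple random sample, so every member is equally likely to be chosen.
sample = random.sample(population, 500)

low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"95% interval for the population mean: ({low:.2f}, {high:.2f})")
print(f"actual population mean: {statistics.mean(population):.2f}")
```

The interval is only trustworthy because the sample was drawn at random. If the 500 values came from, say, only the shortest sleepers, the same arithmetic would produce a confident but wrong answer.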

#### Inferential Analysis Example

The idea of drawing an inference about the population at large from a smaller sample is intuitive. Many statistics you see in the media and on the internet are inferential: a claim about a whole population based on a small sample. For example, a psychological study on the benefits of sleep might involve a total of 500 people. When researchers followed up with the participants, those who slept seven to nine hours reported better overall attention spans and well-being, while those who slept less or more than that range suffered from reduced attention spans and energy. The 500 people in this study are just a tiny portion of the more than 7 billion people in the world, so the result is an inference about the larger population.

Inferential analysis extrapolates and generalizes from a smaller sample to the larger group to generate analysis and predictions.

### 4. Predictive Analysis

Predictive analysis involves using historical or current data to find patterns and make predictions about the future. Here’s what you need to know:

- The accuracy of the predictions depends on the input variables.
- Accuracy also depends on the types of models. A linear model might work well in some cases, and in other cases it might not.
- Using a variable to predict another one doesn’t denote a causal relationship.
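
As a small illustration of these points, the sketch below fits a straight line to earlier observations and checks its forecast against later observations that were held out. The data, the `fit_line` helper and the choice of a linear model are assumptions for the example; a real forecast would likely need more inputs and a more careful model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical history: period index and the value observed in that period.
periods = list(range(1, 11))
values = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2, 21.0]

# Fit on the earlier periods and hold out the latest ones to check the forecast.
train_x, test_x = periods[:7], periods[7:]
train_y, test_y = values[:7], values[7:]

intercept, slope = fit_line(train_x, train_y)
forecast = [intercept + slope * x for x in test_x]

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
print(f"held-out mean absolute error: {mean_absolute_error(test_y, forecast):.2f}")
```

The held-out error measures how well the model predicts, not why the values rise. The period index predicts the value here without causing it.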

#### Predictive Analysis Example

The 2020 U.S. election was a popular topic, and many models were built to predict the winning candidate. FiveThirtyEight did this to forecast the 2016 and 2020 elections. Predictive analysis for an election requires input variables such as historical polling data, trends and current polling data in order to return a good prediction. Something as large as an election wouldn’t rely on just a simple linear model, but on a complex model tuned to best serve its purpose.

Predictive analysis takes data from the past and present to make predictions about the future.

### 5. Causal Analysis

Causal analysis looks at cause-and-effect relationships between variables and is focused on finding the cause of a correlation. Here’s what you need to know:

- To find the cause, you have to question whether the observed correlations driving your conclusion are valid. Just looking at the surface data won’t help you discover the hidden mechanisms underlying the correlations.
- Causal analysis is applied in randomized studies focused on identifying causation.
- Causal analysis is the gold standard in data analysis and scientific studies where the cause of a phenomenon is to be extracted and singled out, like separating wheat from chaff.
- Good data is hard to find and requires expensive research and studies. These studies are analyzed in aggregate (multiple groups), and the observed relationships are just average effects (mean) of the whole population. This means the results might not apply to everyone.
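
The sketch below simulates a randomized study to show how random assignment lets a simple difference in group averages estimate an average effect. Everything in it is an assumption made for illustration: the score scale, the sample size, the built-in `true_effect` and the choice of a permutation test to judge whether the difference could be due to chance.

```python
import random

def mean(values):
    return sum(values) / len(values)

def difference_in_means(treated, control):
    """Estimated average effect when groups were assigned at random."""
    return mean(treated) - mean(control)

def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Share of random relabelings whose absolute difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(treated, control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(difference_in_means(pooled[:n_treated], pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Simulated trial (all numbers are made up): 400 participants, half of them
# randomly assigned to a treatment that adds 3 points to a test score.
rng = random.Random(42)
n_participants = 400
true_effect = 3.0

ids = list(range(n_participants))
rng.shuffle(ids)  # random assignment
treated_ids = set(ids[: n_participants // 2])

treated_scores, control_scores = [], []
for i in range(n_participants):
    baseline = rng.gauss(50, 5)  # score without the treatment
    if i in treated_ids:
        treated_scores.append(baseline + true_effect)
    else:
        control_scores.append(baseline)

estimate = difference_in_means(treated_scores, control_scores)
p_value = permutation_p_value(treated_scores, control_scores)
print(f"estimated average effect: {estimate:.2f} (simulated true effect: {true_effect})")
print(f"permutation p-value: {p_value:.3f}")
```

The estimate is an average over the whole group. Individual participants in the simulation all receive the same effect, but in a real study the effect can differ from person to person.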

#### Causal Analysis Example

Say you want to test whether a new drug improves human strength and focus. To do that, you run a randomized controlled trial: you compare the participants receiving your new drug against participants receiving a placebo, using tests of strength and of overall focus and attention. This allows you to observe how the drug affects the outcome.

Causal analysis is about finding out the causal relationship between variables, and examining how a change in one variable affects another.

### 6. Mechanistic Analysis

Mechanistic analysis is used to understand the exact changes in variables that lead to changes in other variables. Here’s what you need to know:

- It’s applied in the physical or engineering sciences, in situations that require high precision and leave little room for error, where the only noise in the data is measurement error.
- It’s designed to understand a biological or behavioral process, the pathophysiology of a disease or the mechanism of action of an intervention.
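
A toy illustration, not drawn from the article: when the governing equation is known, analysis reduces to estimating its parameters from measurements whose only noise is measurement error. The sketch assumes the free-fall relation `d = 0.5 * g * t**2` and simulated distance readings with a small, made-up measurement error, then recovers `g` by least squares.

```python
import random

def estimate_g(times, distances):
    """Least-squares estimate of g in d = 0.5 * g * t**2 (no intercept term)."""
    xs = [0.5 * t ** 2 for t in times]
    return sum(x * d for x, d in zip(xs, distances)) / sum(x * x for x in xs)

rng = random.Random(1)
true_g = 9.81  # m/s^2, the value used to simulate the measurements

# Drop times in seconds: 0.2, 0.4, ..., 2.0
times = [0.2 * k for k in range(1, 11)]

# Simulated readings: exact physics plus a small measurement error (1 cm std dev).
distances = [0.5 * true_g * t ** 2 + rng.gauss(0, 0.01) for t in times]

g_hat = estimate_g(times, distances)
print(f"estimated g: {g_hat:.3f} m/s^2")
```

Because the equation itself is treated as exact, the estimate tightens as the measurements become more precise, which is the sense in which this kind of analysis demands high precision.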

#### Mechanistic Analysis Example

Many graduate-level research projects and complex topics are suitable examples, but to put it in simple terms, let’s say an experiment is done to simulate safe and effective nuclear fusion to power the world. A mechanistic analysis of the study would entail precisely controlling and manipulating variables, with highly accurate measurements of both the variables and the desired outcomes. It’s this intricate and meticulous approach to big topics that allows for scientific breakthroughs and the advancement of society.

Mechanistic analysis is in some ways a predictive analysis, but modified to tackle studies that require high precision and meticulous methodologies for physical or engineering science.

## When to Use the Different Types of Data Analysis

- **Descriptive analysis** summarizes the data at hand and presents your data in a comprehensible way.
- **Exploratory data analysis** helps you discover correlations and relationships between variables in your data.
- **Inferential analysis** is for generalizing to the larger population from a smaller sample of data.
- **Predictive analysis** helps you make predictions about the future with data.
- **Causal analysis** focuses on finding the cause of a correlation between variables.
- **Mechanistic analysis** is for measuring the exact changes in variables that lead to changes in other variables.

A few important tips to remember include:

- Correlation doesn’t imply causation.
- EDA helps you discover new connections and form hypotheses.
- The accuracy of inference depends on the sampling scheme.
- A good prediction depends on the right input variables.
- A simple linear model with enough data usually does the trick.
- Using a variable to predict another doesn’t denote causal relationships.
- Good data is hard to find, and producing it requires expensive research.
- Study results are analyzed in aggregate; they are average effects and might not apply to everyone.