I want to discuss some fundamental terms and concepts related to probability and statistics that come up in almost any machine learning and AI literature.

What Is Probability?

Probability is the measure of the likelihood that an event will occur in a random experiment. It is quantified as a number between zero and one, where zero indicates impossibility and one indicates certainty.

 


13 Probability Statistics and Theory Terms to Know

  1. Random experiment
  2. Sample space
  3. Random variables
  4. Probability
  5. Conditional probability
  6. Independence
  7. Conditional independence
  8. Expectation
  9. Variance
  10. Probability distribution
  11. Joint probability distribution
  12. Conditional probability distribution
  13. Factor

 

Random Experiment

A random experiment is a physical situation whose outcome cannot be predicted until it is observed.

 

Sample Space

A sample space is a set of all possible outcomes of a random experiment.

Example of a sample space for the random experiment of flipping a coin. | Image: Parag Radke
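
To make this concrete, here is a minimal Python sketch (my own illustration, not from the original article) that enumerates the sample space for flipping a coin twice:

```python
from itertools import product

# Sample space for flipping a coin twice: every possible ordered outcome.
sample_space = list(product(["H", "T"], repeat=2))
print(sample_space)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
```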

 

Random Variables

A random variable is a variable whose possible values are numerical outcomes of a random experiment. There are two types of random variables:

  1. Discrete random variable: This is a variable that may take on only a countable number of distinct values, such as zero, one, two, three, four, etc. Discrete random variables are usually, but not necessarily, counts.
  2. Continuous random variable: This is a variable that takes an infinite number of possible values. Continuous random variables are usually measurements.
Example of random variables for a coin-flipping experiment. | Image: Parag Radke
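
As an illustrative sketch (the variable names are my own, not from the article), the same distinction in Python:

```python
import random

# Discrete random variable: the number of heads in two coin flips
# (it can only take the countable values 0, 1 or 2).
flips = [random.choice(["H", "T"]) for _ in range(2)]
num_heads = flips.count("H")

# Continuous random variable: a measurement that can take any value in [0, 1).
measurement = random.random()

print(num_heads, measurement)
```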


 

Probability

Probability is the measure of the likelihood that an event will occur in a random experiment. Probability is quantified as a number between zero and one, where, loosely speaking, zero indicates impossibility and one indicates certainty. The higher the probability of an event, the more likely it is that the event will occur.

 

Example

A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes, “heads” and “tails,” are both equally probable. Since no other outcomes are possible, the probability of either “heads” or “tails” is 0.5 or 50%.
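
A quick way to see this empirically is simulation; the sketch below estimates P(heads) for the fair coin by simulating many tosses:

```python
import random

# Estimate P(heads) for a fair coin by simulating many tosses.
trials = 100_000
heads = sum(random.choice(["H", "T"]) == "H" for _ in range(trials))
print(heads / trials)  # close to 0.5 for a fair coin
```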

 

Conditional Probability

Conditional probability is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion or evidence) already occurred. If the event of interest is A, and event B is known or assumed to have occurred, then “the conditional probability of A given B” is usually written as P(A|B).

Equation for calculating the conditional probability of rolling a die. | Image: Parag Radke
Conditional probability solution for rolling a die. | Image: Parag Radke
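
Here is a small sketch of the same idea (the specific events are my own example, not necessarily the ones in the images), computing P(A|B) = P(A and B) / P(B) for a single die roll:

```python
from fractions import Fraction

# P(A|B) = P(A and B) / P(B), enumerated over the outcomes of one die roll.
sample_space = set(range(1, 7))
A = {4}           # event A: the roll is a 4 (an assumed example event)
B = {2, 4, 6}     # event B: the roll is even

p_B = Fraction(len(B), len(sample_space))
p_A_and_B = Fraction(len(A & B), len(sample_space))
print(p_A_and_B / p_B)  # 1/3
```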

 

Independence

Two events are said to be independent of each other if the probability that one event occurs in no way affects the probability of the other event occurring. In other words, observing one event tells us nothing about the probability of the other. For independent events A and B, the following holds:

Equations that hold for two independent events. | Image: Parag Radke

 

Example

For example, let’s say you rolled a die and flipped a coin. The probability of getting any number face on the die in no way influences the probability of getting a head or a tail on the coin.
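
As a sketch (the die-and-coin events are assumed for illustration), we can verify P(A and B) = P(A) × P(B) by enumerating the joint sample space:

```python
from fractions import Fraction
from itertools import product

# Joint sample space of one die roll and one coin flip.
outcomes = list(product(range(1, 7), ["H", "T"]))

A = {(d, c) for d, c in outcomes if d == 6}    # event A: die shows 6
B = {(d, c) for d, c in outcomes if c == "H"}  # event B: coin shows heads

def p(event):
    return Fraction(len(event), len(outcomes))

print(p(A & B) == p(A) * p(B))  # True: the events are independent
```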

 

Conditional Independence

Events A and B are conditionally independent given a third event C precisely if the occurrence of A and the occurrence of B are independent events in their conditional probability distribution given C. In other words, A and B are conditionally independent given C if and only if, given knowledge that C has occurred, knowledge of whether A occurs provides no additional information about the likelihood of B occurring, and knowledge of whether B occurs provides no additional information about the likelihood of A occurring.

Conditional independence formula example. | Image: Parag Radke

 

Example

A box contains two coins: a regular coin and a fake, two-headed coin (P(H) = 1). I choose a coin at random and toss it twice.

Let: 

  • A = First coin toss results in heads (H).
  • B = Second coin toss results in heads (H).
  • C = Coin 1 (regular) has been selected.

If C is already observed, i.e., we already know whether the regular coin was selected, then events A and B become independent, because the outcome of one toss no longer tells us anything about the outcome of the other.
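
A simulation sketch of this example (with the probabilities and structure assumed as described above) shows that, once we condition on C, P(A and B | C) ≈ P(A|C) × P(B|C):

```python
import random

# Simulate: pick a coin at random (regular or two-headed), toss it twice.
trials = 200_000
results = []
for _ in range(trials):
    regular = random.random() < 0.5        # event C: the regular coin is selected
    p_heads = 0.5 if regular else 1.0
    a = random.random() < p_heads          # event A: first toss is heads
    b = random.random() < p_heads          # event B: second toss is heads
    results.append((regular, a, b))

# Condition on C and compare P(A and B | C) with P(A|C) * P(B|C).
given_c = [(a, b) for c, a, b in results if c]
p_a = sum(a for a, _ in given_c) / len(given_c)
p_b = sum(b for _, b in given_c) / len(given_c)
p_ab = sum(a and b for a, b in given_c) / len(given_c)
print(round(p_ab, 3), round(p_a * p_b, 3))  # approximately equal given C
```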

 

Expectation

The expectation of a random variable X is written as E(X). If we observe N random values of X, then the mean of the N values will be approximately equal to E(X) for large N. In more concrete terms, expectation is what you would expect the outcome of an experiment to be, on average, if you repeated the experiment a large number of times.

Expectation calculation for a random variable. The expectation is 3.5. If you think about it, 3.5 is halfway between the possible values the die can take, so this is what you should have expected. | Image: Parag Radke
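
The same calculation in a short Python sketch (assuming the fair six-sided die from the caption):

```python
from fractions import Fraction

# E(X) = sum of x * P(X = x) for a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
expectation = sum(x * p for x, p in pmf.items())
print(expectation)  # 7/2, i.e. 3.5
```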

 

Variance

The variance of a random variable X is a measure of how concentrated the distribution of X is around its mean. It's defined as the expected squared deviation from the mean, Var(X) = E[(X − E(X))²]:

A calculation of the variance in an experiment measuring die rolls. | Image: Parag Radke
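
Continuing the die example in Python (a sketch under the same fair-die assumption):

```python
from fractions import Fraction

# Var(X) = E[(X - E(X))^2] for a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())                    # 7/2
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
print(variance)  # 35/12, roughly 2.92
```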

 

Probability Distribution

A probability distribution is a mathematical function that maps each possible outcome of a random experiment to its associated probability. Its form depends on whether the random variable X is discrete or continuous.

 

1. Discrete Probability Distribution

The mathematical definition of a discrete probability function, p(x), is a function that satisfies the following properties. This is referred to as a probability mass function.

Probability distribution for a single coin flip. | Image: Parag Radke
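
As a sketch, a probability mass function for a single coin flip can be written as a simple table, and the two defining properties checked directly:

```python
# A probability mass function for a single fair coin flip.
pmf = {"H": 0.5, "T": 0.5}

# The defining properties: every value is non-negative and they sum to 1.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```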

 

2. Continuous Probability Distribution 

The mathematical definition of a continuous probability function, f(x), is a function that satisfies the following properties. This is referred to as a probability density function.

Example of a probability density function. | Image: Parag Radke
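
A minimal sketch of a density function (using the uniform distribution on [0, 1] as an assumed example) and a numerical check that it integrates to 1:

```python
# Probability density function of the uniform distribution on [0, 1]:
# f(x) = 1 for 0 <= x <= 1, and 0 elsewhere.
def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# Approximate the integral of f over [0, 1] with a Riemann sum.
n = 100_000
integral = sum(f(i / n) for i in range(n)) / n
print(integral)  # approximately 1.0
```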
Video explaining the basics of probability. | Video: Khan Academy


 

Joint Probability Distribution

If X and Y are two random variables, the probability distribution that defines their simultaneous behavior in a random experiment is called a joint probability distribution. The joint distribution function of X and Y is defined as:

In general, if there are n random variables and the i-th variable can take v_i different values, the joint table will have v_1 × v_2 × ⋯ × v_n rows. | Image: Parag Radke
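
As a sketch, the joint distribution of two independent fair coin flips can be built as a table with 2 × 2 = 4 rows:

```python
from itertools import product

# Joint probability distribution of two independent fair coin flips:
# one entry per combination of values, so 2 * 2 = 4 rows in total.
joint = {(x, y): 0.25 for x, y in product(["H", "T"], repeat=2)}
for assignment, prob in joint.items():
    print(assignment, prob)
```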

 

Conditional Probability Distribution (CPD)

If Z is a random variable that depends on other variables X and Y, then the distribution P(Z|X,Y) is called a conditional probability distribution (CPD) of Z with respect to X and Y. This means that for every possible combination of values of the random variables X and Y, we represent a probability distribution over Z.

 

Example

Consider a student with a property called “intelligence,” which can be either low (I_0) or high (I_1). They enroll in a course that has a property called “difficulty,” which can take the binary values easy (D_0) or difficult (D_1). The student gets a “grade” in the course based on their performance; the grade can take three values: G_1 (best), G_2 or G_3 (worst). The CPD P(G|I,D) is as follows:

Conditional probability distribution table for grades. | Image: Parag Radke
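
A CPD like this is naturally represented as a lookup table. The sketch below uses made-up placeholder probabilities (not the numbers from the article's table) just to show the structure: one distribution over G per (I, D) combination.

```python
# P(G | I, D) as a lookup table; the probability values are illustrative
# placeholders, not the article's actual numbers.
cpd = {
    ("I_0", "D_0"): {"G_1": 0.30, "G_2": 0.40, "G_3": 0.30},
    ("I_0", "D_1"): {"G_1": 0.05, "G_2": 0.25, "G_3": 0.70},
    ("I_1", "D_0"): {"G_1": 0.90, "G_2": 0.08, "G_3": 0.02},
    ("I_1", "D_1"): {"G_1": 0.50, "G_2": 0.30, "G_3": 0.20},
}

# Each row (each fixed I, D) must itself be a probability distribution.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in cpd.values())
```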

There are a number of operations that one can perform over any probability distribution to get interesting results. Some of the important operations include:

 

1. Conditioning/Reduction

If we have a probability distribution over n random variables X_1, X_2, …, X_n, and we observe that k of the variables have taken the values a_1, a_2, …, a_k (i.e., we already know their assignments), then the rows in the joint distribution that are not consistent with the observation can be removed, leaving us with fewer rows. This operation is known as reduction.

Reduction operation for a coin flip experiment. | Image: Parag Radke
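
In code, reduction is just filtering the table; here's a sketch using the two-coin joint distribution from earlier (the observation is assumed for illustration):

```python
# Reduction: keep only the rows of the joint distribution that are
# consistent with an observation (here, that the first flip was heads).
joint = {("H", "H"): 0.25, ("H", "T"): 0.25,
         ("T", "H"): 0.25, ("T", "T"): 0.25}
reduced = {xy: p for xy, p in joint.items() if xy[0] == "H"}
print(reduced)  # {('H', 'H'): 0.25, ('H', 'T'): 0.25}
```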

 

2. Marginalization

This operation takes a probability distribution over a large set of random variables and produces a probability distribution over a smaller subset of those variables. This operation is known as marginalizing a subset of the random variables. It's very useful when we have a large set of random variables as features and we're interested in how a smaller subset of them affects the output.

Marginalization for two biased coins. | Image: Parag Radke
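
A sketch of marginalization over the same two-coin table: summing out the second flip leaves the distribution of the first.

```python
from collections import defaultdict

# Marginalization: sum the joint distribution over the variable we want
# to eliminate (here, the second flip) to get P(first flip).
joint = {("H", "H"): 0.25, ("H", "T"): 0.25,
         ("T", "H"): 0.25, ("T", "T"): 0.25}
marginal = defaultdict(float)
for (x, y), p in joint.items():
    marginal[x] += p
print(dict(marginal))  # {'H': 0.5, 'T': 0.5}
```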

 

Factor

A factor is a function, or a table, that takes a number of random variables {X_1, X_2, …, X_n} as arguments and produces a real number as output. The set of input random variables is called the scope of the factor. For example, a joint probability distribution is a factor that takes all possible combinations of the random variables as input and produces a probability value, a real number, for each combination. Factors are the fundamental building blocks for representing distributions in high dimensions, and they support all the basic operations that can be performed on joint distributions, such as product, reduction and marginalization.

Factor equation. | Image: Parag Radke
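
As a sketch, a factor over two binary variables can be stored as a table; note that the values are arbitrary real numbers (assumed for illustration) and, unlike a probability distribution, need not sum to 1:

```python
# A factor over the scope {A, B}: a table mapping each assignment of the
# scope variables to a real number (not necessarily summing to 1).
phi = {("a0", "b0"): 0.5, ("a0", "b1"): 0.8,
       ("a1", "b0"): 0.1, ("a1", "b1"): 0.0}
print(phi[("a0", "b1")])  # 0.8
```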

 

Factor Product

Factor product table example. | Image: Parag Radke

We can take the product of two factors, and the result will also be a factor.
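
A sketch of the factor product (with factor values assumed for illustration): multiplying a factor over (A, B) by a factor over (B, C) gives a new factor over (A, B, C), where entries that agree on the shared variable B are multiplied.

```python
from itertools import product

# phi1 has scope (A, B); phi2 has scope (B, C).
phi1 = {("a0", "b0"): 0.5, ("a0", "b1"): 0.8,
        ("a1", "b0"): 0.1, ("a1", "b1"): 0.3}
phi2 = {("b0", "c0"): 0.4, ("b0", "c1"): 0.6,
        ("b1", "c0"): 0.9, ("b1", "c1"): 0.1}

# The factor product phi3 = phi1 * phi2 has scope (A, B, C).
phi3 = {}
for a, b, c in product(["a0", "a1"], ["b0", "b1"], ["c0", "c1"]):
    phi3[(a, b, c)] = phi1[(a, b)] * phi2[(b, c)]
print(phi3[("a0", "b1", "c1")])  # 0.8 * 0.1 ≈ 0.08
```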
