For a person without a background in statistics, it can be difficult to understand the difference between fundamental statistical tests, not to mention when to use them. Below we cover the differences between the most common tests, how to frame null hypotheses for these tests and the conditions under which you should use each particular test.
When to Use a T-Test vs. Chi-Square Test
- When to use a t-test: To determine if a population mean is different from a known value (one-sample t-test), to compare the means of two groups (independent two-sample t-test) or to compare the means of the same group at different times (paired t-test).
- When to use a chi-square test: To compare categorical variables, such as determining whether sample data matches population data (chi-square goodness of fit test) or if two categorical variables are related (chi-square test of independence).
Defining Statistical Terms
Before we learn about the tests, let’s dive into some key terms.
Null Hypothesis and Hypothesis Testing
The null hypothesis proposes that no significant difference exists between a set of given observations.
In other words:
- Null: Two sample means are equal.
- Alternate: Two sample means are not equal.
To reject a null hypothesis, we calculate the test statistic and compare it with the critical value. If the test statistic is greater than the critical value, we can reject the null hypothesis.
Critical Value
A critical value is a point (or points) on the scale of the test statistic beyond which we reject the null hypothesis. It is derived from the level of significance (α) of the test.
The critical value relates to the probability of two sample means belonging to the same distribution: the higher the critical value, the lower the probability that the two samples belong to the same distribution.
A commonly used critical value for a two-tailed test is 1.96, which is based on the fact that 95 percent of the area of a normal distribution lies within 1.96 standard deviations of the mean.
Critical values can be used to do hypothesis testing in the following ways:
- Calculate the test statistic.
- Calculate the critical value based on the significance level (alpha).
- Compare the test statistic with the critical value.
If the test statistic is lower than the critical value, we fail to reject the null hypothesis; otherwise we reject it.
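For a normally distributed test statistic, these steps can be sketched in a few lines of Python using scipy.stats. This is a minimal illustration, and the alpha level and z statistic below are assumed values rather than results from a real dataset.

```python
# Minimal sketch of the critical-value approach using scipy.stats.
# The alpha level and test statistic below are illustrative assumptions.
from scipy.stats import norm

alpha = 0.05    # significance level (assumed)
z_stat = 2.1    # test statistic computed from a sample (illustrative)

# Two-tailed critical value: only alpha/2 of the standard normal
# distribution lies beyond this point on each side.
critical_value = norm.ppf(1 - alpha / 2)   # about 1.96 for alpha = 0.05

if abs(z_stat) > critical_value:
    print(f"|z| = {abs(z_stat):.2f} > {critical_value:.2f}: reject the null hypothesis")
else:
    print(f"|z| = {abs(z_stat):.2f} <= {critical_value:.2f}: fail to reject the null hypothesis")
```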
Note: Some statisticians use the p-value instead of the critical value to test the null hypothesis.
P-Value
The p-value is the probability of obtaining a result at least as extreme as the observed statistic (the area to the right of the z, t or chi-square statistic). The benefit of using the p-value is that it is itself a probability, so we can test at any desired level of significance by comparing it directly with the significance level.
For example, assume the z-value for a particular experiment comes out to be 1.67, which is greater than the critical value at five percent (1.64). To check a different significance level of one percent, we would have to calculate a new critical value.
However, if we calculate the p-value for 1.67 and it comes out to 0.047, we can use this p-value to reject the null hypothesis at the five percent significance level since 0.047 < 0.05. With a more stringent significance level of one percent, however, we fail to reject the null hypothesis since 0.047 > 0.01. Importantly, no second calculation is required.
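The same comparison can be reproduced with scipy.stats. The sketch below reuses the one-tailed example above (z = 1.67) and simply compares the resulting p-value with two significance levels.

```python
# Minimal sketch of the p-value approach for the example above (z = 1.67).
from scipy.stats import norm

z_value = 1.67
p_value = norm.sf(z_value)   # area to the right of z (one-tailed), about 0.047

for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "fail to reject"
    print(f"alpha = {alpha}: p = {p_value:.3f} -> {decision} the null hypothesis")
```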
Population vs. Sample
In statistics, population refers to the total set of observations we could make. For example, if we want to calculate the average human height, the population would be every person currently on Earth.
A sample, on the other hand, is a set of data collected or selected via a defined procedure. For our example above, a sample could be a small group of people selected randomly from different regions of the globe.
To draw inferences from a sample and validate a hypothesis, the sample must be random.
For instance, if we select people randomly from all regions on Earth, we can assume our sample mean is close to the population mean, whereas if we make a selection just from the United States, then our average height estimate/sample mean cannot be considered close to the population mean. Instead, it will only represent the data of a particular region (the United States). That means our sample is biased and is not representative of the population.
Distribution
Another important statistical concept to understand is distribution. When the population is extremely large, it's not feasible to validate a hypothesis by calculating the mean value or test parameters on the entire population. In such cases, we assume the population follows some type of distribution.
While there are many forms of distribution, the most common include the normal, binomial and Poisson distributions.
You must determine the distribution type to calculate the critical value and decide on the best test to validate any hypothesis.
Now that we’re clear on population, sample and distribution, let’s learn about different kinds of tests and the distribution types for which they are used.
Types of Statistical Tests
1. T-Test
We use a t-test to compare the means of two given samples. A t-test assumes the samples are normally distributed. We use a t-test when we don't know the population parameters (mean and standard deviation).
There are multiple variations of the t-test:
Types of T-Tests
- One sample t-test: Tests the mean of a single group against a known value.
- Independent two-sample t-test: Compares the means of two groups.
- Paired sample t-test: Compares means from the same group at different times.
The statistic for this hypothesis test is called the t-statistic, which we calculate as:
t = (x1 − x2) / √(s1²/n1 + s2²/n2)
where
x1 = mean of sample 1
x2 = mean of sample 2
s1 = standard deviation of sample 1
s2 = standard deviation of sample 2
n1 = size of sample 1
n2 = size of sample 2
Note: This article focuses on normally distributed data. You can also use z-tests and t-tests for data that is not normally distributed if the sample size is greater than 20; however, there are other, preferable methods in such situations.
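As a rough sketch of how this looks in practice, scipy.stats provides ttest_ind for the independent two-sample case. The two groups below are made-up measurements used purely for illustration.

```python
# Minimal sketch of an independent two-sample t-test with scipy.stats.
# The two samples below are invented, illustrative measurements.
from scipy.stats import ttest_ind

group_a = [20.1, 22.3, 19.8, 21.5, 20.9, 23.0]
group_b = [18.4, 19.9, 20.2, 18.8, 19.5, 20.1]

# equal_var=False uses Welch's t-test, which doesn't assume equal variances.
t_stat, p_value = ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the group means differ.")
else:
    print("Fail to reject the null hypothesis.")
```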
2. Chi-Square Test
We use the chi-square test to compare categorical variables.
Types of Chi-Square Tests
- Chi-square goodness of fit test: Determines if a sample matches the population.
- Chi-square test of independence: Compares two categorical variables in a contingency table to check whether they are related.
A small chi-square value means the observed data fits the expected values well.
A large chi-square value means the observed data doesn't fit the expected values.
The hypothesis we’re testing is:
- Null: Variable A and Variable B are independent.
- Alternate: Variable A and Variable B are not independent.
The statistic used to measure significance in this case is called the chi-square statistic. The formula we use to calculate it is:
Χ² = Σ [ (Or,c − Er,c)² / Er,c ]
where
Or,c = observed frequency count at level r of Variable A and level c of Variable B
Er,c = expected frequency count at level r of Variable A and level c of Variable B
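As an illustration, scipy.stats offers chi2_contingency for the test of independence. The contingency table below is a made-up example rather than real data.

```python
# Minimal sketch of a chi-square test of independence with scipy.stats.
# The contingency table (rows = levels of Variable A, columns = levels of
# Variable B) holds invented counts for illustration.
from scipy.stats import chi2_contingency

observed = [
    [30, 10],   # level 1 of Variable A
    [20, 40],   # level 2 of Variable A
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")

if p_value < 0.05:
    print("Reject the null hypothesis: the variables appear to be related.")
else:
    print("Fail to reject the null hypothesis: no evidence of a relationship.")
```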
3. Z-Test
In a z-test, we assume the sample is normally distributed, as in a t-test. The z-score is calculated with population parameters such as the population mean and population standard deviation. We use this test to validate the hypothesis that the sample belongs to the same population.
- Null: Sample mean is same as the population mean.
- Alternate: Sample mean is not same as the population mean.
The statistic used for this hypothesis test is called the z-statistic, which we calculate as:
z = (x − μ) / (σ / √n)
where
x = sample mean
μ = population mean
σ = population standard deviation
n = sample size
If the test statistic is lower than the critical value, we fail to reject the null hypothesis; otherwise we reject it.
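The z formula above translates directly into a few lines of Python. The population parameters and sample values below are assumptions chosen only to illustrate the calculation.

```python
# Minimal sketch of a one-sample z-test, assuming the population mean and
# standard deviation are known. All numbers are illustrative assumptions.
from math import sqrt
from scipy.stats import norm

sample_mean = 52.0   # x: mean of the sample (illustrative)
pop_mean = 50.0      # mu: known population mean (assumed)
pop_std = 5.0        # sigma: known population standard deviation (assumed)
n = 40               # sample size

z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
p_value = 2 * norm.sf(abs(z))   # two-tailed p-value

print(f"z = {z:.3f}, p = {p_value:.3f}")
if abs(z) > norm.ppf(0.975):    # critical value of about 1.96 at alpha = 0.05
    print("Reject the null hypothesis: the sample mean differs from the population mean.")
else:
    print("Fail to reject the null hypothesis.")
```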
4. ANOVA
We use analysis of variance (ANOVA) to compare three or more samples with a single test.
Types of ANOVA Tests
- One-way ANOVA: Used to compare the difference between three or more samples/groups of a single independent variable.
- MANOVA: Allows us to test the effect of one or more independent variables on two or more dependent variables. In addition, MANOVA can also detect the difference in correlation between dependent variables given the groups of independent variables.
The hypothesis we’re testing with ANOVA is:
- Null: All pairs of samples are the same (i.e. all sample means are equal).
- Alternate: At least one pair of samples is significantly different.
The statistic used to measure significance in this case is the F-statistic. We calculate the F-value using the formula:
F = ((SSE1 − SSE2) / m) / (SSE2 / (n − k))
where
SSE = residual sum of squares
m = number of restrictions
n = number of observations
k = number of independent variables
There are multiple tools available, such as SPSS, R packages and Excel, to carry out ANOVA on a given sample.
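For example, a quick one-way ANOVA can be sketched with scipy.stats.f_oneway. The three groups below are invented samples used only to show the call.

```python
# Minimal sketch of a one-way ANOVA with scipy.stats.
# The three groups below are invented, illustrative samples.
from scipy.stats import f_oneway

group_1 = [85, 90, 88, 92, 87]
group_2 = [78, 82, 80, 79, 81]
group_3 = [90, 93, 91, 94, 92]

f_stat, p_value = f_oneway(group_1, group_2, group_3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Reject the null hypothesis: at least one group mean differs.")
else:
    print("Fail to reject the null hypothesis: no evidence the means differ.")
```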
How to Choose the Right Statistical Test
In all of the tests introduced in this article, we compare a statistic with a critical value to decide whether to reject a hypothesis. However, the statistic and the way we calculate it differ depending on the type of variable, the number of samples we're analyzing and whether or not we know the population parameters. These factors determine which statistical test and null hypothesis are suitable, and this principle is instrumental to understanding these basic statistical concepts.
Frequently Asked Questions
What is the difference between t-test and chi-square test?
A t-test compares the means of two given samples and is best for continuous, numerical data. A chi-square test checks whether two categorical variables are related.
When would you use a chi-square test?
Chi-square tests are used to compare two categorical variables. They determine whether observed results align with expected results and whether the null hypothesis that the two variables are independent holds.
A chi-square test may be applied to determine whether a sample matches the population or to compare two categorical variables in a contingency table to check whether they are related.
When would you use a t-test?
T-tests are used to compare the means of two different samples or groups, where the samples are assumed to have a normal distribution. This test is most suitable for continuous data and when the mean and standard deviation of a population are unknown.
A t-test may be applied to compare means for two groups, compare the means from the same group at different times or to test the mean of a single group against a known mean.
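For the same-group-at-different-times case, a paired t-test can be sketched with scipy.stats.ttest_rel. The before/after values below are invented for illustration.

```python
# Minimal sketch of a paired t-test (same group measured twice) with scipy.stats.
# The before/after measurements are invented, illustrative values.
from scipy.stats import ttest_rel

before = [70, 68, 75, 80, 72, 78]
after = [68, 65, 73, 77, 70, 75]

t_stat, p_value = ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```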
What type of data is suitable for a chi-square test?
A chi-square test is used to compare discrete, categorical data. Both the chi-square goodness of fit test and chi-square test of independence require categorical variables to be applied.
What are ANOVA tests used for?
An analysis of variance (ANOVA) test is used to compare the differences in results between three or more unrelated groups. A one-way ANOVA test compares the difference in the means of three or more samples/groups. A multivariate analysis of variance (MANOVA) test is used to test the effect of one or more independent variables on two or more dependent variables.
What is the z-test used for?
A z-test is used to test a hypothesis when population variance is known, such as comparing the means of two populations or comparing a sample mean and population mean. Z-tests must use normally distributed and independent data, and they are best performed with a sample size greater than 30 data points.