What is the t-distribution

The t-distribution is the probability distribution followed by the sample mean \bar{X} when it is standardized using the sample standard deviation instead of the population standard deviation.

If the population mean is \mu and the population standard deviation is \sigma, then by the central limit theorem the sample mean is approximately normally distributed with mean \mu and variance \frac{\sigma^2}{n} when n is large (and exactly normally distributed if the population itself is normal). The z-value, which is the standardized value of the sample mean, is as follows:

z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}

Hypothesis testing on the mean is based on the standard normal distribution, which is the distribution that z follows. However, the population standard deviation \sigma in the above equation is usually unknown, so we substitute the sample standard deviation s for \sigma and consider the distribution of the following statistic t:

t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}}
s^2 = \frac{1}{n-1} \sum^n_{i=1} (X_i - \bar{X})^2

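The quantity s^2 is called the unbiased variance. When the population is normally distributed, the statistic t above follows the t-distribution with n - 1 degrees of freedom. In other words, the t-distribution takes the place of the standard normal distribution when \sigma is replaced by s.

The probability density function of the t-distribution with n degrees of freedom (from here on, n denotes the degrees of freedom rather than the sample size) is expressed by the following equation: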
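As a minimal sketch of the two formulas above, the following computes s^2 and t by hand with NumPy. The sample values and the hypothesized mean `mu0` are made up purely for illustration.

```python
import numpy as np

# Hypothetical sample and hypothesized population mean (illustrative values only)
sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 4.7, 5.5, 5.3])
mu0 = 5.0

n = sample.size
x_bar = sample.mean()

# Unbiased variance: divide the sum of squared deviations by n - 1
s2 = np.sum((sample - x_bar) ** 2) / (n - 1)
s = np.sqrt(s2)

# t = (x_bar - mu) / (s / sqrt(n))
t_value = (x_bar - mu0) / (s / np.sqrt(n))

print(f"x_bar = {x_bar:.4f}, s^2 = {s2:.4f}, t = {t_value:.4f}")
```

Note that `sample.var(ddof=1)` gives the same s^2; the explicit sum is used here only to mirror the formula.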

f(x) = \frac{\Gamma(\frac{n + 1}{2})}{\sqrt{n \pi} \Gamma(\frac{n}{2})}(1 + \frac{x^2}{n})^{-(\frac{n + 1}{2})}
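The density above can be checked numerically. The sketch below implements the formula directly (the helper name `t_pdf` is ours, not part of any library) and compares it with `scipy.stats.t.pdf`. It works with the logarithm of the gamma function, which is an implementation choice to avoid overflow for large n.

```python
import numpy as np
from scipy import special, stats

def t_pdf(x, n):
    """Density of the t-distribution with n degrees of freedom, from the formula."""
    x = np.asarray(x, dtype=float)
    # log of Gamma((n+1)/2) / (sqrt(n*pi) * Gamma(n/2))
    log_coef = (special.gammaln((n + 1) / 2)
                - special.gammaln(n / 2)
                - 0.5 * np.log(n * np.pi))
    return np.exp(log_coef) * (1 + x**2 / n) ** (-(n + 1) / 2)

x = np.linspace(-4, 4, 9)

# Largest absolute difference from SciPy's implementation over a few df values
max_diff = max(
    float(np.max(np.abs(t_pdf(x, n) - stats.t.pdf(x, n))))
    for n in (1, 5, 30)
)
print(f"max difference from scipy.stats.t.pdf: {max_diff:.2e}")
```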

The shape of the t-distribution is determined solely by the degrees of freedom n. The graph below shows t-distributions with 1 to 10 degrees of freedom.

t distribution

As the degrees of freedom n increase, the distribution approaches the standard normal distribution.

t and standard normal distribution
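One way to quantify this convergence is to measure the largest gap between the two density curves for increasing degrees of freedom. The grid and the list of degrees of freedom below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

x = np.linspace(-4, 4, 801)
z = stats.norm.pdf(x)  # standard normal density on the grid

# Largest pointwise gap between the t density and the normal density
max_gap = {}
for df in [1, 5, 10, 30, 100]:
    max_gap[df] = float(np.max(np.abs(stats.t.pdf(x, df) - z)))
    print(f"df={df:>3}: max |t pdf - normal pdf| = {max_gap[df]:.5f}")
```

The gap shrinks steadily as df grows, which matches what the graph shows.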

Thus, in hypothesis testing where the population variance is unknown, the t-distribution is used in place of the standard normal distribution. The difference matters most for small n; when n is sufficiently large, the two distributions give nearly identical results. This test is called the t-test.
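A one-sample, two-sided t-test can be sketched as follows. The data, the null mean `mu0`, and the significance level `alpha` are assumptions made for illustration. For a sample of size n from a normal population, the test statistic is compared against the t-distribution with n - 1 degrees of freedom. `scipy.stats.ttest_1samp` is used as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical data and test settings (illustrative values only)
sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 4.7, 5.5, 5.3])
mu0 = 5.0
alpha = 0.05

n = sample.size
df = n - 1
t_value = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided p-value: twice the upper-tail probability beyond |t|
p_value = 2 * stats.t.sf(abs(t_value), df)

# Two-sided critical value: upper alpha/2 point of the t-distribution
t_crit = stats.t.isf(alpha / 2, df)
reject = bool(abs(t_value) > t_crit)

# Cross-check against SciPy's built-in one-sample t-test
result = stats.ttest_1samp(sample, popmean=mu0)

print(f"t = {t_value:.4f}, p = {p_value:.4f}, critical value = {t_crit:.4f}")
print(f"reject H0 at alpha={alpha}: {reject}")
```

Comparing |t| with the critical value and comparing the p-value with `alpha` are equivalent ways of reaching the same decision.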

Expected value and variance of the t-distribution

The expected value and variance of the t-distribution are respectively:

E(X) = 0 \quad (n > 1)
V(X) = \left\{ \begin{array}{ll} \infty & (1 < n \leq 2) \\ \frac{n}{n-2} & (n > 2) \end{array} \right.
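The variance formula can be written as a small helper and compared with SciPy for n > 2. The function name `t_variance` is hypothetical, and raising an error for n <= 1 is our own choice for the case the formula does not cover.

```python
import numpy as np
from scipy import stats

def t_variance(n):
    """Variance of the t-distribution with n degrees of freedom (n > 1)."""
    if n <= 1:
        raise ValueError("the formula covers only n > 1")
    # Infinite for 1 < n <= 2, n / (n - 2) for n > 2
    return np.inf if n <= 2 else n / (n - 2)

# Compare with SciPy where the variance is finite
checks = {n: (t_variance(n), float(stats.t.var(n))) for n in [3, 5, 30]}
for n, (by_formula, by_scipy) in checks.items():
    print(f"n={n:>2}: formula = {by_formula:.4f}, scipy = {by_scipy:.4f}")

print("n=2:", t_variance(2))
print("mean for n=5:", float(stats.t.mean(5)))
```

As n grows, n / (n - 2) approaches 1, the variance of the standard normal distribution.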

t-distribution table (upper side)

Since the t-distribution has only one parameter, the degrees of freedom n, its probabilities can be summarized in a table called the t-distribution table. The table below lists, for each number of degrees of freedom, the upper-side points of the t-distribution for which the upper-side probability \alpha equals 0.1, 0.05, 0.025, 0.01, and 0.005, respectively.

degrees of freedom n \alpha=0.1 \alpha=0.05 \alpha=0.025 \alpha=0.01 \alpha=0.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
80 1.292 1.664 1.990 2.374 2.639
120 1.289 1.658 1.980 2.358 2.617
180 1.286 1.653 1.973 2.347 2.603
240 1.285 1.651 1.970 2.342 2.596
\infty 1.282 1.645 1.960 2.326 2.576

For example, if you want to find the upper 5% point of a t-distribution with 20 degrees of freedom, look for the value at the intersection of n=20 and \alpha=0.05. Thus, the upper 5% point you are looking for is 1.725.
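The same lookup can be done in code instead of reading the table. The sketch below uses `scipy.stats.t.isf` (the inverse survival function), which returns the point whose upper-side probability equals \alpha. The variable names are ours.

```python
from scipy import stats

df = 20
alpha = 0.05

# Upper 5% point: the value t such that P(T > t) = alpha
upper_point = float(stats.t.isf(alpha, df))
print(round(upper_point, 3))  # 1.725

# A full table row for df = 20
alphas = [0.1, 0.05, 0.025, 0.01, 0.005]
row = [float(stats.t.isf(a, df)) for a in alphas]
print([f"{v:.3f}" for v in row])
```

`stats.t.ppf(1 - alpha, df)` gives the same value, since the upper-side probability is one minus the cumulative probability.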

Python Code

The following is the Python code used to draw the t-distribution.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

plt.style.use('ggplot')
fig, ax = plt.subplots(facecolor="w", figsize=(10, 5))

# Common grid on which every density is evaluated
x = np.linspace(-4, 4, 100)

# Plot the t-distribution for 1 to 10 degrees of freedom
for df in range(1, 11):
    t = stats.t.pdf(x, df)
    ax.plot(x, t, label=f"t dist(df={df})")
ax.legend()
plt.show()

t distribution

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

plt.style.use('ggplot')
fig, ax = plt.subplots(facecolor="w", figsize=(10, 5))

# Common grid and the standard normal density used for comparison
x = np.linspace(-4, 4, 100)
z = stats.norm.pdf(x, loc=0, scale=1)

# Plot a few t-distributions, then overlay the standard normal curve
for df in [1, 5, 10]:
    t = stats.t.pdf(x, df)
    ax.plot(x, t, label=f"t dist(df={df})")
ax.plot(x, z, label='Std norm dist', linewidth=4)
ax.legend()
plt.show()

t and standard normal distribution

Ryusei Kakujo
