What is the Central Limit Theorem
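The central limit theorem states that the distribution of the sample mean of N independent observations drawn from a population with mean μ and finite variance σ² approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases.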
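The remarkable point of this theorem is that the distribution of the sample mean can be approximated by a normal distribution regardless of the shape of the population distribution. No matter what the original probability distribution is, if it has a finite variance and the sample size is large enough, the sample mean is approximately normally distributed.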
It is important to note that it is the distribution of the sample mean that approaches a normal distribution, not the distribution of the sample itself taken from the population. The distribution of the sample mean is the distribution of the mean values obtained when the process of drawing a sample from the population and computing its mean is repeated many times.
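To make this distinction concrete, here is a minimal sketch. It assumes a deliberately skewed population (an exponential distribution, chosen only for illustration) and compares the skewness of the raw draws with the skewness of the sample means. The helper `sample_skewness` and the values of `n` and `n_repeats` are hypothetical choices, not part of the original text.

```python
import numpy as np

def sample_skewness(x):
    """Return the standardized third moment of a 1-D array."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 50              # size of each sample (assumed value)
n_repeats = 20_000  # how many times a sample is drawn and averaged

# Each row is one sample of size n from an exponential population.
samples = rng.exponential(scale=1.0, size=(n_repeats, n))

raw_skew = sample_skewness(samples.ravel())         # the sample itself
mean_skew = sample_skewness(samples.mean(axis=1))   # the sample means

print(f"skewness of raw draws:    {raw_skew:.2f}")
print(f"skewness of sample means: {mean_skew:.2f}")
```

The raw draws stay strongly skewed however many are collected, because they follow the population distribution. Only the collection of sample means moves toward the symmetric shape of a normal distribution.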
Check the Central Limit Theorem in Python
Dice example
Let's experiment with the distribution of the total when a die is thrown N (1, 2, 5, 10, 50, 100) times. A fair die follows a discrete uniform distribution, since the probability of getting any number from 1 to 6 is equal to 1/6.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#%matplotlib inline
sns.set()
sns.set_context(rc = {'patch.linewidth': 0.2})
sns.set_style('dark')
numIterations = np.asarray([1,2,5,10,50,100]); #number of i.i.d RVs
experiment = 'dice' #valid values: 'dice', 'coins'
maxNumForExperiment = {'dice':6,'coins':2} #max numbers represented on dice or coins
nSamp=100000
k = maxNumForExperiment[experiment]
fig, fig_axes = plt.subplots(ncols=3, nrows=2, constrained_layout=True, figsize=(12,8))
for i, N in enumerate(numIterations):
    y = np.random.randint(low=1, high=k+1, size=(N, nSamp)).sum(axis=0)
    row = i // 3
    col = i % 3
    bins = np.arange(start=min(y), stop=max(y)+2, step=1)
    fig_axes[row, col].hist(y, bins=bins, density=True)
    fig_axes[row, col].set_title('N={} {}'.format(N, experiment))
plt.show()
As N increases (i.e., as the number of throws in each sample increases), the distribution of the total of the dice, which is simply N times the sample mean, approaches a normal distribution.
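One way to put a number on "approaches a normal distribution" is excess kurtosis, which is 0 for a normal distribution. The sketch below is an illustration under assumptions, not the original author's method. The helper `excess_kurtosis` is a hypothetical name, and the theoretical value -222/(175 N) follows from the excess kurtosis of a fair six-sided die (-222/175) and the independence of the throws.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 4).mean() / (centered ** 2).mean() ** 2 - 3.0

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_samp = 100_000
kurt = {}

for N in (1, 2, 5, 10, 50, 100):
    # Total of N dice, repeated n_samp times.
    totals = rng.integers(low=1, high=7, size=(N, n_samp)).sum(axis=0)
    kurt[N] = excess_kurtosis(totals)
    print(f"N={N:3d}  simulated: {kurt[N]:+.3f}  theory: {-222 / 175 / N:+.3f}")
```

The simulated values shrink toward 0 roughly in proportion to 1/N, which matches the visual impression from the histograms.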
Next, let us experiment with the distribution of the mean value when a die is thrown N (1, 2, 5, 10, 50, 100) times.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#%matplotlib inline
sns.set()
sns.set_context(rc = {'patch.linewidth': 0.2})
sns.set_style('dark')
numIterations = np.asarray([1,2,5,10,50,100]); #number of i.i.d RVs
experiment = 'dice' #valid values: 'dice', 'coins'
maxNumForExperiment = {'dice':6,'coins':2} #max numbers represented on dice or coins
nSamp=100000
k = maxNumForExperiment[experiment]
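# Create a fresh 2x3 grid of axes for this experiment.
fig, fig_axes = plt.subplots(ncols=3, nrows=2, constrained_layout=True, figsize=(12,8))
for i, N in enumerate(numIterations):
    # Dividing the total by N gives the sample mean of the N throws.
    y = np.random.randint(low=1, high=k+1, size=(N, nSamp)).sum(axis=0) / N
    row = i // 3
    col = i % 3
    bins = np.arange(start=1, stop=7, step=0.1)
    fig_axes[row, col].hist(y, bins=bins, density=True)
    fig_axes[row, col].set_title('N={} {}'.format(N, experiment))
plt.show()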
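We can see that as N increases, the distribution of the mean of the dice rolls approaches a normal distribution. Also, since the variance of the sample mean is σ²/N, where σ² is the variance of a single roll, the distribution becomes narrower around the population mean of 3.5 as N increases.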
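A standard result is that the mean of N independent draws has variance σ²/N, so N times the variance of the sample mean should stay close to σ² for every N. The following sketch checks this for a fair die, where σ² = 35/12. The dictionary `scaled_var` is a name introduced here for illustration.

```python
import numpy as np

faces = np.arange(1, 7)
sigma2 = faces.var()  # population variance of one fair die: 35/12

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n_samp = 100_000
scaled_var = {}

for N in (1, 2, 5, 10, 50, 100):
    # Sample mean of N dice, repeated n_samp times.
    means = rng.integers(low=1, high=7, size=(N, n_samp)).mean(axis=0)
    scaled_var[N] = N * means.var()
    print(f"N={N:3d}  var(mean): {means.var():.4f}  "
          f"N * var(mean): {scaled_var[N]:.4f}  sigma^2: {sigma2:.4f}")
```

The variance of the mean falls as N grows, while the rescaled value stays near 35/12. This is why the histograms of the mean get narrower from panel to panel.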
Coin example
Let us experiment with the distribution of the total value when a coin is tossed N (1, 2, 5, 10, 50, 100) times, counting 1 for heads and 2 for tails. The probability of a fair coin landing heads or tails is equal to 1/2.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#%matplotlib inline
sns.set()
sns.set_context(rc = {'patch.linewidth': 0.2})
sns.set_style('dark')
numIterations = np.asarray([1,2,5,10,50,100]); #number of i.i.d RVs
experiment = 'coins' #valid values: 'dice', 'coins'
maxNumForExperiment = {'dice':6,'coins':2} #max numbers represented on dice or coins
nSamp=100000
k = maxNumForExperiment[experiment]
fig, fig_axes = plt.subplots(ncols=3, nrows=2, constrained_layout=True, figsize=(12,8))
for i, N in enumerate(numIterations):
    y = np.random.randint(low=1, high=k+1, size=(N, nSamp)).sum(axis=0)
    row = i // 3
    col = i % 3
    bins = np.arange(start=min(y), stop=max(y)+2, step=1)
    fig_axes[row, col].hist(y, bins=bins, density=True)
    fig_axes[row, col].set_title('N={} {}'.format(N, experiment))
plt.show()
As N increases (i.e., as the number of tosses in each sample increases), the distribution of the sum of the coin values, which is simply N times the sample mean, approaches a normal distribution.
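For coins the total can also be checked without simulation. With heads counted as 1 and tails as 2, the total of N fair tosses is N plus a Binomial(N, 1/2) count of tails, so it has mean 1.5 N and variance 0.25 N. The sketch below compares the exact probabilities with the matching normal density. The function names and the "relative max gap" measure are illustrative choices made here, not taken from the original.

```python
import math

def coin_total_pmf(N, t):
    """Exact probability that N fair coins (valued 1 or 2) total t."""
    tails = t - N  # each tail (value 2) adds one above the minimum total N
    if tails < 0 or tails > N:
        return 0.0
    return math.comb(N, tails) / 2 ** N

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

max_gap = {}
for N in (1, 2, 5, 10, 50, 100):
    mean, var = 1.5 * N, 0.25 * N
    # Largest gap between exact probability and normal density over all
    # possible totals, expressed relative to the peak of the normal density.
    gap = max(abs(coin_total_pmf(N, t) - normal_pdf(t, mean, var))
              for t in range(N, 2 * N + 1))
    max_gap[N] = gap / normal_pdf(mean, mean, var)
    print(f"N={N:3d}  relative max gap: {max_gap[N]:.4f}")
```

For very small N the gap is erratic because only a handful of totals are possible. By N = 100 the exact probabilities and the normal density agree closely.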
Next, let us experiment with the distribution of the average value obtained when a coin is tossed N (1, 2, 5, 10, 50, 100) times.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#%matplotlib inline
sns.set()
sns.set_context(rc = {'patch.linewidth': 0.2})
sns.set_style('dark')
numIterations = np.asarray([1,2,5,10,50,100]); #number of i.i.d RVs
experiment = 'coins' #valid values: 'dice', 'coins'
maxNumForExperiment = {'dice':6,'coins':2} #max numbers represented on dice or coins
nSamp=100000
k = maxNumForExperiment[experiment]
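# Create a fresh 2x3 grid of axes for this experiment.
fig, fig_axes = plt.subplots(ncols=3, nrows=2, constrained_layout=True, figsize=(12,8))
for i, N in enumerate(numIterations):
    # Dividing the total by N gives the sample mean of the N tosses.
    y = np.random.randint(low=1, high=k+1, size=(N, nSamp)).sum(axis=0) / N
    row = i // 3
    col = i % 3
    bins = np.arange(start=1, stop=3, step=0.1)
    fig_axes[row, col].hist(y, bins=bins, density=True)
    fig_axes[row, col].set_title('N={} {}'.format(N, experiment))
plt.show()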
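As N increases, we see that the distribution of the mean of the coin values approaches a normal distribution. Also, since the variance of the sample mean is σ²/N, where σ² is the variance of a single toss, the distribution becomes narrower around the population mean of 1.5 as N increases.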
References