Interval Estimation of Population Mean
Estimating the population mean is a fundamental problem in statistics. In many scenarios, we have a sample of data collected from a larger population, and we want to make inferences about a population parameter such as the mean.
In this article, I will walk through the steps to construct an interval estimate (a confidence interval) for the population mean.
Steps to Construct Confidence Intervals
Identify Sample Statistic
Since we are trying to estimate a population mean, we choose the sample mean (denoted as $\bar{x}$) as our sample statistic.
Select Confidence Level
The confidence level is usually given in the problem statement. Common choices are 90%, 95%, and 99%.
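To show how a confidence level turns into a number we can compute with, here is a minimal sketch that converts each common level into a two-sided critical value. It assumes a standard normal distribution purely for illustration; the variable names (`alpha`, `upper_quantile`, `critical_values`) are my own and not part of any library.

```python
from scipy import stats

# Map each confidence level to the two-sided critical value of the
# standard normal distribution (an assumption made for illustration only).
confidence_levels = [0.90, 0.95, 0.99]
critical_values = {}

for level in confidence_levels:
    alpha = 1 - level               # total probability left in the two tails
    upper_quantile = 1 - alpha / 2  # equivalent to (1 + level) / 2
    critical_values[level] = stats.norm.ppf(upper_quantile)
    print(f"{level:.0%} confidence -> alpha = {alpha:.2f}, "
          f"critical value = {critical_values[level]:.3f}")
```

A higher confidence level pushes the quantile further into the tail, so the critical value, and with it the interval width, grows.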
Find Standard Error
The standard error (SE) measures how precisely a sample statistic estimates a population parameter. It is the standard deviation of the statistic's sampling distribution, so it describes how far a statistic (like the sample mean) typically deviates from the actual (but often unknown) population parameter. The standard error determines the width of the confidence interval.
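One way to make this concrete is a small simulation. The sketch below draws many samples from a hypothetical normal population (mean 50, standard deviation 5, both chosen only for illustration) and compares the spread of the resulting sample means with the textbook value $\sigma / \sqrt{n}$ for the standard error of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

# Hypothetical population parameters, chosen only for this illustration
population_mean, population_std, n = 50, 5, 100

# Draw 10,000 independent samples of size n and record each sample mean
sample_means = rng.normal(population_mean, population_std,
                          size=(10_000, n)).mean(axis=1)

empirical_se = sample_means.std(ddof=1)       # observed spread of the sample means
theoretical_se = population_std / np.sqrt(n)  # sigma / sqrt(n)

print(f"Empirical SE:   {empirical_se:.3f}")
print(f"Theoretical SE: {theoretical_se:.3f}")
```

The two numbers should agree closely, which is what it means for the standard error to be the standard deviation of the sampling distribution.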
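If we had the population standard deviation ($\sigma$), the standard error of the sample mean would be $SE = \sigma / \sqrt{n}$, where $n$ is the sample size.
However, in most practical cases, $\sigma$ is unknown, so we estimate it with the sample standard deviation $s$ and use $SE = s / \sqrt{n}$.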
If we use the sample standard deviation, the standardized sample mean $(\bar{x} - \mu) / (s / \sqrt{n})$, where $\mu$ is the population mean, $s$ the sample standard deviation, and $n$ the sample size, no longer follows a standard normal distribution. Instead, when the population is approximately normal, it follows a t-distribution with $n - 1$ degrees of freedom. The t-distribution is similar to the normal distribution but has heavier tails. This makes it more conservative (it gives wider confidence intervals) when the sample size is small. As the sample size gets larger, the t-distribution gets closer to the normal distribution.
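The convergence of the t-distribution to the normal distribution can be checked numerically. The following sketch compares 95% two-sided critical values for a few sample sizes; the sample sizes are arbitrary choices for illustration.

```python
from scipy import stats

confidence_level = 0.95
q = (1 + confidence_level) / 2  # upper quantile for a two-sided interval

z_crit = stats.norm.ppf(q)      # normal critical value, independent of n

t_crits = {}
for n in [5, 10, 30, 100, 1000]:
    # The t-distribution has n - 1 degrees of freedom for a sample of size n
    t_crits[n] = stats.t.ppf(q, df=n - 1)
    print(f"n = {n:>4}: t = {t_crits[n]:.3f}  (z = {z_crit:.3f})")
```

For small samples the t critical value is noticeably larger than the normal one, and the gap shrinks as the sample size grows.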
Construct Confidence Interval
Once we have calculated the standard error, we can construct the confidence interval. A confidence interval is a range of values, calculated from the sample data, that is likely to include the unknown population parameter.
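The confidence interval can be expressed as follows:
$$\bar{x} \pm t^{*} \cdot SE$$
where $\bar{x}$ is the sample mean, $SE$ is the standard error, and $t^{*}$ is the critical value of the t-distribution with $n - 1$ degrees of freedom for the chosen confidence level. The quantity $t^{*} \cdot SE$ is called the margin of error.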
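What "likely to include" means can be seen in a coverage simulation. The sketch below repeatedly samples from a hypothetical normal population with a known mean, builds a t-based interval each time (sample mean plus or minus a t critical value times the standard error), and counts how often the interval contains the true mean. The population parameters, sample size, and number of trials are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seeded for reproducibility

# Hypothetical population and experiment settings (illustration only)
true_mean, true_std = 50, 5
n, n_trials, confidence_level = 20, 5_000, 0.95

# One row per simulated sample
samples = rng.normal(true_mean, true_std, size=(n_trials, n))
means = samples.mean(axis=1)
standard_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Same critical value for every trial, since n is fixed
t_score = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
margins = t_score * standard_errors

# Check, trial by trial, whether the interval contains the true mean
covered = (means - margins <= true_mean) & (true_mean <= means + margins)
coverage = covered.mean()

print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should land near the chosen confidence level: the 95% describes the long-run behavior of the procedure, not the probability that any single computed interval is correct.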
Python Implementation
Here's a simple example of interval estimation of the population mean using scipy.stats. We will generate a sample of size 100 from a normal distribution, then construct a 95% confidence interval for the mean.
import numpy as np
import scipy.stats as stats
# Generate a sample from a normal distribution
np.random.seed(0) # for reproducibility
sample = np.random.normal(loc=50, scale=5, size=100) # mean = 50, std_dev = 5, sample size = 100
# Calculate the sample mean and standard deviation
sample_mean = np.mean(sample)
sample_std_dev = np.std(sample, ddof=1) # ddof=1 to use the unbiased estimator
# Choose the desired confidence level
confidence_level = 0.95
# Find the standard error
standard_error = sample_std_dev / np.sqrt(len(sample))
# Calculate the margin of error. Here we use t-distribution
degree_of_freedom = len(sample) - 1
t_score = stats.t.ppf((1 + confidence_level) / 2, degree_of_freedom)
margin_of_error = t_score * standard_error
# Construct the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(f"The {confidence_level*100}% Confidence Interval for the population mean is: {confidence_interval}")
The 95.0% Confidence Interval for the population mean is: (49.294074104983, 51.30400605036188)
This is a very basic example. Depending on the specifics of your data and problem, you may need to adjust this code to suit your needs. For example, if your sample size is large enough, you may want to use a normal distribution instead of a t-distribution.
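As a rough sketch of that adjustment, the snippet below computes both a t-based and a normal-based interval for the same sample using the `interval` method of `scipy.stats.t` and `scipy.stats.norm`, with `scipy.stats.sem` for the standard error. The sample is regenerated here with NumPy's `default_rng`, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded for reproducibility
sample = rng.normal(loc=50, scale=5, size=100)

sample_mean = sample.mean()
standard_error = stats.sem(sample)  # s / sqrt(n), with ddof=1 by default
confidence_level = 0.95

# t-based interval with n - 1 degrees of freedom
t_interval = stats.t.interval(confidence_level, len(sample) - 1,
                              loc=sample_mean, scale=standard_error)

# Normal-based interval, a reasonable approximation for large samples
z_interval = stats.norm.interval(confidence_level,
                                 loc=sample_mean, scale=standard_error)

print(f"t-based interval:      {t_interval}")
print(f"normal-based interval: {z_interval}")
```

With 100 observations the two intervals are nearly identical, and the t-based one is slightly wider. For small samples the difference matters more, so the t-based interval is the safer default when the population standard deviation is unknown.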