2022-04-04

Interval Estimation of Population Mean

Estimating the population mean is a fundamental problem in statistics. In many scenarios, we have a sample drawn from a larger population, and we want to make inferences about a population parameter such as the mean.

In this article, I will introduce the steps to estimate the population mean with a confidence interval.

Steps to Construct Confidence Intervals

Identify Sample Statistic

Since we are trying to estimate a population mean, we choose the sample mean (denoted as \bar{x}) as the sample statistic.
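As a minimal sketch of this step, the snippet below computes \bar{x} for a small sample. The five values are made up purely for illustration and do not come from any real dataset.

```python
import numpy as np

# Hypothetical sample of five measurements (made-up values for illustration)
sample = np.array([48.2, 51.7, 49.9, 50.4, 52.3])

# The sample mean is the sum of the observations divided by their count
x_bar = sample.sum() / sample.size

print(f"Sample mean: {x_bar:.2f}")
```

In practice, np.mean(sample) gives the same result in one call.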

Select Confidence Level

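The confidence level is usually given in the problem statement. Common choices are 90%, 95%, and 99%. A confidence level of 1 - \alpha means that, if we repeated the sampling many times, about that proportion of the resulting intervals would contain the true population mean.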
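To make the link between a confidence level and a critical value concrete, here is a small sketch that converts each common level into its two-sided critical value from the standard normal distribution. The variable names are my own, and the normal distribution is used here only for illustration.

```python
from scipy import stats

critical_values = {}
for confidence_level in (0.90, 0.95, 0.99):
    # alpha is the total probability left outside the interval
    alpha = 1 - confidence_level
    # Two-sided interval: alpha / 2 goes in each tail
    z_critical = stats.norm.ppf(1 - alpha / 2)
    critical_values[confidence_level] = z_critical
    print(f"{confidence_level:.0%} confidence: z = {z_critical:.3f}")
```

A higher confidence level gives a larger critical value and therefore a wider interval.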

Find Standard Error

The standard error (SE) is the standard deviation of a statistic's sampling distribution. It describes how far a statistic (like the sample mean) typically deviates from the actual (but often unknown) population parameter. The standard error plays a critical role in the construction of confidence intervals.

If we had the population standard deviation (\sigma), we could calculate the standard error of the mean as:

SE = \frac{\sigma}{\sqrt{n}}

However, in most practical cases, \sigma is not known, and we estimate it using the sample standard deviation (s).

SE = \frac{s}{\sqrt{n}}
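The formula with s can be sketched in a few lines. The sample values are hypothetical, and scipy.stats.sem is included only as a cross-check, since it computes the same quantity with ddof=1 by default.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (made-up values for illustration)
sample = np.array([48.2, 51.7, 49.9, 50.4, 52.3])
n = sample.size

# Sample standard deviation s; ddof=1 divides by n - 1 instead of n
s = np.std(sample, ddof=1)

# Standard error of the mean: s / sqrt(n)
standard_error = s / np.sqrt(n)

print(f"s = {s:.4f}, SE = {standard_error:.4f}")
print(f"scipy.stats.sem gives {stats.sem(sample):.4f}")
```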

If we use the sample standard deviation, the standardized sample mean, (\bar{x} - \mu) / (s / \sqrt{n}), does not follow a standard normal distribution. Instead, assuming the population itself is normally distributed, it follows a t-distribution with n - 1 degrees of freedom. The t-distribution is similar to the normal distribution but has heavier tails, which reflects the extra uncertainty from estimating \sigma. This makes it more conservative (gives wider confidence intervals) when the sample size is small. As the sample size gets larger, the t-distribution gets closer to the normal distribution.
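The convergence of the t-distribution to the normal distribution can be checked numerically. The sketch below compares the 95% two-sided critical values for a few sample sizes chosen arbitrarily for illustration.

```python
from scipy import stats

# 95% two-sided critical value from the standard normal distribution
z_critical = stats.norm.ppf(0.975)

t_criticals = []
for n in (5, 10, 30, 100, 1000):
    # The t-distribution for a sample of size n has n - 1 degrees of freedom
    t_critical = stats.t.ppf(0.975, df=n - 1)
    t_criticals.append(t_critical)
    print(f"n = {n:4d}: t = {t_critical:.4f}, z = {z_critical:.4f}")
```

The t critical value is always larger than the normal one, and the gap shrinks as n grows.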

Construct Confidence Interval

Once we have calculated the standard error, we can construct the confidence interval. A confidence interval provides an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

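Let 1 - \alpha be the confidence level. When \sigma is known, the confidence interval can be expressed as follows:

CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}

When \sigma is unknown and estimated by s, the critical value comes from the t-distribution with n - 1 degrees of freedom:

CI = \bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}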
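The interval formula can be wrapped in a small helper. This is only a sketch: the function name mean_confidence_interval, its sigma parameter, and the sample values are hypothetical choices of mine. It uses a normal critical value when \sigma is supplied and a t critical value otherwise.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(sample, confidence_level=0.95, sigma=None):
    """Return (lower, upper) bounds of a confidence interval for the mean.

    If sigma (the population standard deviation) is given, a normal critical
    value is used. Otherwise sigma is estimated by the sample standard
    deviation and a t critical value with n - 1 degrees of freedom is used.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    x_bar = sample.mean()
    upper_tail = (1 + confidence_level) / 2  # equals 1 - alpha / 2

    if sigma is not None:
        critical = stats.norm.ppf(upper_tail)
        standard_error = sigma / np.sqrt(n)
    else:
        critical = stats.t.ppf(upper_tail, df=n - 1)
        standard_error = sample.std(ddof=1) / np.sqrt(n)

    margin = critical * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical measurements (made-up values for illustration)
data = [48.2, 51.7, 49.9, 50.4, 52.3]
t_based = mean_confidence_interval(data)             # sigma unknown
z_based = mean_confidence_interval(data, sigma=1.5)  # sigma assumed to be 1.5
print("t-based:", t_based)
print("z-based:", z_based)
```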

Python Implementation

Here's a simple example of how to compute an interval estimate of the population mean using scipy.stats. We will generate a sample of size 100 from a normal distribution, then construct a 95% confidence interval for the mean.

```python
import numpy as np
import scipy.stats as stats

# Generate a sample from a normal distribution
np.random.seed(0)  # for reproducibility
sample = np.random.normal(loc=50, scale=5, size=100)  # mean = 50, std_dev = 5, sample size = 100

# Calculate the sample mean and standard deviation
sample_mean = np.mean(sample)
sample_std_dev = np.std(sample, ddof=1)  # ddof=1 to use the unbiased estimator

# Choose the desired confidence level
confidence_level = 0.95

# Find the standard error
standard_error = sample_std_dev / np.sqrt(len(sample))

# Calculate the margin of error. Here we use t-distribution
degree_of_freedom = len(sample) - 1
t_score = stats.t.ppf((1 + confidence_level) / 2, degree_of_freedom)
margin_of_error = t_score * standard_error

# Construct the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"The {confidence_level*100}% Confidence Interval for the population mean is: {confidence_interval}")
```

```
The 95.0% Confidence Interval for the population mean is: (49.294074104983, 51.30400605036188)
```

This is a very basic example. Depending on the specifics of your data and problem, you may need to adjust this code to suit your needs. For example, if your sample size is large, the t-distribution is nearly identical to the normal distribution, so you may want to use normal critical values instead.
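To see how much the choice of distribution matters at n = 100, the sketch below rebuilds the same seeded sample and computes both intervals with the interval methods of scipy.stats.t and scipy.stats.norm. Using these built-in methods is my own shortcut, not part of the step-by-step code above.

```python
import numpy as np
from scipy import stats

# Same seeded sample as in the example above
np.random.seed(0)
sample = np.random.normal(loc=50, scale=5, size=100)

sample_mean = np.mean(sample)
standard_error = stats.sem(sample)  # s / sqrt(n), with ddof=1 by default

# 95% intervals centered on the sample mean and scaled by the standard error
t_interval = stats.t.interval(0.95, len(sample) - 1, loc=sample_mean, scale=standard_error)
z_interval = stats.norm.interval(0.95, loc=sample_mean, scale=standard_error)

print("t-based interval:", t_interval)
print("z-based interval:", z_interval)
print("Width difference:", (t_interval[1] - t_interval[0]) - (z_interval[1] - z_interval[0]))
```

The t-based interval is slightly wider, but with 100 observations the difference is small compared with the width of the interval itself.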

Ryusei Kakujo
