2022-04-04
Interval Estimation of Population Proportion
In statistical inference, the objective is often to make statements or predictions about the properties of a larger population based on observations in a sample drawn from that population. One such property that we may be interested in is the population proportion.
The population proportion, denoted as $p$, is the fraction of individuals in the population that have a particular characteristic.
An interval estimate provides a range of values within which the parameter is estimated to lie, whereas a point estimate gives a single most likely value. The range is usually reported with a confidence level, which describes how often intervals constructed this way would contain the true parameter under repeated sampling.
When it comes to interval estimation of population proportion, the goal is to determine a range (or interval) of values within which the true population proportion is likely to fall. This is done using sample data and statistical methods that account for sampling variability.
Steps for Interval Estimation of Population Proportion
We will discuss the steps involved in interval estimation of population proportion.
Calculation of Estimator from Sample
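The first step in interval estimation is to calculate the estimator from the sample. The estimator, often denoted by $\hat{p}$, is the sample proportion:
$$\hat{p} = \frac{x}{n}$$
where $x$ is the number of successes observed in the sample and $n$ is the sample size.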
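As a minimal sketch of this step, the sample proportion can be computed as below. The function name `sample_proportion` and the counts (42 successes out of 120) are hypothetical and chosen only for illustration.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return the sample proportion p_hat = x / n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    return successes / n


# Hypothetical sample: 42 successes observed among 120 units
p_hat = sample_proportion(42, 120)
print(f"Sample proportion: {p_hat:.3f}")
```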
Setting Confidence Level
The second step is to set the desired confidence level. This confidence level reflects our level of certainty that the true population proportion falls within our estimated interval. It's important to note that the confidence level also determines the Z-score used in the calculation of the confidence interval. Commonly used confidence levels include 90%, 95%, and 99%, which correspond to Z-scores of approximately 1.645, 1.96, and 2.576, respectively.
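The Z-scores quoted above can be recovered from the standard normal quantile function. The sketch below uses `scipy.stats.norm.ppf`; the helper name `z_score` is an assumption made for this example.

```python
from scipy.stats import norm


def z_score(confidence_level: float) -> float:
    """Two-sided critical value z with P(-z < Z < z) = confidence_level."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    alpha = 1 - confidence_level
    # Split alpha equally between the two tails of the standard normal
    return float(norm.ppf(1 - alpha / 2))


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: z = {z_score(level):.3f}")
```

The tail probability is split in half because the interval is two-sided, which is why a 95% level uses the 0.975 quantile.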
Consideration of Sample Distribution of Estimator
A key aspect of interval estimation involves understanding the distribution of our estimator. Depending on the nature of the data, different distributions might be applicable:
Binomial Distribution
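If we were to take a large number of samples of the same size $n$ from a population and calculate $\hat{p}$ for each one, the number of successes in each sample, $X = n\hat{p}$, would follow a binomial distribution with parameters $n$ and $p$.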
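The probability mass function of a binomial distribution is given by:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
where $n$ is the number of trials, $k$ is the number of successes, and $p$ is the probability of success on each trial.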
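To make the probability mass function concrete, the sketch below evaluates it directly and compares the result with `scipy.stats.binom.pmf`. The helper `binomial_pmf` and the parameters $n = 10$, $p = 0.3$ are illustrative assumptions.

```python
from math import comb

from scipy.stats import binom


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical parameters chosen only for illustration
n_trials, p_success = 10, 0.3
for k in range(n_trials + 1):
    manual = binomial_pmf(k, n_trials, p_success)
    library = binom.pmf(k, n_trials, p_success)
    print(f"k={k:2d}  formula={manual:.6f}  scipy={library:.6f}")
```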
Approximation to Normal Distribution
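In practice, the binomial distribution can be unwieldy, especially for large sample sizes. However, when the sample size is large enough (usually, when both $np$ and $n(1 - p)$ are at least 5, though some texts require 10), the binomial distribution can be approximated by a normal distribution, so that $\hat{p}$ is approximately normal with mean $p$ and variance $p(1 - p)/n$.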
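The quality of the normal approximation can be checked numerically. The sketch below is an illustration under stated assumptions: the helper `normal_approx_ok` and its default threshold of 5 are hypothetical (textbooks differ on the cutoff), and the comparison uses a continuity correction.

```python
import numpy as np
from scipy.stats import binom, norm


def normal_approx_ok(n: int, p: float, threshold: float = 5) -> bool:
    """Rule-of-thumb check: both n*p and n*(1-p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Hypothetical parameters for illustration
n, p = 1000, 0.5
k = np.arange(0, n + 1)

# Exact binomial CDF versus normal CDF with matching mean and variance
exact_cdf = binom.cdf(k, n, p)
approx_cdf = norm.cdf(k + 0.5, loc=n * p, scale=np.sqrt(n * p * (1 - p)))

max_gap = float(np.max(np.abs(exact_cdf - approx_cdf)))
print(f"Rule of thumb satisfied: {normal_approx_ok(n, p)}")
print(f"Largest CDF difference: {max_gap:.6f}")
```

A small maximum difference suggests the two distributions are close enough for interval estimation at this sample size.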
Calculation of Interval
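Finally, we calculate the interval estimate itself. For a population proportion, this is typically expressed in the form of $\hat{p} \pm z \cdot SE$, where $z$ is the Z-score for the chosen confidence level and $SE$ is the standard error of $\hat{p}$.
The standard error can be calculated as:
$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
So, the confidence interval for the population proportion is:
$$\hat{p} \pm z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$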
This is the final result of the interval estimation for population proportion process. It gives us a range of values that, with a certain level of confidence, contains the true population proportion.
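The four steps can be combined into one small function. This is a sketch of the normal-approximation interval described above, not a definitive implementation; the function name `proportion_confidence_interval` and the counts (120 successes out of 400) are hypothetical.

```python
import math

from scipy.stats import norm


def proportion_confidence_interval(successes, n, confidence_level=0.95):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = successes / n                                  # step 1: estimator
    z = norm.ppf(1 - (1 - confidence_level) / 2)           # step 2: Z-score
    se = math.sqrt(p_hat * (1 - p_hat) / n)                # step 3: standard error
    return float(p_hat - z * se), float(p_hat + z * se)    # step 4: interval


# Hypothetical sample: 120 successes among 400 observations
lower, upper = proportion_confidence_interval(120, 400)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```

The interval is symmetric around the sample proportion, so near 0 or 1 it can extend outside the range $[0, 1]$.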
Practical Example
To cement our understanding of the interval estimation of a population proportion, let's work through a practical example.
Let's consider a clinical trial where 500 out of 1000 patients recovered after treatment (a success), while the rest did not (a failure). We want to estimate the success proportion in the broader population.
- Calculation of Estimator from Sample
The sample proportion of successes is calculated as follows:
$$\hat{p} = \frac{500}{1000} = 0.5$$
- Setting Confidence Level
We set a 95% confidence level.
- Consideration of Sample Distribution of Estimator
Since the data is binary (success/failure), a binomial distribution applies. However, given the large sample size, we can use a normal approximation.
- Calculation of Interval
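In this case, the standard error for the binomial proportion can be calculated using the formula:
$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.5 \times 0.5}{1000}} \approx 0.0158$$
where $\hat{p}$ is the sample proportion and $n$ is the sample size.
Our confidence interval is thus calculated as follows:
$$\hat{p} \pm z \cdot SE = 0.5 \pm 1.96 \times 0.0158 \approx (0.469, 0.531)$$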
We can be 95% confident that the true success proportion in the population lies within this interval.
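One way to read the 95% figure is as a long-run coverage rate: if the trial were repeated many times, roughly 95% of the resulting intervals would contain the true proportion. The simulation below is a sketch of that idea. The true proportion of 0.5, the number of repetitions, and the function name `simulate_coverage` are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm


def simulate_coverage(p_true, n, confidence_level=0.95, n_trials=10_000, seed=0):
    """Fraction of simulated normal-approximation intervals containing p_true."""
    rng = np.random.default_rng(seed)
    # Draw the number of successes for each simulated trial of size n
    successes = rng.binomial(n, p_true, size=n_trials)
    p_hat = successes / n
    z = norm.ppf(1 - (1 - confidence_level) / 2)
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    covered = (p_hat - z * se <= p_true) & (p_true <= p_hat + z * se)
    return float(covered.mean())


# Hypothetical true proportion; sample size matches the example above
coverage = simulate_coverage(p_true=0.5, n=1000)
print(f"Empirical coverage: {coverage:.3f}")
```

The empirical coverage should land near the nominal level, although it need not match it exactly because the number of successes is discrete.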
Python Code
Here is the Python code.
import numpy as np
from scipy.stats import binom, norm

# Sample data
successes = 500
failures = 500
total = successes + failures

# Step 1: Calculation of Estimator from Sample
estimator = successes / total

# Step 2: Setting Confidence Level
confidence_level = 0.95

# Step 3: Consideration of Sample Distribution of Estimator
# Standard error of the sample proportion under the normal approximation
se = np.sqrt(estimator * (1 - estimator) / total)

# Step 4: Calculation of Interval
z_value = norm.ppf(1 - (1 - confidence_level) / 2)  # 0.975 quantile for a 95% level
confidence_interval = (estimator - z_value * se, estimator + z_value * se)
print(f"95% confidence interval for the success proportion is: {confidence_interval}")

# If we use binomial distribution
# binom.interval returns bounds on the success count, so divide by total
ci_lower, ci_upper = binom.interval(confidence_level, n=total, p=estimator)
print(f"95% confidence interval for the success proportion is: ({ci_lower/total}, {ci_upper/total})")
95% confidence interval for the success proportion is: (0.4690102483847719, 0.5309897516152281)
95% confidence interval for the success proportion is: (0.469, 0.531)