2022-04-04

Interval Estimation of Population Proportion

In statistical inference, the objective is often to make statements or predictions about the properties of a larger population based on observations in a sample drawn from that population. One such property that we may be interested in is the population proportion.

The population proportion, denoted as p, is the fraction of the population's members that share a particular characteristic. For instance, if we are looking at the population of voters in a country, p might represent the proportion of voters who support a particular candidate.

An interval estimate provides a range of values within which the parameter is estimated to lie, unlike a point estimate, which gives a single best-guess value for the parameter. The range is presented along with a confidence level, which describes how often intervals constructed in this way would contain the parameter if the sampling were repeated many times.

When it comes to interval estimation of population proportion, the goal is to determine a range (or interval) of values within which the true population proportion is likely to fall. This is done using sample data and statistical methods that account for sampling variability.

Steps for Interval Estimation of Population Proportion

We will discuss the steps involved in interval estimation of population proportion.

Calculation of Estimator from Sample

The first step in interval estimation is to calculate the estimator from the sample. The estimator, often denoted by \hat{p} (read as "p-hat"), is simply the proportion of the sample that possesses the characteristic of interest. Mathematically, it is calculated as:

\hat{p} = \frac{X}{n}

where X is the number of successes (individuals with the desired characteristic) in the sample, and n is the total sample size.
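As a minimal sketch of this step, the estimator can be computed with a small helper. The function name `sample_proportion` and the survey numbers are hypothetical and chosen only for illustration.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return the point estimate p-hat = X / n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n


# Hypothetical survey: 120 of 400 respondents support the candidate.
p_hat = sample_proportion(120, 400)
print(p_hat)  # 0.3
```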

Setting Confidence Level

The second step is to set the desired confidence level. The confidence level is the long-run share of intervals, built by this procedure from repeated samples, that would contain the true population proportion. It also determines the Z-score used in the calculation of the confidence interval. Commonly used confidence levels include 90%, 95%, and 99%, which correspond to Z-scores of approximately 1.645, 1.96, and 2.576, respectively.
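The Z-scores quoted above can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_score` is an assumption made for this example. For a two-sided interval with confidence level c, the Z-score is the 1 - (1 - c)/2 quantile.

```python
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value z with P(-z < Z < z) = confidence."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_score(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```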

Consideration of Sample Distribution of Estimator

A key aspect of interval estimation involves understanding the distribution of our estimator. Depending on the nature of the data, different distributions might be applicable:

Binomial Distribution

If we were to take a large number of samples of the same size from a population and calculate \hat{p} for each, the resulting values would form a sampling distribution. When the observations are independent and each has the same success probability p, the number of successes X follows a binomial distribution, so \hat{p} = X/n follows that binomial distribution rescaled by 1/n.

The probability mass function of a binomial distribution is given by:

P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}

where P(X=k) is the probability of k successes in n trials, \binom{n}{k} is the binomial coefficient, and p is the probability of success on an individual trial.
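The probability mass function can be evaluated directly from its definition. This is a small sketch using `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the choice of n = 10, p = 0.5 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))  # 0.24609375
# The probabilities over k = 0..n should add up to 1 (up to rounding).
print(abs(sum(probs) - 1.0) < 1e-12)  # True
```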

Approximation to Normal Distribution

In practice, the binomial distribution can be unwieldy, especially for large sample sizes. However, when the sample size is large enough (a common rule of thumb is that both n\hat{p} and n(1 - \hat{p}) are greater than 5), the Central Limit Theorem allows us to approximate the binomial distribution with a normal distribution.
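The rule of thumb and the quality of the approximation can be checked numerically. The sketch below is an illustration under assumptions: the helper names are hypothetical, the sample of 1000 trials with p = 0.5 is chosen for the example, and the half-unit continuity correction is an addition that the text above does not discuss.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx_ok(successes: int, n: int, threshold: float = 5) -> bool:
    """Rule of thumb: n*p_hat and n*(1 - p_hat) must both exceed the threshold.

    n*p_hat is the success count and n*(1 - p_hat) is the failure count.
    """
    return successes > threshold and (n - successes) > threshold


def binom_prob_between(lo: int, hi: int, n: int, p: float) -> float:
    """Exact binomial probability P(lo <= X <= hi)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


n, p = 1000, 0.5
exact = binom_prob_between(469, 531, n, p)

# Normal distribution with the same mean and standard deviation as the binomial.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
# Continuity correction: widen the integer range by 0.5 on each side.
approx = approx_dist.cdf(531.5) - approx_dist.cdf(468.5)

print(normal_approx_ok(500, 1000))  # True
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With a sample this large the two probabilities should agree closely, which is why the normal-based interval is commonly used in place of the exact binomial calculation.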

Calculation of Interval

Finally, we calculate the interval estimate itself. For a population proportion, this is typically expressed in the form of \hat{p} \pm Z \ast SE(\hat{p}), where Z is the Z-score that corresponds to the desired level of confidence, \hat{p} is the sample proportion, and SE(\hat{p}) is the standard error of the proportion.

The standard error can be calculated as:

SE(\hat{p}) = \sqrt{\frac{\hat{p} \ast (1 - \hat{p})}{n}}

So, the confidence interval for the population proportion is:

CI = \hat{p} \pm Z \ast \sqrt{\frac{\hat{p} \ast (1 - \hat{p})}{n}}

This is the final result of the interval estimation procedure. It gives us a range of values that, at the chosen level of confidence, contains the true population proportion.
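Putting the formula into code, the following sketch computes the normal-approximation interval for a given success count and sample size. The function name `proportion_confidence_interval` and the survey numbers are hypothetical; the Z-score comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, confidence: float = 0.95):
    """Return (lower, upper) = p_hat -/+ Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical survey: 120 of 400 respondents support the candidate.
lower, upper = proportion_confidence_interval(120, 400)
print(f"({lower:.3f}, {upper:.3f})")  # (0.255, 0.345)
```

Note that the formula can produce bounds below 0 or above 1 when \hat{p} is close to 0 or 1 and the sample is small; this sketch does not clip them.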

Practical Example

To cement our understanding of the interval estimation of a proportion, let's work through a practical example.

Let's consider a clinical trial where 500 out of 1000 patients recovered after treatment (a success), while the rest did not (a failure). We want to estimate the success proportion in the broader population.

1. Calculation of Estimator from Sample

The sample proportion of successes is calculated as follows:

\hat{p} = \frac{\text{Number of Successes}}{\text{Total sample size}} = \frac{500}{1000} = 0.5

2. Setting Confidence Level

We set a 95% confidence level, which corresponds to a Z-score of approximately 1.96.

3. Consideration of Sample Distribution of Estimator

Since the data is binary (success/failure), a binomial distribution applies. However, the sample is large (n\hat{p} = 500 and n(1 - \hat{p}) = 500, both far greater than 5), so we can use a normal approximation.

4. Calculation of Interval

In this case, the standard error of the sample proportion can be calculated using the formula:

SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

where \hat{p} is the sample proportion of successes (0.5) and n is the sample size (1000). This gives us a standard error of approximately 0.0158.

Our confidence interval is thus calculated as follows:

CI = 0.5 \pm 1.96 \times 0.0158 = (0.469, 0.531)

We can be 95% confident that the true success proportion in the population lies within this interval.

Python Code

Here is the Python code.

python
import numpy as np
from scipy.stats import binom, norm

# Sample data
successes = 500
failures = 500
total = successes + failures

# Step 1: Calculation of Estimator from Sample
estimator = successes / total

# Step 2: Setting Confidence Level (95%)

# Step 3: Consideration of Sample Distribution of Estimator
# Standard error under the normal approximation (converted to a plain float)
se = float(np.sqrt(estimator * (1 - estimator) / total))

# Step 4: Calculation of Interval
z_value = float(norm.ppf(0.975))  # For 95% confidence level
confidence_interval = (estimator - z_value * se, estimator + z_value * se)

print(f"95% confidence interval for the success proportion is: {confidence_interval}")

# If we use binomial distribution
# binom.interval returns bounds on the success count, so divide by the sample size
ci_lower, ci_upper = binom.interval(0.95, n=total, p=estimator)
print(f"95% confidence interval for the success proportion is: ({ci_lower/total}, {ci_upper/total})")
95% confidence interval for the success proportion is: (0.4690102483847719, 0.5309897516152281)
95% confidence interval for the success proportion is: (0.469, 0.531)
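
To illustrate what the 95% confidence level means, one can simulate many samples from a population whose true proportion is known and count how often the interval contains it. This is only a sketch: the true proportion of 0.5, the number of repetitions, the seed, and the helper names are assumptions made for the simulation, since in a real study the true proportion is unknown.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation interval: p_hat -/+ Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def estimate_coverage(true_p: float, n: int, repetitions: int, seed: int = 0) -> float:
    """Share of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Draw n independent success/failure outcomes and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = wald_interval(successes, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / repetitions


coverage = estimate_coverage(true_p=0.5, n=1000, repetitions=2000)
print(f"Share of intervals containing the true proportion: {coverage:.3f}")
```

The printed share should be close to 0.95, which is the sense in which each individual interval is reported with 95% confidence.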

Ryusei Kakujo
