2022-04-04

Interval Estimation of Population Proportion

In statistical inference, the objective is often to make statements or predictions about the properties of a larger population based on observations in a sample drawn from that population. One such property that we may be interested in is the population proportion.

The population proportion, denoted as $p$, is the fraction of the population's members that share a particular characteristic. For instance, if we are looking at the population of voters in a country, $p$ might represent the proportion of voters who support a particular candidate.

An interval estimate provides a range of values within which the parameter is estimated to lie, whereas a point estimate gives a single most likely value for the parameter. The range is usually reported together with a confidence level, which describes how often intervals constructed in this way would contain the true parameter under repeated sampling.

When it comes to interval estimation of population proportion, the goal is to determine a range (or interval) of values within which the true population proportion is likely to fall. This is done using sample data and statistical methods that account for sampling variability.

Steps for Interval Estimation of Population Proportion

We will discuss the steps involved in interval estimation of population proportion.

Calculation of Estimator from Sample

The first step in interval estimation is to calculate the estimator from the sample. The estimator, often denoted by $\hat{p}$ (read as "p-hat"), is simply the proportion of the sample that possesses the characteristic of interest. Mathematically, it is calculated as:

$$\hat{p} = \frac{X}{n}$$

where $X$ is the number of successes (individuals with the desired characteristic) in the sample, and $n$ is the total sample size.
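
As a minimal sketch of this step, the snippet below computes $\hat{p}$ from a sample coded as 1 (success) and 0 (failure). The function name `sample_proportion` and the ten-element sample are hypothetical, chosen only for illustration.

```python
def sample_proportion(outcomes):
    """Return p-hat = X / n for a sequence of 0/1 outcomes."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    x = sum(outcomes)  # X: number of successes in the sample
    return x / n


# Hypothetical sample: 1 = has the characteristic, 0 = does not
sample = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
p_hat = sample_proportion(sample)
print(p_hat)  # 6 successes out of 10 observations
```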

Setting Confidence Level

The second step is to set the desired confidence level. This confidence level reflects our level of certainty that the true population proportion falls within our estimated interval. It's important to note that the confidence level also determines the Z-score used in the calculation of the confidence interval. Commonly used confidence levels include 90%, 95%, and 99%, which correspond to Z-scores of approximately 1.645, 1.96, and 2.576, respectively.
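
The Z-scores quoted above can be recovered from the standard normal distribution. The sketch below assumes a two-sided interval, so the Z-score is the quantile at $1 - \alpha/2$ with $\alpha = 1 - \text{confidence level}$. It uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an illustrative choice.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1.0 - confidence
    # Quantile that leaves alpha / 2 in the upper tail
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```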

Consideration of Sample Distribution of Estimator

A key aspect of interval estimation involves understanding the distribution of our estimator. Depending on the nature of the data, different distributions might be applicable:

Binomial Distribution

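If we were to take a large number of samples of the same size from a population and calculate $\hat{p}$ for each, the resulting values would form a sampling distribution. When the observations are independent and each has the same probability $p$ of success, the number of successes $X$ in a sample of size $n$ follows a binomial distribution, and $\hat{p} = X/n$ inherits its distribution from $X$.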

The probability mass function of a binomial distribution is given by:

$$P(X=k) = C(n,k) \ast p^k \ast (1-p)^{n-k}$$

where $P(X=k)$ is the probability of $k$ successes in $n$ trials, $C(n,k)$ is the binomial coefficient, and $p$ is the probability of success on an individual trial.
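
The probability mass function can be evaluated directly with the standard library, as in the following sketch. `math.comb` supplies the binomial coefficient $C(n,k)$; the function name `binomial_pmf` and the values $n = 10$, $p = 0.5$ are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative values: probability of exactly 5 successes in 10 fair trials
print(binomial_pmf(5, 10, 0.5))

# The probabilities over k = 0..n should sum to 1 (up to rounding)
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(total)
```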

Approximation to Normal Distribution

In practice, the binomial distribution can be unwieldy, especially for large sample sizes. However, when the sample size is large enough (as a rule of thumb, when both $n\hat{p}$ and $n(1 - \hat{p})$ are greater than 5), the Central Limit Theorem allows us to approximate the binomial distribution with a normal distribution.
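
The rule of thumb can be checked before using the normal approximation. The sketch below takes the condition to be that both $n\hat{p}$ and $n(1 - \hat{p})$ exceed 5; the function name `normal_approximation_ok` and the sample values are hypothetical.

```python
def normal_approximation_ok(p_hat, n, threshold=5):
    """Check that expected successes and failures both exceed the threshold."""
    return n * p_hat > threshold and n * (1 - p_hat) > threshold


# Large, balanced sample: both counts are far above 5
print(normal_approximation_ok(0.5, 1000))

# Small sample with a rare characteristic: n * p_hat = 2, so the check fails
print(normal_approximation_ok(0.02, 100))
```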

Calculation of Interval

Finally, we calculate the interval estimate itself. For a population proportion, this is typically expressed in the form of $\hat{p} \pm Z \ast SE(\hat{p})$ , where $Z$ is the Z-score that corresponds to the desired level of confidence, $\hat{p}$ is the sample proportion, and $SE(\hat{p})$ is the standard error of the proportion.

The standard error can be calculated as:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p} \ast (1 - \hat{p})}{n}}$$

So, the confidence interval for the population proportion is:

$$CI = \hat{p} \pm Z \ast \sqrt{\frac{\hat{p} \ast (1 - \hat{p})}{n}}$$

This is the final result of the interval estimation procedure for a population proportion. It gives us a range of values that, at the chosen confidence level, is expected to contain the true population proportion.
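
Putting the steps together, a sketch of the whole calculation might look like the following. It assumes a two-sided interval and the normal approximation described earlier. The function name `proportion_confidence_interval` and the counts (60 successes out of 200) are hypothetical and used only to exercise the code.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p using p-hat +/- Z * SE(p-hat)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                         # point estimate
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided Z-score
    se = sqrt(p_hat * (1 - p_hat) / n)            # standard error of p-hat
    margin = z * se
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 60 successes out of 200 observations
lower, upper = proportion_confidence_interval(60, 200, confidence=0.95)
print(round(lower, 4), round(upper, 4))
```

The function applies the formula as written, so for very small samples or proportions near 0 or 1 the bounds can fall outside the range from 0 to 1; the rule-of-thumb check on $n\hat{p}$ and $n(1 - \hat{p})$ guards against using it in that regime.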

Practical Example

To cement our understanding of interval estimation of a proportion, let's work through a practical example.

Let's consider a clinical trial where 500 out of 1000 patients recovered after treatment (a success), while the rest did not (a failure). We want to estimate the success proportion in the broader population.
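
As a worked sketch of the formulas above, assume a 95% confidence level (the example itself does not fix one), so $Z \approx 1.96$. With $X = 500$ and $n = 1000$, both $n\hat{p}$ and $n(1 - \hat{p})$ equal 500, so the normal approximation is reasonable:

$$\hat{p} = \frac{500}{1000} = 0.5$$

$$SE(\hat{p}) = \sqrt{\frac{0.5 \ast (1 - 0.5)}{1000}} \approx 0.0158$$

$$CI = 0.5 \pm 1.96 \ast 0.0158 \approx 0.5 \pm 0.031 = (0.469, 0.531)$$

Under these assumptions, the recovery proportion in the broader population is estimated to lie between roughly 46.9% and 53.1%.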

Ryusei Kakujo