2022-11-25

How to determine sample size

What is sample size

Sample size is the number of observations in a sample drawn from a population. It should be chosen so that the sample adequately represents the population at minimum cost. If the sample is too small, the estimates will be imprecise and may not represent the population adequately. If it is too large, the survey takes more time and money than necessary. Determining the sample size is therefore important both for controlling cost and for ensuring the accuracy of the survey.

How to determine sample size

To determine the sample size, the following must be set:

  • Margin of error (confidence interval)
  • Confidence level

Margin of error (confidence interval)

Margin of error is the largest difference between the sample estimate (for example, the sample mean) and the true population value that we are willing to accept. Strictly speaking, it is the half-width of the confidence interval, although it is sometimes loosely called the confidence interval. The larger the margin of error, the larger the deviation from the true population value that is tolerated.

Confidence level

Confidence level is the probability that the interval produced by the survey procedure contains the true population value, that is, that the estimate falls within the margin of error. Typical confidence levels are 90%, 95%, and 99%. For example, a confidence level of 99% means that if the survey were repeated many times, about 99% of the resulting intervals would contain the true population value.
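The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under assumed conditions: a normally distributed population with a hypothetical mean of 50 and a known standard deviation of 6, samples of 35, and intervals of the form sample mean ± z * sigma / sqrt(n) (the formula used later in this article). The function name `coverage_rate` is illustrative.

```python
import math
import random
import statistics


def coverage_rate(mu=50.0, sigma=6.0, n=35, z=1.96, trials=10_000, seed=0):
    """Share of simulated intervals (sample mean +/- z*sigma/sqrt(n)) that contain mu.

    Assumes a normal population with known sigma; mu, sigma and n are
    hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


print(coverage_rate())  # should be close to 0.95
```

Raising `z` to 2.58 should push the share of covering intervals toward 99%.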

Calculate sample size

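When the population standard deviation is \sigma and the n observations are drawn independently, the standard deviation of the sample mean (the standard error) is as follows. By the central limit theorem, the sample mean is also approximately normally distributed when n is large: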

\frac{\sigma}{\sqrt{n}}
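As a quick check of this formula, the sketch below draws many samples from a deliberately skewed population (an exponential distribution with standard deviation sigma = 6, an assumption chosen for illustration) and compares the spread of the sample means with sigma / sqrt(n). The helper name `simulated_se` is hypothetical.

```python
import math
import random
import statistics


def simulated_se(sigma=6.0, n=35, trials=5_000, seed=1):
    """Standard deviation of `trials` sample means, each computed from n draws.

    The population is exponential with rate 1/sigma, so its standard
    deviation equals sigma (a hypothetical choice for illustration).
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1 / sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)


sigma, n = 6.0, 35
print(simulated_se(sigma, n), sigma / math.sqrt(n))  # the two should be close
```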

Here we use a 95% confidence level, for which the z-value (the two-sided critical value of the standard normal distribution) is 1.96. Common values are:

Confidence level    z-value
80%                 1.28
85%                 1.44
90%                 1.645
95%                 1.96
99%                 2.58
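The z-values in the table are two-sided critical values: z is chosen so that P(-z < Z < z) equals the confidence level for a standard normal Z. A minimal sketch using the standard-library `statistics.NormalDist` reproduces them (the helper name `z_value` is illustrative):

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value z with P(-z < Z < z) = confidence, Z ~ N(0, 1)."""
    # Each tail holds (1 - confidence) / 2, so take the (1 + confidence) / 2 quantile.
    return NormalDist().inv_cdf((1 + confidence) / 2)


for c in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{c:.0%}  {z_value(c):.3f}")
```

This prints 1.282, 1.440, 1.645, 1.960 and 2.576; the table lists rounded versions of these values.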

Thus, the margin of error, that is, the half-width of the 95% confidence interval, is:

CI = 1.96 * \frac{\sigma}{\sqrt{n}}

Solving for n gives the formula for the required sample size:

n = (\frac{1.96 * \sigma}{CI})^2
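The formula translates directly into a small helper. Because n must be a whole number and the margin shrinks as n grows, the result is rounded up. This is a sketch; the function name `sample_size_mean` and the default z = 1.96 (95% confidence) are choices made for illustration.

```python
import math


def sample_size_mean(sigma, margin, z=1.96):
    """Smallest integer n with z * sigma / sqrt(n) <= margin.

    Implements n = (z * sigma / margin) ** 2, rounded up.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size_mean(sigma=6, margin=2))          # 35 (95% level)
print(sample_size_mean(sigma=6, margin=2, z=2.58))  # 60 (99% level)
```

Since n grows with the square of z * sigma / margin, halving the margin of error roughly quadruples the required sample size.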

As an example, suppose we want to estimate the mean size of a certain product by measuring a sample of it. Assuming a standard deviation of \sigma = 6 mm and a margin of error of 2 mm, the required sample size is

n = (\frac{1.96 * 6}{2})^2 = 34.6

Rounding up, 35 samples are needed to estimate the mean to within ±2 mm at the 95% confidence level.
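Plugging candidate sample sizes back into the margin-of-error formula confirms that 35 is the smallest sufficient value. A short sketch (the helper name `margin_of_error` is hypothetical):

```python
import math


def margin_of_error(sigma, n, z=1.96):
    """Half-width z * sigma / sqrt(n) of the confidence interval for the mean."""
    return z * sigma / math.sqrt(n)


for n in (34, 35):
    print(n, round(margin_of_error(sigma=6, n=n), 3))
```

With 34 samples the margin is slightly above 2 mm (about 2.017), while 35 samples bring it just under (about 1.988).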

If instead we want to estimate a population proportion p (such as the share of households that own a PC), each observation is either 0 or 1, and its standard deviation is:

\sqrt{p(1-p)}
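Coding each observation as 1 (has the attribute) or 0 (does not) makes this concrete: the population standard deviation of such 0/1 data equals sqrt(p(1-p)). A sketch with a hypothetical population of ten households, six of which own a PC:

```python
import math
import statistics

# Hypothetical population: 1 = owns a PC, 0 = does not (so p = 0.6)
population = [1] * 6 + [0] * 4

p = statistics.fmean(population)            # proportion = mean of the 0/1 values
sd_direct = statistics.pstdev(population)   # population standard deviation
sd_formula = math.sqrt(p * (1 - p))

print(p, sd_direct, sd_formula)  # sd_direct and sd_formula agree (about 0.49)
```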

The standard error of the sample proportion is therefore:

\sqrt{\frac{p(1-p)}{n}}
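This formula can be checked by simulation. The sketch below assumes a true proportion of 0.6 and samples of 100 (both hypothetical), draws many samples, and compares the spread of the sample proportions with sqrt(p(1-p)/n). The helper name is illustrative.

```python
import math
import random
import statistics


def simulated_se_proportion(p=0.6, n=100, trials=5_000, seed=2):
    """Standard deviation of `trials` sample proportions, each from n 0/1 draws."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(trials):
        successes = sum(rng.random() < p for _ in range(n))  # True counts as 1
        proportions.append(successes / n)
    return statistics.stdev(proportions)


p, n = 0.6, 100
print(simulated_se_proportion(p, n), math.sqrt(p * (1 - p) / n))  # should be close
```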

Therefore, the formula for finding the sample size n is as follows:

n = (\frac{1.96 * \sqrt{p(1-p)}}{CI})^2
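As with the mean, this can be wrapped in a helper that rounds up. When nothing is known about p, a common conservative choice is p = 0.5, because p(1-p) is largest there. A minimal sketch (the name `sample_size_proportion` is illustrative):

```python
import math


def sample_size_proportion(p, margin, z=1.96):
    """Smallest integer n with z * sqrt(p * (1 - p) / n) <= margin."""
    if not 0 < p < 1 or margin <= 0:
        raise ValueError("need 0 < p < 1 and margin > 0")
    return math.ceil((z * math.sqrt(p * (1 - p)) / margin) ** 2)


print(sample_size_proportion(0.5, 0.1))  # 97, conservative choice when p is unknown
```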

As an example, let us calculate the sample size needed to estimate the PC penetration rate of a town to within ±10 percentage points at the 95% confidence level, assuming the penetration rate is about 60%.

n = (\frac{1.96 * \sqrt{0.6(1-0.6)}}{0.1})^2 = 92.2

Rounding up, 93 samples are needed to estimate the penetration rate to within ±10 percentage points at the 95% confidence level.
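Because the assumed proportion enters only through p(1-p), the required sample size is symmetric around p = 0.5 and peaks there. A sketch that tabulates n for a ±10-percentage-point margin at 95% confidence (the helper name `required_n` is hypothetical):

```python
import math


def required_n(p, margin=0.1, z=1.96):
    """Sample size (rounded up) for estimating a proportion near p within +/- margin."""
    return math.ceil((z * math.sqrt(p * (1 - p)) / margin) ** 2)


sizes = {p: required_n(p) for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)}
for p, n in sizes.items():
    print(f"p = {p:.1f}  n = {n}")
```

The values range from 35 at p = 0.1 or 0.9 up to 97 at p = 0.5.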


Ryusei Kakujo


Focusing on data science for mobility

Bench Press 100kg!