What is i.i.d.
i.i.d. stands for Independent and Identically Distributed, meaning that the random variables are mutually independent and all follow the same probability distribution.
Suppose you toss a fair coin 10 times and get 6 heads and 4 tails, and then toss it an eleventh time. The probability of getting "heads" or "tails" on that toss is still 1/2 each, because the outcome does not depend on the previous tosses.
It is important to note that i.i.d. does not mean equal probability; it does not mean that each outcome must have a probability of 1/2. A biased coin that lands heads with the same fixed probability on every toss also produces i.i.d. tosses.
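The coin example can be checked with a small simulation. The sketch below is illustrative only: the function name `eleventh_toss_heads_rate`, the number of trials, the seed, and the biased-coin probability of 0.3 are all assumptions chosen for the example. It keeps only the runs whose first 10 tosses contain exactly 6 heads and then looks at how often the eleventh toss is heads.

```python
import random

def eleventh_toss_heads_rate(p_heads, trials=200_000, seed=0):
    """Estimate P(11th toss is heads | exactly 6 heads in the first 10 tosses)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    matched = 0      # runs whose first 10 tosses contain exactly 6 heads
    heads_next = 0   # of those runs, how many have heads on the 11th toss
    for _ in range(trials):
        first_ten = sum(rng.random() < p_heads for _ in range(10))
        eleventh = rng.random() < p_heads
        if first_ten == 6:
            matched += 1
            heads_next += eleventh
    return heads_next / matched

fair = eleventh_toss_heads_rate(0.5)    # fair coin
biased = eleventh_toss_heads_rate(0.3)  # hypothetical biased coin
print(f"fair coin:   {fair:.3f}")
print(f"biased coin: {biased:.3f}")
```

In both cases the estimate should land close to the coin's own heads probability, whatever happened in the first 10 tosses. The biased coin is still i.i.d. even though heads and tails are not equally likely.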
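As an example that does not follow i.i.d., consider playing cards. Suppose that one card is drawn from a deck of 52 playing cards and it is the ace of hearts. Before that draw, the probability of drawing an ace is 4/52. If the card is not returned to the deck, the probability of drawing an ace on the next draw falls to 3/51, so the second draw depends on the result of the first.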
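The card probabilities can be worked out exactly. The following is a minimal sketch, assuming a standard 52-card deck with 4 aces; the variable names are illustrative, and the "with replacement" case is added here only as a contrast.

```python
from fractions import Fraction

DECK_SIZE = 52
ACES = 4

# First draw: 4 aces among 52 cards.
p_first_ace = Fraction(ACES, DECK_SIZE)

# Second draw after an ace has been removed and NOT returned:
# 3 aces remain among 51 cards, so the draws are dependent.
p_second_ace_without = Fraction(ACES - 1, DECK_SIZE - 1)

# Second draw if the first card is returned and the deck reshuffled:
# the deck is unchanged, so the draws are independent.
p_second_ace_with = Fraction(ACES, DECK_SIZE)

print(p_first_ace, p_second_ace_without, p_second_ace_with)  # prints: 1/13 1/17 1/13
```

Drawing without replacement changes the distribution of the next draw, which is why it breaks the i.i.d. assumption; drawing with replacement does not.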
The i.i.d. assumption is often used in statistical processing, hypothesis testing, and machine learning because it removes the need to model correlations between observations (such as covariances), which makes the computations tractable.
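One way to see the computational benefit is through the variance of a sum. The notation here is an assumption introduced for illustration: X_1, ..., X_n are the random variables and σ² is their common variance, assumed finite. In general the variance of the sum contains a covariance term for every pair:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
  = \sum_{i=1}^{n} \operatorname{Var}(X_i)
  + 2 \sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j)
```

Under independence every covariance is zero, and under identical distribution every variance equals σ², so the expression collapses:

```latex
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) = n\sigma^{2},
\qquad
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^{2}}{n}
```

The n(n-1)/2 covariance terms never have to be estimated, which is the simplification that the i.i.d. assumption provides.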
Identically distributed
Identically distributed means that all items in the sample are drawn from the same probability distribution: the distribution does not change from one item to the next, so there is no overall trend.
For example, if the strength of a product is measured and the mean strength rises as more samples are collected, it is difficult to draw conclusions about the strength, because the average depends on when the measurement was taken. Measurements that trend over time call for other methods, such as time series analysis.
When comparing groups in an analysis, the means, ratios, and other characteristics may differ between groups, but the observations within each group must all come from the same distribution.
How to check for i.i.d.
To determine whether data are independent and identically distributed, check them for independence and for trends.
Data Independence
To check for data independence, understand the data collection process: how the data were collected, whether random sampling or convenience sampling was used, and how the observations were made.
Data Trends
To identify trends in the data, graph summary statistics such as the proportion, mean, and variability in the order in which the items were measured, and look for patterns. A systematic drift or other pattern in the sample suggests that the data do not follow a single probability distribution.
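A numerical version of this check can be sketched as follows. Everything here is an assumption made for illustration: the helper `windowed_stats`, the window size of 4, and the strength values, which are made-up data. The idea is to split the measurements, kept in measurement order, into consecutive windows and compare the mean and standard deviation of each window.

```python
import statistics

def windowed_stats(values, window):
    """Mean and sample standard deviation of consecutive, non-overlapping windows."""
    stats = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        stats.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return stats

# Hypothetical strength measurements, listed in the order they were taken.
strength = [50.1, 49.8, 50.3, 50.0,
            50.9, 51.2, 50.8, 51.1,
            51.9, 52.2, 51.8, 52.1]

for index, (mean, spread) in enumerate(windowed_stats(strength, window=4)):
    print(f"window {index}: mean={mean:.2f}, stdev={spread:.2f}")
```

In this made-up data the window means rise steadily while the spread stays similar, which is the kind of pattern that would argue against treating the measurements as identically distributed. Plotting the same per-window values gives the graphical check described above.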