What is i.i.d.

i.i.d. stands for Independent and Identically Distributed, meaning that the random variables X_1, X_2, \cdots, X_n are independent of each other and all follow the same probability distribution. Note that i.i.d. is a property of a collection of random variables, not the name of a particular probability distribution.
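
Stated formally, the two conditions can be written with cumulative distribution functions. This is the standard textbook formulation; the symbols F and F_{X_i} are notation introduced here for illustration.

```latex
% Identically distributed: every X_i has the same distribution function F
F_{X_i}(x) = F(x) \quad \text{for all } i = 1, 2, \cdots, n \text{ and all } x

% Independent: the joint distribution function factorizes into the marginals
F_{X_1, X_2, \cdots, X_n}(x_1, x_2, \cdots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i)
\quad \text{for all } x_1, x_2, \cdots, x_n
```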

Suppose you toss a fair coin 10 times and get 6 heads and 4 tails, and then toss it an eleventh time. The probability of heads on that toss is \frac{1}{2}, and so is the probability of tails; neither is affected by the results of the first 10 tosses. In other words, the outcomes of the first through the eleventh tosses are independent of each other and all have the same distribution. Therefore, we can say that these coin tosses are i.i.d.
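
The coin example can be checked with a small simulation. The sketch below is only an illustration: the function name, the number of tosses, and the seed are assumptions made here, and a pseudo-random generator stands in for a real coin. It estimates the probability of heads separately for tosses that follow a head and tosses that follow a tail; for independent tosses of a fair coin both estimates should be close to \frac{1}{2}.

```python
import random

def heads_rate_after(previous, n_tosses=200_000, p_heads=0.5, seed=0):
    """Estimate P(heads on a toss | the toss just before it equals `previous`)."""
    rng = random.Random(seed)
    tosses = [rng.random() < p_heads for _ in range(n_tosses)]  # True means heads
    # Keep every toss whose immediate predecessor matches `previous`.
    following = [cur for prev, cur in zip(tosses, tosses[1:]) if prev == previous]
    return sum(following) / len(following)

after_heads = heads_rate_after(True)
after_tails = heads_rate_after(False)
print(f"P(heads | previous toss was heads) ~ {after_heads:.3f}")
print(f"P(heads | previous toss was tails) ~ {after_tails:.3f}")
```

Changing `p_heads` to another value, such as 0.7, should leave the two estimates close to each other, since the tosses remain independent and share one distribution.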

It is important to note that i.i.d. does not mean equal probability. The shared distribution does not have to assign \frac{1}{2} to each of two outcomes or \frac{1}{4} to each of four outcomes. For example, tosses of a biased coin that lands heads with probability 0.7 are still i.i.d., as long as every toss has the same bias and no toss influences another.

As an example that is not i.i.d., consider playing cards. When one card is drawn from a deck of 52 playing cards, the probability that it is an ace is \frac{4}{52}. Suppose the card drawn is the ace of hearts. If we draw another card without putting the ace of hearts back, the probability of drawing an ace is now \frac{3}{51}. The probability for the second draw therefore depends on the outcome of the first, so the draws are not independent and we cannot say that they are i.i.d.
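
The dependence between the two draws can be verified exactly by enumerating every ordered pair of cards. In this minimal sketch the deck encoding ("ace" versus "other") and the helper name are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding of a 52-card deck: only "ace or not" matters here.
deck = ["ace"] * 4 + ["other"] * 48

# Every ordered pair of distinct cards, i.e. two draws without replacement.
ordered_pairs = [(deck[i], deck[j]) for i, j in permutations(range(len(deck)), 2)]

def second_is_ace_given(first_kind):
    """Exact P(second card is an ace | first card is of kind `first_kind`)."""
    seconds = [second for first, second in ordered_pairs if first == first_kind]
    return Fraction(seconds.count("ace"), len(seconds))

p_after_ace = second_is_ace_given("ace")      # equals 3/51, stored in lowest terms as 1/17
p_after_other = second_is_ace_given("other")  # equals 4/51
print(p_after_ace, p_after_other)
```

Because the two conditional probabilities differ, the result of the first draw carries information about the second, which is exactly what independence rules out.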

The i.i.d. assumption is often used in statistical processing, hypothesis testing, and machine learning because independence removes the need to model correlations between observations (for example, covariance terms), which keeps the computations tractable.
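
One way to see the simplification is the variance of a sum. The identity below is a standard result, written here under the added assumption that each X_i has a finite variance, denoted \sigma^2 (a symbol introduced here). Under independence every covariance term is zero, so only the individual variances remain.

```latex
\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right)
  = \sum_{i=1}^{n} \operatorname{Var}(X_i)
    + 2 \sum_{1 \le i < j \le n} \operatorname{Cov}(X_i, X_j)
  = n\sigma^2
  \quad \text{when } X_1, X_2, \cdots, X_n \text{ are i.i.d.}
```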

Identically distributed

Identically distributed means that there is no overall trend: the distribution does not change from one item to the next, and all items in the sample are drawn from the same probability distribution.

For example, suppose the strength of a product is measured over time and the mean strength rises as more samples are collected. The average strength then depends on when the measurements were taken, which makes it difficult to draw conclusions about the product's strength. Measurements that trend over time call for a different approach, such as time series analysis.
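
The effect of such a trend can be illustrated with simulated data. Everything in this sketch is hypothetical: the strength values, the units, the drift rate, and the seed are made up for illustration. It compares the mean of the first half of a series with the mean of the second half, once for a stable process and once for a process whose mean rises over time.

```python
import random
import statistics

rng = random.Random(42)
n = 1000

# Hypothetical strength measurements (arbitrary units), in measurement order.
stable = [rng.gauss(100.0, 5.0) for _ in range(n)]               # one distribution throughout
drifting = [rng.gauss(100.0 + 0.02 * t, 5.0) for t in range(n)]  # mean rises with time

def half_means(values):
    """Return the mean of the first half and the mean of the second half."""
    mid = len(values) // 2
    return statistics.mean(values[:mid]), statistics.mean(values[mid:])

stable_first, stable_second = half_means(stable)
drift_first, drift_second = half_means(drifting)
print(f"stable:   first half {stable_first:.1f}, second half {stable_second:.1f}")
print(f"drifting: first half {drift_first:.1f}, second half {drift_second:.1f}")
```

For the drifting series the answer to "what is the average strength?" depends on which part of the series is examined, which is the problem described above.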

When comparing groups in an analysis, the groups may differ from one another in their means, proportions, and other characteristics, but the items within each group must all follow the same distribution.

How to check for i.i.d.

To judge whether the data are independent and identically distributed, check both the independence of the observations and whether the data show a trend.

Data Independence

To check for data independence, understand the data collection process. Find out how the data were collected, whether random sampling or convenience sampling was used, and how the observations were made.

Data Trends

To identify trends in the data, graph the data in the order in which the items were measured and look for patterns in properties such as the proportion, mean, and variability. A systematic pattern, such as a steady rise or a change in spread, suggests that the data do not follow a single probability distribution.
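
As a numerical companion to a plot, the data can be summarized block by block in measurement order. The following is a rough sketch rather than a formal test: the helper name `block_summary`, the block size, and the simulated series are all assumptions made for illustration. Here the spread of the simulated data doubles halfway through, and the per-block standard deviations make that visible.

```python
import random
import statistics

def block_summary(values, block_size):
    """Split values (in measurement order) into consecutive full blocks and
    return a (mean, standard deviation) pair for each block."""
    starts = range(0, len(values) - block_size + 1, block_size)
    blocks = [values[s:s + block_size] for s in starts]
    return [(statistics.mean(b), statistics.stdev(b)) for b in blocks]

rng = random.Random(7)
# Hypothetical series whose variability doubles after the 500th measurement.
series = ([rng.gauss(0.0, 1.0) for _ in range(500)]
          + [rng.gauss(0.0, 2.0) for _ in range(500)])

summary = block_summary(series, block_size=100)
for index, (mean, sd) in enumerate(summary):
    print(f"block {index}: mean={mean:6.2f}  sd={sd:5.2f}")
```

Block summaries that stay roughly constant are consistent with a single distribution, while a steady shift in the mean or the standard deviation across blocks is the kind of pattern to investigate further.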


Ryusei Kakujo
