2022-03-25

Variance and Standard Deviation

Introduction

In statistics, the concepts of mean deviation, variance, and standard deviation are important measures to describe the spread of data in a dataset. They help us understand how much the data points deviate from the mean, providing insights into the dispersion and variability of the data. In this article, I will discuss the definitions, examples, and equations for these measures and how to calculate them using Python.

Mean Deviation

Mean deviation, also known as the average deviation, is the average of the absolute differences between each data point and the mean of the dataset. The mean deviation is given by the formula:

MD = \frac{1}{n} \sum_{i=1}^n |x_i - \bar{x}|

where n is the number of data points, x_i represents each data point, and \bar{x} is the mean of the dataset.

Example

Consider the dataset: {4, 6, 8, 10}

Mean, \bar{x} = \frac{4+6+8+10}{4} = 7

Mean deviation, MD = \frac{|4-7| + |6-7| + |8-7| + |10-7|}{4} = \frac{3+1+1+3}{4} = 2

Variance

Variance is the average of the squared differences between each data point and the mean of the dataset. The variance is given by the formula:

\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2

where n is the number of data points, x_i represents each data point, and \bar{x} is the mean of the dataset.

Example

Using the same dataset as before: {4, 6, 8, 10}

Variance, \sigma^2 = \frac{(4-7)^2 + (6-7)^2 + (8-7)^2 + (10-7)^2}{4} = \frac{9+1+1+9}{4} = 5

Standard Deviation

Standard deviation is the square root of the variance, and it is a measure of the dispersion or spread of the data points in a dataset. The standard deviation is given by the formula:

\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2}

Example

Using the same dataset as before: {4, 6, 8, 10}

Standard deviation, \sigma = \sqrt{5} \approx 2.24

Relationship Between Variance and Standard Deviation

Variance and standard deviation are closely related statistical measures used to describe the spread or dispersion of a dataset. They both quantify the degree to which the individual data points deviate from the mean of the dataset. While variance is the average of the squared differences between the data points and the mean, standard deviation is the square root of variance.

The relationship between variance and standard deviation can be expressed mathematically as:

\sigma = \sqrt{\sigma^2}

where \sigma represents the standard deviation and \sigma^2 represents the variance.

This relationship implies that the standard deviation is always a non-negative value, as it is the square root of a non-negative value (variance). Also, both variance and standard deviation have the same unit as the data points when squared and square rooted, respectively. This means that standard deviation has the same unit as the original data, making it easier to interpret in the context of the dataset.

Calculating Variance and Standard Deviation with Python

To calculate variance and standard deviation using Python, you can use the following code:

python
import numpy as np

data = np.array([4, 6, 8, 10])

# Calculate variance
variance = np.var(data)

# Calculate standard deviation
std_dev = np.std(data)

print("Variance:", variance)
print("Standard Deviation:", std_dev)

Running this code will output the following:

Variance: 5.0
Standard Deviation: 2.23606797749979

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!