2022-05-24

Regression Metrics

Introduction

Performance metrics are essential for evaluating and comparing machine learning models. They provide an objective way to determine the effectiveness of a model in predicting outcomes based on input data. These metrics not only allow us to identify the strengths and weaknesses of different algorithms but also guide us in choosing the most suitable model for a specific task. Furthermore, performance metrics help in model selection, hyperparameter tuning, and diagnosing potential issues in the training process.

Machine learning problems can be broadly classified into two categories: regression and classification. Regression problems involve predicting continuous values, while classification problems involve predicting discrete labels or categories.

The performance metrics for regression and classification problems differ because of the nature of their respective predictions. Regression metrics focus on the difference between the predicted and actual values, while classification metrics assess how well the model can correctly classify the input data into predefined categories.

In this article, I cover the common performance metrics for regression problems.

Regression Metrics

In this section, I discuss the most commonly used performance metrics for regression tasks and how each one helps evaluate the effectiveness of a machine learning model.

Mean Absolute Error (MAE)

Mean Absolute Error is a simple metric that calculates the average of absolute differences between the predicted values and the actual values. MAE gives an idea of how far the predictions are from the actual values, with a lower MAE indicating better performance. The equation for MAE is:

MAE = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|

where y_i represents the actual values, \hat{y}_i represents the predicted values, and n is the number of samples.
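As a minimal sketch of the formula above, the following pure-Python function computes MAE for two equal-length lists. The function name and the sample values are illustrative assumptions, not part of the original article.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all samples (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical sample data: absolute errors are 0.5, 0.5, 0.0 and 1.0.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
```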

Mean Squared Error (MSE)

Mean Squared Error measures the average of the squared differences between the predicted and actual values. By squaring the errors, MSE penalizes larger deviations more severely, making it more sensitive to outliers than MAE. The equation for MSE is:

MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
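To illustrate the sensitivity to large deviations, the sketch below compares two hypothetical sets of predictions that have the same MAE but different MSE. The helper names and data are assumptions chosen for the example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute error, included only for comparison."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (y_i - y_hat_i)^2 over all samples (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
pred_even = [1.0, 1.0, 1.0, 1.0]     # four small errors of size 1
pred_outlier = [0.0, 0.0, 0.0, 4.0]  # one large error of size 4

# Both prediction sets have the same MAE (1.0) ...
mae_even = mean_absolute_error(y_true, pred_even)
mae_outlier = mean_absolute_error(y_true, pred_outlier)

# ... but the single large error is squared, so its MSE is four times larger.
mse_even = mean_squared_error(y_true, pred_even)        # 4 * 1 / 4 = 1.0
mse_outlier = mean_squared_error(y_true, pred_outlier)  # 16 / 4 = 4.0
```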

Root Mean Squared Error (RMSE)

Root Mean Squared Error is the square root of the MSE. Taking the square root expresses the error in the same units as the predicted and actual values, which makes it easier to interpret than MSE while still weighting large errors more heavily than MAE does. The equation for RMSE is:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
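A minimal sketch of the RMSE formula, using only the standard library. The function name and sample values are assumptions for illustration.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical sample data: squared errors are 0.25, 0.25, 0.0 and 1.0.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
rmse = root_mean_squared_error(y_true, y_pred)  # sqrt(1.5 / 4) = sqrt(0.375), about 0.612
```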

R-squared

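R-squared, also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that can be predicted from the independent variables. For a least-squares linear model with an intercept, evaluated on its training data, R-squared lies between 0 and 1, with higher values indicating a better fit. In general, and particularly on held-out data, it can be negative when the model predicts worse than simply using the mean of the actual values. The equation for R-squared is: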

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

where \bar{y} is the mean of the actual values.
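The formula can be sketched directly as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and sample values below are illustrative assumptions; raising an error for a constant target is a design choice of this sketch, since the denominator is zero in that case.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # All actual values are identical, so the ratio is undefined.
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical sample data: SS_res = 1.5 and SS_tot = 29.1875.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
r2 = r_squared(y_true, y_pred)  # 1 - 1.5 / 29.1875, roughly 0.949
```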

Adjusted R-squared

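Adjusted R-squared is an extension of R-squared that takes into account the number of predictors in the model. Plain R-squared never decreases when a predictor is added to a least-squares model, even an irrelevant one; adjusted R-squared penalizes each extra predictor, so it is a fairer basis for comparing models with different numbers of predictors. The equation for adjusted R-squared is: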

\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}

where n is the number of samples, p is the number of predictors, and R^2 is the R-squared value.
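As a small worked sketch, the function below applies the adjustment to an already computed R-squared value. The function name and the numbers (R^2 = 0.9, n = 50, p = 5) are hypothetical and chosen only to show the arithmetic.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared from R-squared, sample count n and predictor count p."""
    if n - p - 1 <= 0:
        # The denominator must be positive for the adjustment to be meaningful.
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical values: 1 - (0.1 * 49) / 44, roughly 0.889.
adj_r2 = adjusted_r_squared(0.9, n=50, p=5)

# With the same R-squared but more predictors, the adjusted value is lower.
adj_r2_more_predictors = adjusted_r_squared(0.9, n=50, p=20)
```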

Mean Absolute Percentage Error (MAPE)

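Mean Absolute Percentage Error calculates the average of the absolute percentage differences between the predicted and actual values. Because each error is expressed relative to the actual value, MAPE is useful when comparing errors across different scales or units. It is undefined when any actual value is zero and becomes very large when actual values are close to zero. The equation for MAPE is: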

MAPE = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%
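A minimal sketch of the MAPE formula follows. The function name and data are assumptions, and rejecting zero actual values is one possible way to handle the division by y_i, not the only one.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |(y_i - y_hat_i) / y_i|, returned as a percentage."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # The formula divides by each actual value, so zeros are rejected here.
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios) * 100.0


# Hypothetical sample data: relative errors are 10%, 10% and 0%.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
mape = mean_absolute_percentage_error(y_true, y_pred)  # (0.1 + 0.1 + 0.0) / 3 * 100, about 6.67
```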

Median Absolute Deviation (MAD)

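Median Absolute Deviation, as used here, is a robust metric that calculates the median of the absolute differences between the predicted and actual values. This quantity is also commonly called the median absolute error; in statistics, MAD more often refers to the median absolute deviation of a sample from its own median. Because the median ignores the size of the largest errors, MAD is less sensitive to outliers than MAE, making it a useful alternative in cases where the data contains extreme values. The equation for MAD is: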

MAD = \text{median}(|y_1 - \hat{y}_1|, |y_2 - \hat{y}_2|, \dots, |y_n - \hat{y}_n|)
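The sketch below computes the median of the absolute errors with the standard library's statistics.median and contrasts it with the mean of the same errors. The function name and the data, which include one deliberately extreme prediction, are illustrative assumptions.

```python
import statistics


def median_abs_error(y_true, y_pred):
    """Median of |y_i - y_hat_i| over all samples (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical data: four errors of size 1 and one extreme error of size 100.
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [11.0, 19.0, 31.0, 39.0, 150.0]

mad = median_abs_error(y_true, y_pred)  # median of [1, 1, 1, 1, 100] is 1.0

# For comparison, the mean of the same absolute errors is pulled up by the outlier.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)  # 104 / 5 = 20.8
```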


Ryusei Kakujo
