Introduction
Decision trees employ various algorithms to determine the optimal division of a node into two or more sub-nodes. A good split produces sub-nodes that are more homogeneous, meaning the purity of each node with respect to the target variable increases. The decision tree examines all candidate variables for splitting and chooses the one that produces the most homogeneous sub-nodes.
In this article, I will introduce five key decision tree algorithms: ID3, C4.5, CART, CHAID, and MARS.
ID3 Algorithm
The Iterative Dichotomiser 3 (ID3) algorithm is one of the earliest and most well-known decision tree algorithms. It is primarily used for classification tasks with categorical features. The ID3 algorithm operates by selecting the best attribute for splitting the dataset based on the Information Gain criterion, which measures the reduction in entropy achieved by partitioning the data. The algorithm is recursive, continually splitting the dataset until all instances belong to a single class or a stopping criterion is met. The Information Gain (IG) is calculated as follows:
$$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

where $H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$ is the entropy of the set $S$, $p_i$ is the proportion of instances in $S$ belonging to class $i$, and $S_v$ is the subset of $S$ for which attribute $A$ takes the value $v$.
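To make the criterion concrete, here is a minimal Python sketch of the entropy and Information Gain computations; the toy columns `outlook` and `play` are hypothetical.

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Shannon entropy H(S) of a categorical target column."""
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    """IG(S, A): parent entropy minus the weighted entropy of the child subsets."""
    parent_entropy = entropy(df[target])
    weighted_child_entropy = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return parent_entropy - weighted_child_entropy

# Toy example with hypothetical column names.
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain"],
    "play":    ["no",    "no",    "yes",      "yes",  "no"],
})
print(information_gain(data, "outlook", "play"))
```

ID3 would compute this quantity for every candidate attribute and split on the one with the highest gain.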
C4.5 Algorithm
C4.5 is an extension of the ID3 algorithm. It addresses some of the limitations of ID3, such as its inability to handle continuous attributes and missing data. The C4.5 algorithm uses the Gain Ratio criterion, a modification of Information Gain, to choose the best attribute for splitting. The Gain Ratio (GR) is defined as:
$$GR(S, A) = \frac{IG(S, A)}{SI(S, A)}$$

where $IG(S, A)$ is the Information Gain defined above and $SI(S, A) = -\sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$ is the Split Information, which penalizes attributes that split the data into many small partitions.
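Building on the `information_gain` sketch above, a minimal Gain Ratio computation might look like this; the guard against a zero Split Information is a practical assumption, since $SI$ is zero when an attribute takes only one value.

```python
import numpy as np  # reuses information_gain from the ID3 sketch above

def split_information(df, attribute):
    """SI(S, A): entropy of the split proportions themselves."""
    props = df[attribute].value_counts(normalize=True)
    return float(-(props * np.log2(props)).sum())

def gain_ratio(df, attribute, target):
    """GR(S, A) = IG(S, A) / SI(S, A), with a guard for degenerate splits."""
    si = split_information(df, attribute)
    return information_gain(df, attribute, target) / si if si > 0 else 0.0
```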
CART Algorithm
Classification and Regression Trees (CART) is another popular decision tree algorithm that can handle both classification and regression tasks. Unlike ID3 and C4.5, CART produces binary trees, with each internal node having exactly two child nodes. The CART algorithm uses the Gini Impurity criterion for classification tasks and Mean Squared Error (MSE) for regression tasks to select the best attribute for splitting. The Gini Impurity (GI) is calculated as follows:
$$GI(S) = 1 - \sum_{i=1}^{c} p_i^2$$

For regression tasks, the MSE is defined as:

$$MSE(S) = \frac{1}{|S|} \sum_{i \in S} \left( y_i - \bar{y}_S \right)^2$$

where $p_i$ is the proportion of instances in $S$ belonging to class $i$, $y_i$ is the target value of instance $i$, and $\bar{y}_S$ is the mean target value of the instances in $S$.
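In practice, scikit-learn's decision trees implement an optimized CART-style algorithm, so both criteria can be tried directly; the sketch below assumes scikit-learn >= 1.0, where the regression criterion is named `squared_error`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification with the Gini Impurity criterion (binary CART-style splits).
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))

# Regression with MSE ("squared_error") as the splitting criterion.
rng = np.random.RandomState(0)
X_reg = rng.uniform(0, 5, size=(100, 1))
y_reg = np.sin(X_reg).ravel() + rng.normal(0, 0.1, size=100)
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=0)
reg.fit(X_reg, y_reg)
```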
CHAID Algorithm
The Chi-squared Automatic Interaction Detector (CHAID) algorithm is a decision tree algorithm that relies on the Chi-squared test of independence to determine the best attribute for splitting. CHAID is particularly useful for exploring the interactions between categorical variables and can create multiway trees, in which an internal node may have more than two child nodes. The algorithm also handles missing data by treating missing values as a separate category, which may be merged with other categories during the splitting process.
The Chi-squared statistic is computed as:
$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed frequency in cell $(i, j)$ of the contingency table between the attribute and the target, and $E_{ij}$ is the frequency expected under the assumption of independence.
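The statistic is easy to compute with SciPy. The sketch below evaluates one hypothetical categorical predictor (`region`) against a target (`churn`); this per-attribute test is the building block CHAID applies before choosing the split with the smallest adjusted p-value.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical categorical data: does 'region' help predict 'churn'?
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "east", "east", "east"],
    "churn":  ["yes",   "no",    "yes",   "yes",   "no",   "no",   "yes"],
})
observed = pd.crosstab(df["region"], df["churn"])  # contingency table O_ij
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.3f}, p={p_value:.3f}")
# CHAID repeats this test for each candidate attribute and splits on the
# one with the most significant (smallest adjusted) p-value.
```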
MARS Algorithm
Multivariate Adaptive Regression Splines (MARS) is an algorithm developed by Jerome H. Friedman that extends ideas from decision trees. MARS is primarily used for regression tasks and is designed to handle high-dimensional and noisy data. It builds a model by fitting piecewise linear functions, called basis functions, to the data. MARS uses a forward selection pass to add basis functions and knots, followed by a backward elimination (pruning) pass, to create an interpretable, flexible, and accurate model. While MARS is not a traditional decision tree algorithm, it shares some similarities in its structure and approach to modeling complex relationships.
The MARS model can be represented as:
$$\hat{f}(x) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x)$$

where $\beta_0$ is the intercept and the $\beta_m$ are coefficients estimated by least squares. Each basis function $B_m(x)$ is either a hinge function of the form

$$\max(0,\, x_j - t) \quad \text{or} \quad \max(0,\, t - x_j)$$

where $x_j$ is an input variable and $t$ is a knot location, or a product of two or more such hinge functions, which allows the model to capture interactions between variables.
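A minimal NumPy sketch of the mirrored hinge pair is shown below; the fixed knot `t = 4.0` and the synthetic data are assumptions for illustration, since real MARS searches over candidate knots during its forward pass.

```python
import numpy as np

def hinge(x, t, direction=+1):
    """Mirrored pair of MARS hinge basis functions:
    direction=+1 gives max(0, x - t); direction=-1 gives max(0, t - x)."""
    return np.maximum(0.0, direction * (x - t))

# Fit a tiny model y ~ b0 + b1*max(0, x - t) + b2*max(0, t - x) by ordinary
# least squares, with an arbitrary fixed knot t.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.where(x < 4, 2 * x, 8 + 0.5 * (x - 4)) + rng.normal(0, 0.3, 200)

t = 4.0  # assumed knot; MARS selects knots automatically in the forward pass
design = np.column_stack([np.ones_like(x), hinge(x, t, +1), hinge(x, t, -1)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("fitted coefficients:", coef)
```

The fitted coefficients recover the two slopes on either side of the knot, which is exactly the kind of local, piecewise linear structure the full MARS procedure assembles automatically.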