2022-12-27

Ordered Logit Model

What Is the Ordered Logit Model

The Ordered Logit Model, also known as Ordinal Logistic Regression or the Proportional Odds Model, is a widely used statistical method for modeling ordinal dependent variables. In many research areas, such as the social sciences, health, and business, the outcome of interest is ordinal: its categories have a natural order, but the distances between them are not necessarily equal. The Ordered Logit Model is well suited to such outcomes because it uses the ordering of the categories without assuming equal spacing between them.

The model allows researchers to examine the relationship between a set of predictor variables and an ordinal outcome, showing how each predictor shifts the probability of each outcome category. For instance, it can be used to study how demographic, socioeconomic, and environmental factors influence the likelihood of reaching different stages of a disease, or to predict customer satisfaction levels from product features and marketing strategies.

Assumptions and Requirements

Before applying the Ordered Logit Model to your data, it is crucial to ensure that the data meet the necessary assumptions and requirements. Violations of these assumptions may lead to biased or inconsistent results. In this chapter, I will discuss the four key assumptions of the Ordered Logit Model: the proportional odds assumption, ordinal nature of the dependent variable, independence of observations, and linearity of logits.

Proportional Odds Assumption

The proportional odds assumption, also known as the parallel lines assumption, is the central assumption of the Ordered Logit Model. It states that the effect of each predictor variable on the cumulative log odds is the same at every cut-point between outcome categories. Mathematically, it can be expressed as:

$$\log\frac{P(Y \leq j | X)}{P(Y > j | X)} = \alpha_j - \beta X$$

Where:

$P(Y \leq j | X)$ is the probability that the outcome $Y$ falls in category $j$ or lower, given the predictor variables $X$.
$P(Y > j | X)$ is the probability that the outcome $Y$ falls in a category higher than $j$, given the predictor variables $X$.
$\alpha_j$ is the threshold (or cut-point) between category $j$ and category $j+1$, for $j = 1, \dots, J-1$, where $J$ is the number of categories.
$\beta$ is a vector of coefficients for the predictor variables $X$.

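The assumption implies that the $\beta$ coefficients are the same across all cut-points, while the $\alpha_j$ thresholds differ. The minus sign in front of $\beta X$ is a convention, also used by polr() in R, under which a positive coefficient shifts probability toward higher categories.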
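
To make the parallel-lines idea concrete, the sketch below evaluates the cumulative log odds for made-up thresholds and a made-up slope. The values of `alpha`, `beta`, and `x` are illustrative assumptions, not estimates from any data. Moving `x` by one unit shifts every cumulative logit by the same amount, `-beta`, regardless of the cut-point.

```r
# Hypothetical parameters chosen only for illustration
alpha <- c(-1.0, 0.5, 2.0)  # thresholds alpha_1 < alpha_2 < alpha_3 (four categories)
beta  <- 0.8                # single slope shared by all cut-points
x     <- c(0, 1)            # two values of one predictor

# Cumulative log odds: log[P(Y <= j) / P(Y > j)] = alpha_j - beta * x
cum_logit <- outer(alpha, x, function(a, xv) a - beta * xv)
dimnames(cum_logit) <- list(paste0("j=", 1:3), paste0("x=", x))

# Shift in each cumulative logit when x goes from 0 to 1
shift <- cum_logit[, "x=1"] - cum_logit[, "x=0"]

# Corresponding cumulative probabilities P(Y <= j | x)
cum_prob <- plogis(cum_logit)

print(cum_logit)
print(shift)     # the same value, -beta, for every cut-point j
print(cum_prob)
```

If the shift differed across rows, the lines would no longer be parallel and a single $\beta$ could not describe the effect of the predictor.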

Ordinal Nature of Dependent Variable

The Ordered Logit Model requires that the dependent variable is ordinal, meaning that it has a natural order but the distances between categories are not necessarily equal. Examples of ordinal variables include educational attainment, disease severity, and satisfaction levels. It is important to note that the model is not suitable for nominal variables or continuous variables without meaningful categories.

Independence of Observations

The independence of observations assumption states that each observation in the dataset should be independent of the others. This means there should be no dependence among observations, such as repeated measurements on the same subject, serial correlation in time series, or spatial correlation. Violations of this assumption can result in biased estimates, misleading standard errors, and incorrect inferences.

Linearity of Logits

The Ordered Logit Model assumes that the relationship between the log odds of the ordinal dependent variable and the predictor variables is linear. This means that a one-unit increase in a predictor variable will have a constant effect on the log odds of the outcome categories, holding all other variables constant. It is important to assess the linearity assumption by visually examining scatterplots or residual plots, and, if necessary, transforming the predictor variables to achieve linearity.

Ordered Logit Model Estimation

In this chapter, I will discuss the estimation of the Ordered Logit Model. We will cover maximum likelihood estimation, interpretation of coefficients, and thresholds and cutoff points.

Maximum Likelihood Estimation

The Ordered Logit Model is estimated using the maximum likelihood method, which finds the values of the coefficients and thresholds that maximize the likelihood of the observed data. The likelihood function for the Ordered Logit Model can be expressed as:

$$L(\beta, \alpha | Y, X) = \prod_{i=1}^{n} \prod_{j=1}^{J} \left[ F(\alpha_j - \beta X_i) - F(\alpha_{j-1} - \beta X_i) \right]^{I(Y_i = j)}$$

Where:

$L(\beta, \alpha | Y, X)$ is the likelihood function.
$\beta$ is a vector of coefficients for the predictor variables $X$.
$\alpha$ is the vector of threshold parameters $\alpha_1 < \dots < \alpha_{J-1}$, with the conventions $\alpha_0 = -\infty$ and $\alpha_J = +\infty$ so that the first and last categories are covered by the same formula.
$Y_i$ is the outcome of the $i$-th observation.
$X_i$ is the vector of predictor variables for the $i$-th observation.
$F(\cdot)$ is the cumulative distribution function of the logistic distribution.
$I(\cdot)$ is an indicator function, which equals 1 if the condition inside the parentheses is true, and 0 otherwise.

The maximum likelihood estimates of the coefficients and thresholds have no closed form, so they are obtained with iterative optimization algorithms such as the Newton-Raphson method, Fisher scoring, or quasi-Newton methods like BFGS, which is what polr() in R uses through optim().
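
As a rough illustration of what such an optimizer does, the sketch below simulates a three-category outcome, writes the negative log-likelihood by hand, and minimizes it with `optim()`. Everything here is an assumption made for the example: the simulated data, the true parameter values, the starting values, and the log-gap reparameterization used to keep the thresholds ordered. It is not how polr() is implemented internally.

```r
set.seed(1)

# Simulate data from a three-category ordered logit (all values are illustrative)
n          <- 2000
x          <- rnorm(n)
beta_true  <- 1
alpha_true <- c(-1, 1)

# Latent-variable form: Y <= j exactly when beta * x + logistic noise <= alpha_j
y_star <- beta_true * x + rlogis(n)
y      <- cut(y_star, breaks = c(-Inf, alpha_true, Inf), labels = FALSE)  # codes 1, 2, 3

# Negative log-likelihood; par = (beta, alpha_1, log(alpha_2 - alpha_1))
neg_loglik <- function(par) {
  beta  <- par[1]
  alpha <- c(par[2], par[2] + exp(par[3]))  # keeps alpha_1 < alpha_2
  cuts  <- c(-Inf, alpha, Inf)              # alpha_0 = -Inf, alpha_J = +Inf
  eta   <- beta * x
  p     <- plogis(cuts[y + 1] - eta) - plogis(cuts[y] - eta)  # P(Y_i = y_i)
  -sum(log(p))
}

# Quasi-Newton minimization from arbitrary starting values
fit <- optim(par = c(0, -0.5, 0), fn = neg_loglik, method = "BFGS")

beta_hat  <- fit$par[1]
alpha_hat <- c(fit$par[2], fit$par[2] + exp(fit$par[3]))
print(beta_hat)
print(alpha_hat)
```

On the same simulated data, `MASS::polr(factor(y) ~ x)` should give very similar estimates, which is a convenient sanity check for the hand-written likelihood.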

Interpretation of Coefficients

The coefficients in the Ordered Logit Model represent the effect of the predictor variables on the log odds of the ordinal outcome variable. A positive coefficient indicates that an increase in the predictor variable is associated with an increase in the log odds of a higher outcome category, while a negative coefficient implies that an increase in the predictor variable is associated with a decrease in the log odds of a higher outcome category.

To interpret the coefficients, we can calculate the odds ratio for each predictor variable, which represents the change in the odds of a higher outcome category given a one-unit increase in the predictor variable, holding all other variables constant:

$$\text{Odds Ratio} = e^{\beta}$$

It is important to note that the interpretation of the coefficients and odds ratios in the Ordered Logit Model is conditional on the proportional odds assumption.
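
The odds-ratio formula follows directly from the model equation. Here is a short derivation for a single predictor $x$ with coefficient $\beta$, with all other predictors held fixed and absorbed into the threshold, using the same sign convention as the model equation:

```latex
\begin{aligned}
\log\frac{P(Y \leq j \mid x)}{P(Y > j \mid x)} &= \alpha_j - \beta x
\;\;\Longrightarrow\;\;
\frac{P(Y > j \mid x)}{P(Y \leq j \mid x)} = e^{\beta x - \alpha_j}, \\[4pt]
\frac{\text{odds}(Y > j \mid x + 1)}{\text{odds}(Y > j \mid x)}
&= \frac{e^{\beta (x + 1) - \alpha_j}}{e^{\beta x - \alpha_j}} = e^{\beta}.
\end{aligned}
```

The threshold $\alpha_j$ cancels, so the same odds ratio applies at every cut-point $j$. This is why a single number per predictor is enough, and why that number is only meaningful when the proportional odds assumption holds.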

Thresholds and Cutoff Points

The thresholds, or cutoff points, in the Ordered Logit Model mark the boundaries between adjacent categories on the log-odds scale. The threshold $\alpha_j$ is the value of the linear predictor, $\beta X$, at which the probability of the outcome being in category $j$ or lower equals the probability of it being in a higher category, so that both are 0.5.

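Together with the estimated coefficients, the estimated thresholds give the predicted probability of each outcome category for a given set of predictor values:

$$\hat{P}(Y = j | X) = F(\hat{\alpha}_j - \hat{\beta} X) - F(\hat{\alpha}_{j-1} - \hat{\beta} X)$$

Here $\hat{\alpha}_0 = -\infty$ and $\hat{\alpha}_J = +\infty$, so the second term is 0 for the lowest category and the first term is 1 for the highest category.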
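
The formula can be turned into a few lines of R. The helper `predict_probs()` below is a hypothetical function written for this illustration, and the thresholds, coefficients, and predictor values passed to it are made up. It is a minimal sketch of the calculation, not a replacement for `predict()` on a fitted model.

```r
# Hypothetical helper: category probabilities from thresholds and coefficients
predict_probs <- function(alpha, beta, x) {
  eta <- sum(beta * x)                 # linear predictor beta * X
  cum <- plogis(c(alpha, Inf) - eta)   # P(Y <= j), with P(Y <= J) = 1
  diff(c(0, cum))                      # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
}

# Made-up estimates: three thresholds (four categories) and two predictors
alpha_hat <- c(-1.0, 0.5, 2.0)
beta_hat  <- c(0.8, -0.3)
x_new     <- c(1, 2)

p <- predict_probs(alpha_hat, beta_hat, x_new)
print(round(p, 3))
print(sum(p))   # the category probabilities add up to 1
```

For a model fitted with polr(), `predict(model, newdata, type = "probs")` performs the equivalent calculation.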

Ordered Logit Model in R

In this chapter, I will demonstrate how to estimate an Ordered Logit Model using R. We will use the MASS package, which provides the polr() function for fitting proportional odds models.

Install and load the required package

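First, we install the MASS and ordinal packages if they are not already installed, and then load them. Only polr() from MASS is used below; the ordinal package provides clm(), an alternative function for fitting the same kind of model.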

# Install packages if not already installed
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

if (!requireNamespace("ordinal", quietly = TRUE)) {
  install.packages("ordinal")
}

# Load packages
library(MASS)
library(ordinal)

Load the wine dataset

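Load the white wine quality dataset from the UCI Machine Learning Repository. The quality score is read in as an integer, so it is converted to an ordered factor before fitting.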

wine <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep = ";")

# Convert the dependent variable to an ordered factor
wine$quality <- factor(wine$quality, ordered = TRUE)

Estimate the Ordered Logit Model

Now, we will estimate the Ordered Logit Model using the polr() function from the MASS package. We will model the relationship between wine quality and the explanatory variables.

# Estimate the Ordered Logit Model
ordered_logit_model <- polr(quality ~ ., data = wine)

# Display the model summary
summary(ordered_logit_model)

Call:
polr(formula = quality ~ ., data = wine)

Coefficients:
                          Value Std. Error    t value
fixed.acidity         2.314e-01  0.0382399     6.0519
volatile.acidity     -4.982e+00  0.3070888   -16.2231
citric.acid           1.238e-01  0.2425520     0.5105
residual.sugar        2.307e-01  0.0067782    34.0288
chlorides            -6.080e-01  1.3680802    -0.4444
free.sulfur.dioxide   1.193e-02  0.0022344     5.3394
total.sulfur.dioxide -9.073e-04  0.0009539    -0.9512
density              -4.623e+02  0.4622070 -1000.2657
pH                    2.068e+00  0.2125885     9.7296
sulphates             1.815e+00  0.2467479     7.3565
alcohol               4.299e-01  0.0314096    13.6865

Intercepts:
    Value      Std. Error t value
3|4  -451.8844     0.4703  -960.8466
4|5  -449.5243     0.4686  -959.3266
5|6  -446.4853     0.4727  -944.6359
6|7  -443.8967     0.4816  -921.7029
7|8  -441.6433     0.4908  -899.7778
8|9  -437.9633     0.6607  -662.8715

Residual Deviance: 10900.89
AIC: 10934.89

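The summary() output shows the estimated coefficients, their standard errors, and t values for each explanatory variable, followed by the same quantities for the estimated thresholds (intercepts). polr() does not print p-values; they can be approximated from the t values using a standard normal distribution.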
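
A common way to attach approximate p-values to a polr() fit is sketched below. To keep the example self-contained it uses the `housing` data that ships with MASS rather than the wine data, and the normal approximation is an assumption that relies on large-sample theory. The same three steps should apply to `ordered_logit_model`.

```r
library(MASS)

# housing ships with MASS; this model mirrors the example in the polr() help page
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

# Coefficient table with columns "Value", "Std. Error", "t value"
ctable <- coef(summary(fit))

# Two-sided p-values from a standard normal approximation
p_value <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)

# Combined table of estimates and approximate p-values
print(round(cbind(ctable, "p value" = p_value), 4))
```

Passing `Hess = TRUE` stores the Hessian at fit time, so summary() does not need to refit the model to obtain standard errors.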

Coefficients
These represent the effect of each predictor variable on the log-odds of observing a higher wine quality rating, holding all other variables constant.
- For instance, volatile.acidity has a coefficient of -4.982. This means that an increase in volatile acidity is associated with a decrease in the log-odds of observing a higher wine quality rating, holding all other variables constant. This is expected, as higher volatile acidity is generally considered unfavorable for wine quality.
- alcohol has a coefficient of 0.430, indicating that an increase in alcohol content is associated with an increase in the log-odds of observing a higher wine quality rating, holding all other variables constant.
Intercepts
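These are the estimated thresholds between adjacent quality rating categories. For example, the threshold between quality 3 and 4 is -451.8844. The thresholds are on the same log-odds scale as the linear predictor. Their large negative values are driven mainly by the density coefficient of about -462: density is close to 1 for these wines, so the linear predictor itself sits roughly in the same range as the thresholds.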

Interpret the results

To interpret the coefficients, we can calculate the odds ratios using the exp() function.

# Calculate the odds ratios
exp(ordered_logit_model$coefficients)

       fixed.acidity     volatile.acidity          citric.acid       residual.sugar
        1.260395e+00         6.860867e-03         1.131825e+00         1.259422e+00
           chlorides  free.sulfur.dioxide total.sulfur.dioxide              density
        5.444452e-01         1.012002e+00         9.990931e-01        1.631987e-201
                  pH            sulphates              alcohol
        7.912119e+00         6.142251e+00         1.537083e+00

Here's the interpretation of these odds ratios:

fixed.acidity
For a one-unit increase in fixed acidity, the odds of a higher wine quality rating increase by a factor of 1.260, holding all other variables constant.
volatile.acidity
For a one-unit increase in volatile acidity, the odds of a higher wine quality rating are multiplied by 0.0069, a decrease of about 99.3%, holding all other variables constant. This indicates that higher volatile acidity is associated with lower wine quality ratings.
citric.acid
For a one-unit increase in citric acid, the odds of a higher wine quality rating increase by a factor of 1.132, holding all other variables constant. With a t value of about 0.51, however, this effect is not statistically distinguishable from zero.
residual.sugar
For a one-unit increase in residual sugar, the odds of a higher wine quality rating increase by a factor of 1.259, holding all other variables constant.
chlorides
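For a one-unit increase in chlorides, the odds of a higher wine quality rating are multiplied by 0.544, a decrease of about 46%, holding all other variables constant. With a t value of about -0.44, this effect is not statistically distinguishable from zero.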
free.sulfur.dioxide
For a one-unit increase in free sulfur dioxide, the odds of a higher wine quality rating increase by a factor of 1.012, holding all other variables constant.
total.sulfur.dioxide
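For a one-unit increase in total sulfur dioxide, the odds of a higher wine quality rating are multiplied by 0.999, a decrease of about 0.1%, holding all other variables constant. With a t value of about -0.95, this effect is not statistically distinguishable from zero.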
density
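For a one-unit increase in density, the odds of a higher wine quality rating are multiplied by 1.63e-201, holding all other variables constant. This value is extremely small mainly because of the scale of the variable: density varies only within a narrow range close to 1 in this dataset, so a full one-unit increase lies far outside the observed data. Expressing the effect per 0.001 units of density gives a more interpretable number. Multicollinearity with other predictors, such as residual sugar and alcohol, may also inflate the coefficient.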
pH
For a one-unit increase in pH, the odds of a higher wine quality rating increase by a factor of 7.912, holding all other variables constant.
sulphates
For a one-unit increase in sulphates, the odds of a higher wine quality rating increase by a factor of 6.142, holding all other variables constant.
alcohol
For a one-unit increase in alcohol, the odds of a higher wine quality rating increase by a factor of 1.537, holding all other variables constant.
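
A one-unit change is not always a meaningful step for a predictor. The sketch below rescales a few odds ratios to smaller increments using the rule that an increase of `delta` units multiplies the odds by `exp(beta * delta)`. The coefficients are copied from the summary output above; the increments in `delta` are assumptions chosen for illustration.

```r
# Coefficients copied from the summary output above
beta <- c(density = -462.3, volatile.acidity = -4.982, alcohol = 0.4299)

# Increments judged more realistic than one full unit (these are assumptions)
delta <- c(density = 0.001, volatile.acidity = 0.1, alcohol = 1)

# Odds ratio for an increase of delta units: exp(beta * delta)
odds_ratio <- exp(beta * delta)

# Percent change in the odds of a higher quality rating
pct_change <- 100 * (odds_ratio - 1)

print(round(odds_ratio, 3))
print(round(pct_change, 1))
```

On this scale, an increase of 0.001 in density multiplies the odds of a higher rating by roughly 0.63, which is far easier to reason about than 1.63e-201.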