2022-12-29

Mixed Logit Model

What Is a Mixed Logit Model

Mixed logit models, also known as random parameters logit or random coefficients logit models, extend the standard logit model by letting preference parameters vary randomly across individuals. This added flexibility captures unobserved heterogeneity in choice behavior.

Random Coefficients

In traditional logit models, coefficients representing the effects of explanatory variables on choice probabilities are assumed to be fixed across the population. However, this assumption can be restrictive, as it does not allow for the possibility that different individuals have different sensitivities to the attributes of the alternatives.

Mixed logit models overcome this limitation by treating these coefficients as random variables. By specifying a distribution for each random coefficient, we can capture the variation in preferences across the population. Commonly used distributions include the normal, log-normal, and uniform distributions. These distributions can be chosen based on theoretical considerations, or they can be selected through model comparison and testing.
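
To make the idea concrete, here is a minimal sketch in base R of how a mixed logit choice probability can be approximated: the standard logit probability is averaged over draws of the random coefficient. The two-alternative setup, the attribute difference x_diff, and the values of beta_mean and beta_sd are hypothetical and chosen only for illustration.

```r
set.seed(42)

# Hypothetical two-alternative example: attribute difference between A and B
x_diff <- 1.0

# Assumed population distribution of the coefficient: Normal(beta_mean, beta_sd)
beta_mean <- 1.0
beta_sd <- 2.0

# Standard logit: every individual shares the same coefficient
p_fixed <- plogis(beta_mean * x_diff)

# Mixed logit: average the logit probability over draws of the coefficient
n_draws <- 10000
beta_draws <- rnorm(n_draws, mean = beta_mean, sd = beta_sd)
p_mixed <- mean(plogis(beta_draws * x_diff))

c(fixed = p_fixed, mixed = p_mixed)
```

In this toy setting the mixed probability is pulled toward 0.5 relative to the fixed-coefficient probability, because some simulated individuals have a coefficient of the opposite sign.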

Error Components

Error components represent additional sources of unobserved heterogeneity in a mixed logit model. They can be thought of as unobserved factors that influence the utility of alternatives in a systematic way, such as unmeasured product characteristics or personal preferences. By including error components in the model, we can account for correlations among alternatives and capture similarities in the choice behavior of individuals.

Error components can be specified in various ways, depending on the context and the type of correlations we want to capture. For example, we can include a common error component for alternatives that belong to the same group or category, or we can model the error components as correlated random variables to capture the dependence structure among alternatives.
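
The following sketch illustrates, under simplified assumptions, how a shared error component induces correlation between the utilities of alternatives in the same group. The alternatives (bus, train, car), the grouping of bus and train, and the value of sigma_transit are hypothetical; systematic utilities are set to zero so that only the random parts are compared.

```r
set.seed(123)
n <- 100000

# Gumbel (type I extreme value) draws: the idiosyncratic errors of a logit model
rgumbel <- function(n) -log(-log(runif(n)))

# Hypothetical setup: bus and train share a "public transit" error component
sigma_transit <- 2
eta_transit <- rnorm(n, mean = 0, sd = sigma_transit)

u_bus <- eta_transit + rgumbel(n)
u_train <- eta_transit + rgumbel(n)
u_car <- rgumbel(n)  # car has no shared component

cor_bus_train <- cor(u_bus, u_train)
cor_bus_car <- cor(u_bus, u_car)

# Implied correlation: shared variance over total variance (Gumbel variance is pi^2 / 6)
cor_theory <- sigma_transit^2 / (sigma_transit^2 + pi^2 / 6)

c(bus_train = cor_bus_train, bus_car = cor_bus_car, theory = cor_theory)
```

Alternatives that share the component are correlated, while the alternative outside the group is not, which is what relaxes the substitution pattern of the standard logit.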

Benefits of Mixed Logit Models

Mixed logit models offer several advantages over traditional logit models:

They account for unobserved preference heterogeneity, leading to more accurate representations of individual preferences and choice behavior.
They can capture correlations among alternatives, allowing for more realistic modeling of substitution patterns.
They are more flexible in terms of functional form and can accommodate a wider range of utility specifications.
They handle panel data with repeated choices by the same individual, which are common in choice modeling applications, because an individual's coefficients can be held constant across that individual's choices.

Practical Applications of Mixed Logit Models

Mixed logit models have found widespread use in various fields due to their ability to account for unobserved preference heterogeneity and capture correlations among alternatives. This section showcases some of the most common applications of mixed logit models, highlighting their practical value and versatility in addressing real-world problems.

Market Research and Consumer Choice Analysis

In market research, mixed logit models are employed to understand and predict consumer preferences for different products and services. By incorporating random coefficients and error components, these models can capture the variation in individual preferences and account for unobserved factors that influence choice behavior. Applications in this domain include:

Brand choice modeling
Estimating the effects of product attributes, marketing mix variables, and consumer characteristics on brand preferences.
Conjoint analysis
Analyzing the trade-offs that consumers make when choosing among products with multiple attributes.
New product design
Identifying the optimal combination of product features and pricing strategies that maximize consumer satisfaction and market share.

Transportation Planning and Policy Analysis

Mixed logit models are extensively used in transportation planning to model individual travel behavior and inform policy decisions. These models can help planners understand how various factors, such as travel time, cost, and mode attributes, affect travel mode choice and route selection. Some applications in transportation planning include:

Mode choice modeling
Estimating the probabilities of individuals choosing different modes of transportation, such as car, bus, or train.
Route choice modeling
Predicting the likelihood of travelers selecting specific routes or paths based on travel time, congestion, and other route characteristics.
Policy evaluation
Assessing the potential impact of transportation policies, such as congestion pricing or transit subsidies, on travel behavior and mode choice.

Health Economics and Medical Decision-Making

In health economics, mixed logit models have been used to analyze individual preferences for health care services, insurance plans, and treatment options. By accommodating preference heterogeneity, these models can provide insights into the factors that drive health-related choices and inform policy interventions. Applications in health economics include:

Health care demand modeling
Estimating the effects of price, quality, and accessibility on the choice of health care providers or services.
Health insurance choice modeling
Analyzing the preferences of individuals for different insurance plans and coverage options.
Patient decision-making
Understanding the trade-offs that patients make when selecting medical treatments or interventions, based on factors such as risk, efficacy, and side effects.

Implementing Mixed Logit Model with Train Dataset

In this section, we implement a mixed logit model in R using the Train dataset from the mlogit package.

Prepare the Data

First, we need to load the mlogit library and the Train dataset.

library("mlogit")
data("Train", package = "mlogit")

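We will prepare the data by creating a choiceid variable, converting the dataset to the long format that mlogit expects, and rescaling price and time. Note that the opposite argument of dfidx() flips the sign of price, comfort, time, and change, so a positive coefficient on any of these variables means that passengers prefer lower values of it.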

# Create a new variable choiceid in the Train dataset.
# It assigns a unique identifier (running from 1 to the number of rows
# in the Train dataset) to each choice situation.
Train$choiceid <- 1:nrow(Train)

# Reshape the dataset to long format and convert to mlogit format.
# The opposite argument stores price, comfort, time, and change with their signs flipped.
Tr <- dfidx(Train, choice = "choice", varying = 4:11, sep = "_",
            opposite = c("price", "comfort", "time", "change"),
            idx = list(c("choiceid", "id")), idnames = c("chid", "alt"))

# Rescale price to Euro and time to hours
Tr$price <- Tr$price / 100 * 2.20371
Tr$time <- Tr$time / 60
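
Since the opposite argument of dfidx() negates the listed variables, it helps to see what negating a regressor does to its coefficient. The sketch below is not part of the Train analysis: it simulates a hypothetical binary choice with base R and fits a binary logit on the price difference, once as-is and once with the sign flipped. All names and values (dprice, beta_price, the sample size) are assumptions for illustration.

```r
set.seed(1)
n <- 5000

# Hypothetical price difference between alternatives A and B (A minus B)
dprice <- rnorm(n)

# True utility weight on price is negative: cheaper is better
beta_price <- -1.5

# Probability of choosing A under a binary logit on the utility difference
p_A <- plogis(beta_price * dprice)
choose_A <- rbinom(n, size = 1, prob = p_A)

# Fit with the raw price difference
fit_raw <- glm(choose_A ~ 0 + dprice, family = binomial())

# Fit with the sign-flipped price difference
neg_dprice <- -dprice
fit_opp <- glm(choose_A ~ 0 + neg_dprice, family = binomial())

c(raw = unname(coef(fit_raw)), opposite = unname(coef(fit_opp)))
```

The two fits describe the same preferences. Only the sign of the coefficient changes, so a positive coefficient on a negated price still means that a higher price lowers utility.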

Estimate the Mixed Logit Model

Now, we will estimate the mixed logit model with the following variables: price, time, change, and comfort. We will include random coefficients for time, change, and comfort to account for individual-specific preference heterogeneity, while the price coefficient stays fixed. The formula can be written as follows:

choice ~ price + time + change + comfort | - 1

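In an mlogit formula, the part before the | lists the alternative-specific variables that enter utility with a common (generic) coefficient, while the part after the | is reserved for individual-specific variables. Writing - 1 there removes the intercepts, so the model has no alternative-specific constants.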

With the model formula specified, we can now fit the mixed logit model using the mlogit() function. We need to provide several arguments to the function:

formula: The model formula specified earlier.
data: The prepared Train dataset (Tr) in mlogit format.
panel: Set to TRUE to account for the panel structure of the data.
rpar: A named character vector giving the distribution of each random coefficient (in our case, the normal distribution "n" for time, change, and comfort). Coefficients not listed here, such as price, are treated as fixed.
R: The number of draws used to simulate the random coefficients (100 Halton draws in our case).
correlation: Set to FALSE to assume no correlation between random coefficients.
halton: Set to NA to use the default Halton sequence for simulation.
method: Set to "bhhh" to use the Berndt, Hall, Hall, and Hausman optimization algorithm.
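
To give an idea of what Halton draws are, here is a minimal base R sketch of a Halton sequence and its transformation into normal coefficient draws. This is a textbook version written for illustration; the internal implementation in mlogit may differ (for example in the prime bases used or in how initial elements are handled). The function halton() and the values of draw_mean and draw_sd are hypothetical.

```r
# Halton sequence: radical inverse of the integers 1..n in a prime base
halton <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    frac <- 1 / base
    while (i > 0) {
      value <- value + (i %% base) * frac
      i <- i %/% base
      frac <- frac / base
    }
    value
  }, numeric(1))
}

# 100 quasi-random points in (0, 1), spread more evenly than pseudo-random draws
u <- halton(100, base = 2)

# Map the uniform points to standard normal draws with the inverse CDF
z <- qnorm(u)

# Hypothetical mean and standard deviation of a normally distributed coefficient
draw_mean <- 1.0
draw_sd <- 2.0
beta_draws <- draw_mean + draw_sd * z

head(u, 4)
```

Each random coefficient would typically use a different prime base (2, 3, 5, and so on) so that the draws for different coefficients do not move together.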

The code for fitting the mixed logit model is:

Train.mxlu <- mlogit(
  choice ~ price + time + change + comfort | - 1,
  Tr,
  panel = TRUE,
  rpar = c(time = "n", change = "n", comfort = "n"),
  R = 100,
  correlation = FALSE,
  halton = NA,
  method = "bhhh"
)
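
With panel = TRUE, the likelihood contribution of an individual is based on the whole sequence of that individual's choices. As a rough sketch of the idea (not of mlogit's internals), the base R code below simulates this contribution for one hypothetical individual facing three binary choice situations: for each draw of the coefficient, the probabilities of the observed choices are multiplied, and the products are then averaged over the draws. All data values and distribution parameters are made up for illustration.

```r
set.seed(7)

# Hypothetical individual with 3 binary choice situations
x_diff <- c(0.5, -1.0, 2.0)  # attribute difference (A minus B) per situation
chose_A <- c(1, 0, 1)        # observed choices (1 = A chosen, 0 = B chosen)

# Draws of the random coefficient from an assumed Normal(1, 2) distribution
beta_draws <- rnorm(100, mean = 1, sd = 2)

# For each draw: probability of the whole observed sequence of choices
seq_prob <- vapply(beta_draws, function(b) {
  p_A <- plogis(b * x_diff)
  prod(ifelse(chose_A == 1, p_A, 1 - p_A))
}, numeric(1))

# Simulated likelihood contribution: average over the draws
sim_lik <- mean(seq_prob)
log(sim_lik)
```

Keeping the same draw across all three situations is what ties the repeated choices of one individual together.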

Interpret the Results

Let's interpret the results by printing the summary of the model.

summary(Train.mxlu)

Call:
mlogit(formula = choice ~ price + time + change + comfort | -1,
    data = Tr, rpar = c(time = "n", change = "n", comfort = "n"),
    R = 100, correlation = FALSE, halton = NA, panel = TRUE,
    method = "bhhh")

Frequencies of alternatives:choice
      A       B
0.50324 0.49676

bhhh method
44 iterations, 0h:0m:6s
g'(-H)^-1g = 8.58E-07
gradient close to zero

Coefficients :
            Estimate Std. Error z-value  Pr(>|z|)
price      0.1373518  0.0061272 22.4166 < 2.2e-16 ***
time       4.3084957  0.2917274 14.7689 < 2.2e-16 ***
change     0.8879947  0.0956106  9.2876 < 2.2e-16 ***
comfort    2.4534514  0.1428630 17.1735 < 2.2e-16 ***
sd.time    4.9079488  0.3624716 13.5402 < 2.2e-16 ***
sd.change  1.6382549  0.1367053 11.9838 < 2.2e-16 ***
sd.comfort 2.4009629  0.1580942 15.1869 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -1551.4

random coefficients
        Min.    1st Qu.    Median      Mean  3rd Qu. Max.
time    -Inf  0.9981346 4.3084957 4.3084957 7.618857  Inf
change  -Inf -0.2169914 0.8879947 0.8879947 1.992981  Inf
comfort -Inf  0.8340265 2.4534514 2.4534514 4.072876  Inf

The output shows the estimated coefficients, standard errors, z-values, and p-values for each variable. All coefficients are statistically significant at the 0.001 level. Because price, time, change, and comfort were entered with their signs flipped (the opposite argument of dfidx()), the positive mean coefficients for time, change, and comfort indicate that passengers prefer shorter travel times, fewer changes, and more comfortable trains.

The same reasoning applies to price. The positive coefficient is attached to the negated price, so it matches the intuition that passengers prefer cheaper tickets: a higher price lowers utility.

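The estimated standard deviations of the random coefficients (sd.time, sd.change, and sd.comfort) are also statistically significant, suggesting that preferences for these attributes vary across individuals. The random coefficients table reports the quartiles of the fitted normal distributions, whose unbounded support explains the infinite minimum and maximum.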
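
One way to read the estimated means and standard deviations together is to ask what share of the population has a coefficient of the expected (positive) sign. The sketch below does this in base R with the estimates copied by hand from the summary output above, assuming the fitted normal distributions are taken at face value. It also recomputes the first quartiles as a consistency check against the 1st Qu. column.

```r
# Estimates copied from the summary output above
est_mean <- c(time = 4.3084957, change = 0.8879947, comfort = 2.4534514)
est_sd <- c(time = 4.9079488, change = 1.6382549, comfort = 2.4009629)

# Under the normal assumption, P(coefficient > 0) = pnorm(mean / sd)
share_positive <- pnorm(est_mean / est_sd)

# First quartile of each fitted normal distribution
q1 <- qnorm(0.25, mean = est_mean, sd = est_sd)
names(q1) <- names(est_mean)

round(share_positive, 3)
round(q1, 4)
```

A share below one means that the fitted normal distribution assigns some passengers a coefficient of the unexpected sign. This is a known side effect of the normal distribution and one reason why bounded distributions such as the log-normal are sometimes preferred.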

Willingness to Pay (WTP)

Willingness to pay (WTP) is an important concept in discrete choice models, as it measures how much money passengers are willing to give up for an improvement in a specific attribute. To calculate the mean WTP for time, change, and comfort, we divide their respective mean coefficients by the coefficient of price:

coef(Train.mxlu)[2:4] / coef(Train.mxlu)[1]

     time    change   comfort
31.368325  6.465112 17.862536

Because price was rescaled to euros and time to hours, the WTP values are expressed in euros per unit of each attribute. They are ratios of mean coefficients, so they describe the average passenger:

Time
Passengers are willing to pay approximately 31.37 euros to reduce travel time by one hour. This value indicates the importance of travel time as a factor in train choice.
Change
Passengers are willing to pay approximately 6.47 euros to avoid one change. This suggests that passengers prefer direct trains and are willing to pay more for them.
Comfort
Passengers are willing to pay approximately 17.86 euros for a one-level improvement in comfort. This value shows that passengers value comfort when choosing a train and are willing to pay a premium for more comfortable trains.
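
The WTP arithmetic can be reproduced by hand from the printed estimates. The base R sketch below recomputes the mean WTP values and, as an extra step not taken above, derives the implied spread of WTP across passengers. The spread calculation assumes that the price coefficient is fixed and the other coefficients are normal, in which case each WTP is itself normally distributed with standard deviation sd / price coefficient.

```r
# Estimates copied from the summary output above
coefs <- c(price = 0.1373518, time = 4.3084957,
           change = 0.8879947, comfort = 2.4534514)
sds <- c(time = 4.9079488, change = 1.6382549, comfort = 2.4009629)

price_coef <- unname(coefs["price"])

# Mean WTP: ratio of each mean coefficient to the price coefficient
wtp_mean <- coefs[c("time", "change", "comfort")] / price_coef

# Implied standard deviation of WTP across passengers (price coefficient fixed)
wtp_sd <- sds / price_coef

round(rbind(mean = wtp_mean, sd = wtp_sd), 2)
```

The standard deviations are of the same order as the means, so the average WTP hides substantial variation between passengers.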


Ryusei Kakujo
