What Is the Binary Logit Model

The binary logit model is a statistical tool used to model the probability of an event occurring in situations with two possible outcomes. Also known as logistic regression, it is widely used in various fields, including social sciences, economics, and health sciences, to predict binary responses based on one or more predictor variables. The model's popularity stems from its ease of interpretation, flexibility, and strong theoretical underpinnings.

At the heart of the binary logit model is the logistic function, which maps any input value to a probability between 0 and 1. This function allows for the estimation of probabilities that are not easily modeled using traditional linear regression techniques. The logit model's primary goal is to determine the relationship between a set of predictor variables and the binary outcome of interest.

Understanding the Logistic Function

The logistic function, sometimes referred to as the sigmoid function, is the cornerstone of the binary logit model. It is a mathematical function that maps any real-valued input to a probability value between 0 and 1. This transformation makes the logistic function ideal for modeling the relationship between predictor variables and binary outcomes.

The Logistic Function Defined

The logistic function is formally defined as:

f(x) = \frac{1}{1 + \exp(-x)}

where x is the input value and \exp() denotes the exponential function. The logistic function is S-shaped and ranges between 0 and 1. As x approaches negative infinity, f(x) approaches 0; as x approaches positive infinity, f(x) approaches 1.

Properties of the Logistic Function

There are several important properties of the logistic function that make it well-suited for modeling probabilities in binary logit models:

  • Boundedness
    The logistic function's output is always between 0 and 1, which aligns with the range of probabilities.

  • Monotonicity
    The logistic function is strictly increasing, meaning that as the input value increases, so does the output value.

  • Differentiability
    The logistic function is differentiable, which makes it amenable to optimization techniques used in model fitting.
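
To make these properties concrete, here is a minimal sketch in R (a hand-rolled version for illustration; R's built-in plogis() computes the same function):

# Logistic (sigmoid) function: maps any real number into (0, 1)
logistic <- function(x) 1 / (1 + exp(-x))

# Boundedness and monotonicity: outputs increase with x and stay in (0, 1)
logistic(c(-10, -1, 0, 1, 10))
# 4.539787e-05 2.689414e-01 5.000000e-01 7.310586e-01 9.999546e-01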

The Logit Transformation

The logit transformation is the inverse of the logistic function and is used to model the relationship between predictor variables and the log-odds of the binary outcome. The logit transformation is defined as:

\text{logit}(p) = \ln\left(\frac{p}{1 - p}\right)

where p is the probability of the binary outcome, and \ln() denotes the natural logarithm. The logit transformation maps probabilities from the (0,1) interval to the entire real number line, enabling the use of linear regression techniques to estimate the model parameters.
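
In R, the logit and its inverse are available as qlogis() and plogis(); a quick sketch confirms that they undo each other:

p <- c(0.1, 0.5, 0.9)

# Logit: map probabilities in (0, 1) to the whole real line
log_odds <- qlogis(p)          # equivalent to log(p / (1 - p))
log_odds
# -2.197225  0.000000  2.197225

# Logistic: map log-odds back to probabilities
plogis(log_odds)
# 0.1 0.5 0.9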

Incorporating Predictor Variables

In the binary logit model, the logit transformation is applied to the linear combination of predictor variables, represented as:

\text{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n

where p is the probability of the binary outcome, \beta_0 is the intercept, and \beta_1 through \beta_n are the coefficients for the predictor variables x_1 through x_n. By applying the logistic function to this linear combination, we obtain the probability of the binary outcome as a function of the predictor variables:

p(x) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n))}
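
As a concrete illustration, the sketch below evaluates p(x) for hypothetical coefficients \beta_0 = -1, \beta_1 = 0.5, and \beta_2 = 2 (values chosen purely for demonstration):

# Hypothetical coefficients: intercept, beta_1, beta_2
beta <- c(-1, 0.5, 2)

# One observation, with a leading 1 for the intercept: x_1 = 2, x_2 = 0.3
x <- c(1, 2, 0.3)

# Linear predictor (the log-odds), then the logistic transform
eta <- sum(beta * x)           # -1 + 0.5*2 + 2*0.3 = 0.6
1 / (1 + exp(-eta))
# 0.6456563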

Model Interpretation and Odds Ratios

Once the binary logit model has been fitted using Maximum Likelihood Estimation, interpreting the model's coefficients is crucial for understanding the relationship between predictor variables and the binary outcome. In this chapter, I will discuss how to interpret the coefficients of the binary logit model using odds ratios and explore the implications of these interpretations for decision-making and hypothesis testing.

Interpreting Coefficients in the Logit Model

In the binary logit model, the coefficients represent the change in the log-odds of the positive outcome for a one-unit increase in the corresponding predictor variable, holding all other variables constant. While this interpretation is mathematically accurate, log-odds are difficult to reason about in practical terms. To facilitate interpretation, we can use odds ratios.

Odds Ratios

Odds ratios are a more intuitive way to express the relationship between predictor variables and the binary outcome in the logit model. The odds ratio for a given predictor variable is the factor by which the odds of the positive outcome are multiplied when that variable increases by one unit, holding all other variables constant. Mathematically, the odds ratio for a predictor variable x_j is:

\text{OR}_j = \exp(\beta_j)

where \beta_j is the coefficient for the predictor variable x_j. If the odds ratio is greater than 1, a one-unit increase in the predictor variable increases the odds of the positive outcome; if the odds ratio is less than 1, a one-unit increase in the predictor variable decreases the odds of the positive outcome.

Interpretation of Odds Ratios

To illustrate the interpretation of odds ratios, let's consider a hypothetical binary logit model that predicts the likelihood of a customer making a purchase based on their age and income:

\text{logit}(p) = \beta_0 + \beta_1 \cdot \text{Age} + \beta_2 \cdot \text{Income}

Assume that the estimated coefficients are \beta_1 = 0.10 and \beta_2 = 0.05. The odds ratios for Age and Income are:

\text{OR}_{\text{Age}} = \exp(0.10) \approx 1.105
\text{OR}_{\text{Income}} = \exp(0.05) \approx 1.051

These odds ratios indicate that a one-year increase in age is associated with roughly a 10.5% increase in the odds of making a purchase, and a one-unit increase in income is associated with roughly a 5.1% increase in the odds of making a purchase, holding all other variables constant.
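
These numbers are easy to verify directly in R:

# Odds ratios implied by the hypothetical coefficients
exp(c(Age = 0.10, Income = 0.05))
#      Age   Income
# 1.105171 1.051271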

Hypothesis Testing and Confidence Intervals

Hypothesis testing can be performed to assess the statistical significance of each predictor variable in the binary logit model. The null hypothesis states that the predictor variable has no effect on the binary outcome, which implies that the corresponding coefficient is zero. The alternative hypothesis states that the predictor variable has a significant effect on the binary outcome, implying that the corresponding coefficient is different from zero.

Wald tests and likelihood ratio tests are commonly used for hypothesis testing in the binary logit model. Additionally, confidence intervals for the coefficients or odds ratios can be calculated to provide a range of plausible values for the true population parameters.
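
A minimal R sketch of both ideas, assuming a fitted glm object named logit_model (the name is hypothetical here; a concrete fit follows in the next chapter):

# Wald confidence intervals for the coefficients (normal approximation),
# exponentiated to give confidence intervals for the odds ratios
exp(confint.default(logit_model))

# Likelihood ratio test: compare an intercept-only model to the full model
null_model <- update(logit_model, . ~ 1)
anova(null_model, logit_model, test = "Chisq")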

Binary Logit Model with R

In this chapter, I will walk through an example of fitting a binary logit model using R, a popular programming language for statistical computing. We will use the glm() function from R's built-in stats package to fit the model, evaluate its performance, and interpret the results.

Data Preparation

For this example, we will use the mtcars dataset that is built into R. This dataset consists of various car attributes and their respective fuel efficiency measured in miles per gallon (mpg). We will create a binary outcome variable indicating whether a car is fuel-efficient (1) or not (0) based on a threshold of 22.5 mpg.

First, load the data and create the binary outcome variable:

# Load the built-in dataset and flag cars above the 22.5 mpg threshold
data(mtcars)
mtcars$efficient <- ifelse(mtcars$mpg > 22.5, 1, 0)

Fitting the Binary Logit Model

We will use the glm() function to fit a binary logit model with the fuel efficiency outcome variable and two predictor variables: weight (wt) and horsepower (hp). The family = binomial(link = "logit") argument specifies that we want to fit a binary logit model.

# Fit a binary logit model of fuel efficiency on weight and horsepower
logit_model <- glm(efficient ~ wt + hp, data = mtcars, family = binomial(link = "logit"))
summary(logit_model)

The summary() function will display the model coefficients, standard errors, z-values, and p-values for each predictor variable.

Call:
glm(formula = efficient ~ wt + hp, family = binomial(link = "logit"),
    data = mtcars)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.72029  -0.00913  -0.00001   0.00314   1.40334

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  30.4063    18.0320   1.686   0.0917 .
wt           -3.1801     1.9659  -1.618   0.1057
hp           -0.2201     0.1447  -1.521   0.1283
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 38.0243  on 31  degrees of freedom
Residual deviance:  6.5068  on 29  degrees of freedom
AIC: 12.507

Number of Fisher Scoring iterations: 10
The key components of this output are as follows:

  • Call
    This section displays the function call that was used to fit the model. It shows the response variable (efficient), the predictor variables (wt and hp), and the family specified for the model (binomial with logit link).

  • Deviance Residuals
    These are the residuals of the model expressed in terms of deviance. The summary statistics (minimum, 1st quartile, median, 3rd quartile, and maximum) give an indication of how well the model fits the data. Ideally, the residuals should be small and symmetrically distributed around zero.

  • Coefficients
    This section provides the estimated coefficients, standard errors, z-values, and p-values for each predictor variable and the intercept. The coefficients represent the change in the log-odds of the positive outcome (a car being fuel-efficient) for a one-unit increase in the predictor variable, holding all other variables constant.

    • Intercept: 30.4063
    • Weight (wt): -3.1801
    • Horsepower (hp): -0.2201
  • Significance codes
    The significance codes indicate the level of statistical significance for each term. In this model, neither predictor variable is statistically significant at the 0.05 level (p = 0.1057 for wt and p = 0.1283 for hp); the p-value for the intercept is 0.0917.

  • Dispersion parameter
    This value is fixed at 1 for the binomial family because the variance of a binomial outcome is determined by its mean, p(1 - p), rather than estimated as a separate parameter.

  • Null deviance and Residual deviance
    The null deviance is the deviance of a model with no predictor variables (i.e., only an intercept), while the residual deviance is the deviance of the fitted model. Comparing the two gives a rough indication of the model's goodness of fit. Here, the residual deviance (6.5068) is much smaller than the null deviance (38.0243), suggesting that the model with the predictor variables fits considerably better than the null model; this comparison is made formal as a likelihood ratio test in the sketch after this list.

  • AIC
    The Akaike Information Criterion (AIC) balances the likelihood of the model against the number of parameters and can be used to compare different models fitted to the same data; lower AIC values indicate a better fit.

  • Number of Fisher Scoring iterations
    This value indicates the number of iterations required for the algorithm to converge. In this case, 10 iterations were needed.
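
The null-versus-residual deviance comparison can be made formal as a likelihood ratio test: the drop in deviance is asymptotically chi-squared distributed, with degrees of freedom equal to the number of predictors. A short sketch using the values from the output above:

# Likelihood ratio test: drop in deviance vs. a chi-squared distribution
dev_drop <- 38.0243 - 6.5068   # null deviance minus residual deviance
df_drop  <- 31 - 29            # difference in residual degrees of freedom
pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
# approximately 1.43e-07, strong evidence against the intercept-only model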

Model Diagnostics

To assess the model's classification performance, create a confusion matrix:

# Predict probabilities
predicted_prob <- predict(logit_model, type = "response")

# Convert probabilities to binary outcomes
predicted_outcome <- ifelse(predicted_prob > 0.5, 1, 0)

# Create confusion matrix
table(Predicted = predicted_outcome, Actual = mtcars$efficient)

         Actual
Predicted  0  1
        0 22  1
        1  1  8
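
The matrix above shows that the model classifies 30 of the 32 cars correctly, with one false positive and one false negative. Overall accuracy can be computed directly:

# Overall classification accuracy at the 0.5 cutoff
mean(predicted_outcome == mtcars$efficient)
# 0.9375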

Interpretation of Results

Interpret the coefficients and odds ratios of the predictor variables:

# Calculate odds ratios
exp(coef(logit_model))

 (Intercept)           wt           hp
1.604309e+13 4.157947e-02 8.024697e-01

The odds ratios are as follows:

  • Intercept: 1.604 \times 10^{13}
  • Weight (wt): 0.0416
  • Horsepower (hp): 0.802

To interpret these odds ratios:

  • Intercept
    The intercept represents the odds of a car being fuel-efficient when both weight and horsepower are zero. Because a car with zero weight and zero horsepower is not realistic, this very large value has no practical interpretation in this context.

  • Weight (wt)
    For each unit increase in weight (one unit of wt corresponds to 1,000 lb in this dataset), the odds of a car being fuel-efficient decrease by approximately 96% (1 - 0.0416 = 0.9584), holding horsepower constant. This indicates that heavier cars are much less likely to be fuel-efficient.

  • Horsepower (hp)
    For each unit increase in horsepower, the odds of a car being fuel-efficient decrease by approximately 20% (1 - 0.802 = 0.198), holding weight constant. This suggests that cars with higher horsepower are less likely to be fuel-efficient.
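
The percentage changes quoted above can be computed in one step:

# Percentage change in the odds per one-unit increase in each predictor
(exp(coef(logit_model)) - 1) * 100
# wt: about -95.8%; hp: about -19.8% (the intercept is not meaningful here)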
