2022-04-15

Causal Effects

What Are Causal Effects

Causal effects, the heart of causal inference, refer to the change in an outcome due to a specific intervention or treatment. The treatment could be a medicine given to a patient, a policy change implemented in a country, or a teaching method applied in a classroom.

Rubin Causal Model (RCM)

The Rubin Causal Model, named after the statistician Donald Rubin, formalizes the potential outcomes framework for causal inference. The causal effect for an individual $i$ is defined as the difference between the potential outcome under treatment, $Y_i(1)$, and the potential outcome under control, $Y_i(0)$:

CE_i = Y_i(1) - Y_i(0)

This individual-level causal effect is often of interest, but we can never compute it directly because of the fundamental problem of causal inference: each unit is observed under only one condition, so only one of $Y_i(1)$ and $Y_i(0)$ is ever observed. Instead, we focus on average causal effects over a population or a subpopulation.
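To make the fundamental problem concrete, here is a small simulation sketch using NumPy. Because the data are simulated, both potential outcomes are known for every unit, which is never the case with real data; the sample size, distributions, and variable names (`y0`, `y1`, `d`, `y_obs`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Simulated (hypothetical) potential outcomes: only in a simulation can we see both
y0 = rng.normal(10.0, 1.0, size=n)       # Y_i(0): outcome under control
y1 = y0 + rng.normal(2.0, 0.5, size=n)   # Y_i(1): outcome under treatment
ce = y1 - y0                             # CE_i = Y_i(1) - Y_i(0)

# In real data each unit receives only one condition, so only one outcome is observed
d = np.array([1, 0, 1, 0, 0])            # treatment indicator D_i
y_obs = np.where(d == 1, y1, y0)         # the other potential outcome stays missing

for i in range(n):
    missing = "Y_i(0)" if d[i] == 1 else "Y_i(1)"
    print(f"unit {i}: D={d[i]}, observed Y={y_obs[i]:.2f}, "
          f"missing {missing}, true CE={ce[i]:.2f} (unknowable in practice)")
```

Each printed row shows one observed outcome and one missing one; the `true CE` column exists only because the data are synthetic.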

Average Treatment Effect (ATE)

The Average Treatment Effect (ATE) is one of the fundamental measures in causal inference. It is the expected difference between the outcome under treatment and the outcome under control across the entire population.

Mathematically, the ATE is defined as:

ATE = E[Y_i(1) - Y_i(0)]

where $E[\cdot]$ denotes expectation. This measures the average effect of the treatment across all units, both treated and untreated.

However, because of the fundamental problem of causal inference, we cannot observe both $Y_i(1)$ and $Y_i(0)$ for the same unit $i$. In practice, therefore, the ATE is often estimated by comparing the average outcomes in the treatment and control groups (the difference-in-means estimator):

\hat{ATE} = \frac{1}{N_t}\sum_{i \in T}Y_i - \frac{1}{N_c}\sum_{i \in C}Y_i

where $T$ is the set of treated units, $C$ is the set of control units, $N_t$ is the number of treated units, and $N_c$ is the number of control units.
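The difference-in-means estimator above can be sketched in a few lines. This is a minimal illustration on simulated data, assuming a coin-flip (randomized) assignment and a hypothetical constant true effect of 2.0; the function name `difference_in_means` is ours, not from any library.

```python
import numpy as np

def difference_in_means(y, d):
    """Estimate the ATE as mean(Y | treated) - mean(Y | control)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=bool)
    return y[d].mean() - y[~d].mean()

# Simulated randomized experiment with a known true ATE of 2.0 (assumed for illustration)
rng = np.random.default_rng(42)
n = 10_000
y0 = rng.normal(50.0, 10.0, size=n)   # Y_i(0)
y1 = y0 + 2.0                         # Y_i(1)
d = rng.integers(0, 2, size=n)        # coin-flip assignment
y = np.where(d == 1, y1, y0)          # observed outcome

true_ate = np.mean(y1 - y0)           # computable only because the data are simulated
est_ate = difference_in_means(y, d)
print(f"true ATE = {true_ate:.3f}, estimated ATE = {est_ate:.3f}")
```

Because assignment is random, the estimate lands close to 2.0, with sampling error that shrinks as $N_t$ and $N_c$ grow.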

Example of ATE

Consider a randomized controlled trial studying the effect of a new drug. Each patient is given either the new drug (treatment) or a placebo (control). After the trial, we measure a health outcome, such as recovery. The ATE in this case is the difference in recovery rate if every patient took the new drug versus if every patient took the placebo, and the trial estimates it as the difference in recovery rates between the two groups.

If treatment is assigned completely at random, the observed difference in outcomes between the treatment and control groups is an unbiased estimator of the ATE. In observational studies or imperfectly randomized experiments, however, the two groups may differ in factors that also affect the outcome (confounders), so the simple difference in means can be biased. Methods such as stratification, regression adjustment, matching, or propensity-score weighting are then used to adjust for observed confounders; they yield unbiased estimates only if no important confounder is left unmeasured.
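The following sketch illustrates how confounding biases the naive comparison and how stratifying on an observed confounder can remove the bias. It is a hypothetical simulation: the binary confounder `x` (say, a severe case) makes treatment more likely and lowers the outcome, the true effect is assumed to be 1.0 for everyone, and stratification is valid here only because `x` is the sole confounder.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical observational data with a binary confounder x
x = rng.binomial(1, 0.5, size=n)
p_treat = np.where(x == 1, 0.8, 0.2)     # severe cases are more likely to be treated
d = rng.binomial(1, p_treat)
# Severe cases also have worse outcomes regardless of treatment; true effect = 1.0
y0 = 5.0 - 3.0 * x + rng.normal(0.0, 1.0, size=n)
y = y0 + 1.0 * d

naive = y[d == 1].mean() - y[d == 0].mean()

# Stratification: difference in means within each level of x,
# weighted by the share of the population at that level
strat = 0.0
for level in (0, 1):
    m = x == level
    effect = y[m & (d == 1)].mean() - y[m & (d == 0)].mean()
    strat += effect * m.mean()

print(f"naive = {naive:.2f}, stratified = {strat:.2f}, true ATE = 1.00")
```

With these assumed parameters, the naive difference even has the wrong sign (about -0.8), while the stratified estimate is close to 1.0.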

Conditional Average Treatment Effect (CATE)

The Conditional Average Treatment Effect (CATE) extends the concept of the ATE by considering the effect of treatment conditional on observed characteristics (covariates) of the units. This can be especially valuable when the treatment effect varies across different subgroups.

Mathematically, CATE for a specific covariate value $x$ is defined as:

CATE(x) = E[Y_i(1) - Y_i(0) | X_i = x]

where $X_i$ represents the covariates for unit $i$.

Because of the fundamental problem of causal inference, the CATE must also be estimated rather than observed directly. This is typically done using methods like stratification, regression adjustment, or more advanced machine learning techniques.

Example of CATE

Consider an education study investigating the impact of a new teaching method. The CATE would allow us to examine the effect of this method for different groups of students, such as those with high prior achievement versus those with low prior achievement.

Suppose the covariate $X_i$ represents prior achievement, which can take on values "high" or "low". Then, we could estimate the CATE for each group as:

\hat{CATE}(\text{high}) = \frac{1}{N_t^h}\sum_{i \in T^h}Y_i - \frac{1}{N_c^h}\sum_{i \in C^h}Y_i

\hat{CATE}(\text{low}) = \frac{1}{N_t^l}\sum_{i \in T^l}Y_i - \frac{1}{N_c^l}\sum_{i \in C^l}Y_i

where $T^h$ and $C^h$ are the sets of treated and control units with high prior achievement, and $N_t^h$ and $N_c^h$ are the number of such units; similarly, $T^l$ and $C^l$ are the sets of treated and control units with low prior achievement, and $N_t^l$ and $N_c^l$ are the number of such units.
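The subgroup estimator above is simply a difference in means computed separately within each covariate group. A minimal sketch, assuming randomized assignment of the teaching method and hypothetical true effects of +2 points for high achievers and +6 points for low achievers; `subgroup_cate` is an illustrative helper name.

```python
import numpy as np

def subgroup_cate(y, d, group):
    """Difference in means within each subgroup: an estimate of CATE(x) for each x."""
    y, d, group = np.asarray(y, float), np.asarray(d, bool), np.asarray(group)
    out = {}
    for g in np.unique(group):
        m = group == g
        out[str(g)] = y[m & d].mean() - y[m & ~d].mean()
    return out

rng = np.random.default_rng(7)
n = 8_000
prior = rng.choice(["high", "low"], size=n)      # prior achievement X_i
d = rng.integers(0, 2, size=n).astype(bool)      # randomized teaching method
effect = np.where(prior == "high", 2.0, 6.0)     # assumed heterogeneous effects
y = rng.normal(70.0, 8.0, size=n) + effect * d   # observed test score

cate = subgroup_cate(y, d, prior)
print(cate)
```

The two estimates should land near 2 and 6, a difference that a single pooled ATE (about 4 here) would hide.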

Local Average Treatment Effect (LATE)

The Local Average Treatment Effect (LATE) is the treatment effect for a specific subpopulation, the "compliers", identified with the help of an instrumental variable. Compliers are individuals whose treatment status is determined by the instrument: they take the treatment when the instrument encourages it (for example, when they are assigned to the treatment group) and do not take it otherwise.

Let's consider a binary treatment variable, $D$, that takes a value of 1 if an individual receives the treatment and 0 otherwise. Additionally, we have an outcome variable, $Y$, which represents the response of interest. The potential outcomes are denoted as $Y(0)$ and $Y(1)$, indicating the outcome under no treatment and under treatment, respectively.

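Under the standard instrumental-variable assumptions (the instrument is as good as randomly assigned, it affects the outcome only through the treatment, it changes treatment status for at least some individuals, and no individual does the opposite of their assignment), the LATE is the average treatment effect among compliers and is identified as:

LATE = E[Y(1) - Y(0) | \text{complier}] = \frac{E[Y|Z = 1] - E[Y|Z = 0]}{E[D|Z = 1] - E[D|Z = 0]}

where,

$D$ is a binary treatment indicator, where $D = 1$ represents treatment and $D = 0$ represents control.
$Y(d)$ is the potential outcome if the individual receives treatment level $d$, and $Y$ is the observed outcome.
$Z$ is a binary instrumental variable that affects the likelihood of receiving the treatment but affects the outcome only through the treatment.

The denominator of this equation, $E[D|Z = 1] - E[D|Z = 0]$, measures the effect of the instrument $Z$ on the treatment $D$; under the assumptions above, it equals the share of compliers in the population. The numerator, $E[Y|Z = 1] - E[Y|Z = 0]$, is the effect of the instrument on the outcome (the intention-to-treat effect). Because the instrument changes treatment status only for compliers, this effect comes entirely from them, and dividing by the share of compliers rescales it to the average effect per complier. Thus, the LATE measures the causal effect of the treatment for the subpopulation of compliers, who change their treatment status in response to the instrument $Z$. This ratio is also known as the Wald estimator.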
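The ratio of the instrument's effect on the outcome to its effect on treatment take-up can be checked in a simulation. This sketch assumes hypothetical shares of compliers (50%), always-takers (20%), and never-takers (30%), no defiers, a randomized binary instrument, and assumed effects of 4 for compliers and 1 for always-takers; none of these numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical compliance types (no defiers, i.e., monotonicity holds)
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
z = rng.integers(0, 2, size=n)                         # randomized instrument
d = np.where(types == "always", 1,
             np.where(types == "never", 0, z))         # compliers follow Z

# Heterogeneous effects: compliers gain 4, always-takers 1, never-takers 0
tau = np.where(types == "complier", 4.0, np.where(types == "always", 1.0, 0.0))
y0 = rng.normal(10.0, 2.0, size=n)
y = y0 + tau * d                                       # Z affects Y only through D

itt = y[z == 1].mean() - y[z == 0].mean()              # numerator: effect of Z on Y
first_stage = d[z == 1].mean() - d[z == 0].mean()      # denominator: share of compliers
late = itt / first_stage                               # Wald estimator

ate = tau.mean()                                       # population ATE, for comparison
print(f"first stage = {first_stage:.3f}, LATE = {late:.2f}, ATE = {ate:.2f}")
```

Under these assumptions the LATE recovers the compliers' effect (about 4), which differs from the population ATE (about 2.2) because always-takers and never-takers have different effects.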

Comparison with ATE

ATE, by contrast, measures the expected difference in outcomes if we were to apply the treatment to everyone in the population, compared to if we applied the control to everyone.

The crucial difference between ATE and LATE lies in the populations they target. ATE gives the average effect of the treatment over the entire population, including those who would always take the treatment (always-takers), those who would never take it (never-takers), and those whose treatment status is changed by the instrument (compliers). LATE, on the other hand, targets only the compliers, so the two can differ whenever the treatment effect varies across these groups.

Average Treatment Effect on the Treated (ATT)

The Average Treatment Effect on the Treated (ATT), also known as the effect of the treatment on the treated, is another important measure in causal inference. This measure focuses specifically on those units that receive the treatment.

Mathematically, the ATT is defined as:

ATT = E[Y_i(1) - Y_i(0) | D_i = 1]

where $D_i$ is an indicator for whether unit $i$ receives the treatment (with $D_i = 1$ if unit $i$ is treated).

Because of the fundamental problem of causal inference, we cannot directly observe $Y_i(0)$ for treated units. Therefore, we need to estimate it, often using data from the control group. However, this can introduce bias if the treated and control units differ systematically. Various methods, such as matching or weighting based on propensity scores, are used to correct for this selection bias and estimate the ATT.

Example of ATT

Consider a job training program designed to improve employment prospects. If we're specifically interested in the effect of the training on those who actually received it, we would look at the ATT.

We might calculate the ATT by comparing the employment outcomes of those who received the training to similar individuals who did not receive the training. By focusing on similar individuals, we aim to approximate the counterfactual outcome $Y_i(0)$ for the treated units, allowing us to estimate the ATT. However, estimating the ATT can be challenging if the treatment assignment is not random, requiring careful consideration of potential confounding variables.
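One simple way to "compare with similar individuals" is exact matching on a discrete covariate: each treated unit's missing $Y_i(0)$ is imputed as the mean outcome of control units with the same covariate value. The sketch below is a hypothetical illustration, assuming prior employment `x` is the only confounder, a continuous employment outcome in arbitrary units, and assumed gains of 0.8 for the previously jobless and 0.2 for the previously employed; `att_exact_matching` is an illustrative name, not a library function.

```python
import numpy as np

def att_exact_matching(y, d, x):
    """Estimate the ATT by exact matching on a discrete covariate x.

    Each treated unit's missing Y_i(0) is imputed as the mean outcome of the
    control units that share its value of x.
    """
    y, d, x = np.asarray(y, float), np.asarray(d, bool), np.asarray(x)
    effects = []
    for level in np.unique(x[d]):
        treated = d & (x == level)
        controls = ~d & (x == level)
        if not controls.any():
            raise ValueError(f"no control units with x={level}; cannot impute Y(0)")
        y0_hat = y[controls].mean()            # imputed counterfactual for this cell
        effects.append(y[treated] - y0_hat)    # per-unit effect estimates
    return np.concatenate(effects).mean()

# Hypothetical job-training data: x = 1 if the person had a job last year
rng = np.random.default_rng(11)
n = 20_000
x = rng.binomial(1, 0.4, size=n)
d = rng.binomial(1, np.where(x == 1, 0.2, 0.6)).astype(bool)  # the jobless enroll more often
gain = np.where(x == 1, 0.2, 0.8)                              # and benefit more (assumed)
y0 = 2.0 + 1.5 * x + rng.normal(0.0, 0.5, size=n)              # outcome without training
y = y0 + gain * d

naive = y[d].mean() - y[~d].mean()
att = att_exact_matching(y, d, x)
true_att = gain[d].mean()   # known only because the data are simulated
print(f"naive = {naive:.3f}, matched ATT = {att:.3f}, true ATT = {true_att:.3f}")
```

The naive comparison understates the effect because trainees were less likely to be employed to begin with; matching on `x` removes that gap. With continuous or many covariates, exact matching becomes impractical and nearest-neighbor or propensity-score matching is typically used instead.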

Average Treatment Effect on the Controls (ATC)

The Average Treatment Effect on the Controls (ATC) is another measure of interest in causal inference, which focuses on the average effect the treatment would have had on those units that did not receive the treatment.

Mathematically, the ATC is defined as:

ATC = E[Y_i(1) - Y_i(0) | D_i = 0]

where $D_i$ is an indicator for whether unit $i$ receives the treatment (with $D_i = 0$ if unit $i$ is in the control group).

As with other causal effects, we face the fundamental problem of causal inference when trying to calculate the ATC: we cannot directly observe the potential outcome under treatment, $Y_i(1)$, for the control units. Therefore, we often need to estimate it using the observed outcomes of the treated units. However, this can be biased if the control and treated units are systematically different. Various methods, such as matching or weighting based on propensity scores, can be used to correct for this bias and estimate the ATC.

Example of ATC

Let's consider a scholarship program that covers tuition fees for selected students. Suppose we want to know how non-recipients' academic performance would have changed had they received the scholarship. The average of this effect over the non-recipients is the ATC.

We might estimate the ATC by comparing the academic performance of scholarship recipients to similar students who did not receive the scholarship. By focusing on similar students, we aim to approximate the counterfactual outcome $Y_i(1)$ for the control units. However, estimating the ATC can be challenging if the treatment assignment (scholarship allocation in this case) is not random, requiring careful handling of potential confounding variables.
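The ATC mirrors the ATT sketch: each control unit's missing $Y_i(1)$ is imputed from treated units with the same covariate value. In this hypothetical simulation, `x` marks students with high prior grades, who are assumed to win the scholarship more often but to gain less from it (1 point versus 3 points). Since both potential outcomes are simulated, the sketch also prints the oracle ATE, ATT, and ATC, which differ because of this selection. `atc_exact_matching` is an illustrative name.

```python
import numpy as np

def atc_exact_matching(y, d, x):
    """Estimate the ATC by exact matching on a discrete covariate x.

    Each control unit's missing Y_i(1) is imputed as the mean outcome of the
    treated units that share its value of x.
    """
    y, d, x = np.asarray(y, float), np.asarray(d, bool), np.asarray(x)
    effects = []
    for level in np.unique(x[~d]):
        treated = d & (x == level)
        controls = ~d & (x == level)
        if not treated.any():
            raise ValueError(f"no treated units with x={level}; cannot impute Y(1)")
        y1_hat = y[treated].mean()             # imputed counterfactual for this cell
        effects.append(y1_hat - y[controls])   # per-unit effect estimates
    return np.concatenate(effects).mean()

# Hypothetical scholarship data: x = 1 for students with high prior grades
rng = np.random.default_rng(5)
n = 40_000
x = rng.binomial(1, 0.5, size=n)
d = rng.binomial(1, np.where(x == 1, 0.7, 0.1)).astype(bool)  # high achievers win more often
gain = np.where(x == 1, 1.0, 3.0)                              # but others would gain more (assumed)
y0 = 60.0 + 15.0 * x + rng.normal(0.0, 5.0, size=n)            # e.g. a test score
y = y0 + gain * d

atc = atc_exact_matching(y, d, x)
# Oracle values, available only because both potential outcomes are simulated
true_ate, true_att, true_atc = gain.mean(), gain[d].mean(), gain[~d].mean()
print(f"ATE = {true_ate:.2f}, ATT = {true_att:.2f}, ATC = {true_atc:.2f}, "
      f"matched ATC estimate = {atc:.2f}")
```

Under these assumptions the ATT (about 1.25) is below the ATE (about 2.0), which is below the ATC (about 2.5): the students who did not receive the scholarship are exactly the ones who would have benefited most.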


Ryusei Kakujo
