2022-04-30

Propensity Score Matching (PSM)

What is Propensity Score Matching (PSM)?

Propensity score matching (PSM) is a statistical technique used in observational studies to estimate the causal effect of an intervention or treatment by accounting for the covariates that predict receiving the treatment. It reduces bias from confounding, where the apparent effect of the treatment is mixed up with the effects of variables that influence both treatment assignment and the outcome.

Theory of PSM

The propensity score is the conditional probability of receiving a treatment given a set of observed covariates. Let's denote the treatment status as T (where T=1 indicates treatment and T=0 control), and the set of observed covariates as X. The propensity score, e(X), is defined as:

e(X) = P(T=1|X)

This represents the likelihood of receiving the treatment given the covariates.

The propensity score serves as a balancing score, meaning that the distribution of observed covariates will be similar between treated and untreated subjects with the same propensity score. Mathematically, this property can be expressed as:

T \perp X | e(X)

This equation implies that, conditional on the propensity score, the treatment assignment T is independent of the covariates X.

In observational studies, the goal is often to estimate the average treatment effect (ATE), which is defined as:

ATE = E[Y(1) - Y(0)]

where Y(1) represents the potential outcome if the subject is treated and Y(0) the potential outcome if the subject is not treated.

However, since we can only observe one potential outcome for each subject, estimating ATE directly is not feasible. PSM helps to address this problem by creating matched pairs of treated and untreated subjects that have similar propensity scores, thereby mimicking a randomized experiment.

The process begins with calculating the propensity scores for each subject, then matching treated and untreated subjects based on their propensity scores. After matching, the balance of covariates is checked to ensure that the distribution of covariates is similar in the two groups. Finally, the treatment effect is estimated based on the matched sample.
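
With one-to-one matching, for example, this last step often amounts to averaging the outcome differences within matched pairs (written here for the effect on the treated, which is what one-to-one matching most directly targets):

\hat{\tau} = \frac{1}{N_1} \sum_{i:\, T_i=1} \left( Y_i - Y_{m(i)} \right)

where N_1 is the number of matched treated subjects and m(i) denotes the control matched to treated subject i.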

To illustrate this process, let's consider a simple hypothetical example:

Subject ID | Treatment Status | Covariate X1 | Covariate X2 | Propensity Score
-----------|------------------|--------------|--------------|-----------------
1          | 1                | 5            | 2            | 0.75
2          | 0                | 4            | 2            | 0.72
3          | 1                | 7            | 3            | 0.80
4          | 0                | 5            | 3            | 0.78

By matching on propensity scores, we could pair Subject 1 with Subject 2, and Subject 3 with Subject 4, creating a matched sample that is balanced on the observed covariates.
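
As a minimal sketch of this pairing, assuming pandas is available and using the toy values from the table (column names here are purely illustrative), greedy nearest-neighbor matching on the propensity score could look like this:

```python
import pandas as pd

# Toy data from the table above
df = pd.DataFrame({
    "subject_id": [1, 2, 3, 4],
    "treated":    [1, 0, 1, 0],
    "x1":         [5, 4, 7, 5],
    "x2":         [2, 2, 3, 3],
    "pscore":     [0.75, 0.72, 0.80, 0.78],
})

treated = df[df["treated"] == 1]
controls = df[df["treated"] == 0].copy()

pairs = []
for _, row in treated.iterrows():
    # Greedy match: pick the remaining control with the closest propensity score
    j = (controls["pscore"] - row["pscore"]).abs().idxmin()
    pairs.append((int(row["subject_id"]), int(controls.loc[j, "subject_id"])))
    controls = controls.drop(j)  # matching without replacement

print(pairs)  # pairs Subject 1 with 2 and Subject 3 with 4
```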

Assumptions Behind PSM

The application of PSM relies on two main assumptions:

  • Ignorability assumption (Conditional Independence Assumption)
  • Overlap assumption (Common Support Assumption)

Ignorability Assumption

Also known as the Conditional Independence Assumption (CIA), this assumption stipulates that, given the observed covariates, the potential outcomes are independent of treatment assignment. Mathematically, this can be expressed as:

\{Y(0), Y(1)\} \perp T | X

This equation implies that, conditional on the covariates X, the potential outcomes Y(0) and Y(1) are independent of the treatment assignment T. The ignorability assumption allows us to estimate the average treatment effect by comparing outcomes between the treated and untreated groups, as the observed covariates that may confound the relationship between treatment and outcome are controlled for.

However, a key limitation of this assumption is that it is untestable. In the absence of knowledge about all potential confounders or a randomized experiment, we cannot definitively confirm whether this assumption holds.

Overlap Assumption

The overlap assumption, also known as the common support assumption, asserts that for each set of covariate values, there is a positive probability of being in either the treatment or control group. In terms of the propensity score, this means that for every propensity score, there should be both treated and untreated units. Mathematically, this is represented as:

0 < P(T=1|X=x) < 1 \quad \text{for all } x

This assumption ensures that each treated unit can be matched with a similar untreated unit. In the absence of this assumption, the estimation of treatment effects becomes difficult, especially for those treated units that have no comparable untreated units (or vice versa).

It is important to check the common support assumption when using PSM. A common practice is to plot the distributions of propensity scores for the treated and untreated groups and visually inspect the degree of overlap. Any treated units outside the range of the untreated (and vice versa) are typically removed in the matching process to satisfy this assumption.
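
A rough sketch of such a check, using synthetic data purely for illustration (the `treated` and `pscore` column names are assumptions of this example, not fixed terminology):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic example: one confounder x drives the treatment probability;
# the true probability stands in for an estimated propensity score here.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-x))
df = pd.DataFrame({"treated": rng.binomial(1, p), "pscore": p})

# Overlay the propensity score distributions of the two groups
plt.hist(df.loc[df["treated"] == 1, "pscore"], bins=20, alpha=0.5, density=True, label="treated")
plt.hist(df.loc[df["treated"] == 0, "pscore"], bins=20, alpha=0.5, density=True, label="control")
plt.xlabel("Propensity score")
plt.ylabel("Density")
plt.legend()
plt.show()

# Restrict the sample to the region of common support
low = max(df.loc[df["treated"] == 1, "pscore"].min(),
          df.loc[df["treated"] == 0, "pscore"].min())
high = min(df.loc[df["treated"] == 1, "pscore"].max(),
           df.loc[df["treated"] == 0, "pscore"].max())
df_common = df[df["pscore"].between(low, high)]
```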

Process of PSM

The PSM process can be broken down into three steps: defining the propensity score, matching participants based on these scores, and assessing the quality of the match.

Defining the Propensity Score

The first step in PSM involves calculating the propensity score for each individual in the study. The propensity score is the conditional probability of receiving the treatment given the observed covariates. This is typically estimated using logistic regression, although other methods can also be used.

For a binary treatment T and a set of observed covariates X, the propensity score is defined as:

e(X) = P(T=1|X)
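
A minimal sketch of this estimation step with scikit-learn, assuming a small illustrative data set with a binary treatment indicator `T` and covariates `X1`, `X2` (the column names and values are arbitrary):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Example data; in practice this would be the study data set
df = pd.DataFrame({
    "T":  [1, 0, 1, 0, 0, 1],
    "X1": [5, 4, 7, 5, 6, 6],
    "X2": [2, 2, 3, 3, 2, 3],
})

# Fit a logistic regression of treatment status on the covariates
model = LogisticRegression()
model.fit(df[["X1", "X2"]], df["T"])

# The propensity score is the predicted probability of T = 1
df["pscore"] = model.predict_proba(df[["X1", "X2"]])[:, 1]
print(df)
```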

Matching Participants

Once the propensity scores have been calculated, the next step is to match treated and untreated participants based on these scores. The goal is to create a set of treated and untreated participants who have similar propensity scores, thus mimicking a randomized experiment.

Several matching techniques can be used, including:

  • Nearest neighbor matching
    Each treated participant is matched to an untreated participant with the closest propensity score.

  • Caliper matching
    Each treated participant is matched to an untreated participant with a propensity score within a certain range (the caliper).

  • Stratification matching
    The range of propensity scores is divided into intervals (strata), and treated and untreated participants within the same stratum are matched.

  • Kernel matching
    A weighted average of all untreated participants is used to create a match for each treated participant, with the weights determined by the propensity score.

Each of these methods has its strengths and weaknesses, and the choice of method depends on the specifics of the study.
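
The first two of these can be sketched in a few lines. The following is a rough illustration of greedy 1:1 nearest neighbor matching without replacement, with an optional caliper; it is not a full-featured implementation, and the `treated`/`pscore` column names are assumptions of the example:

```python
import pandas as pd

def nearest_neighbor_match(df, caliper=None):
    """Greedy 1:1 nearest neighbor matching on the propensity score.

    Returns a list of (treated_index, control_index) pairs. If `caliper`
    is given, treated units with no control within that distance are
    left unmatched.
    """
    treated = df[df["treated"] == 1]
    controls = df[df["treated"] == 0].copy()
    pairs = []
    for i, score in treated["pscore"].items():
        if controls.empty:
            break
        dist = (controls["pscore"] - score).abs()
        j = dist.idxmin()
        if caliper is not None and dist[j] > caliper:
            continue  # no acceptable control for this treated unit
        pairs.append((i, j))
        controls = controls.drop(j)  # without replacement
    return pairs

# Example usage with a toy data set
df = pd.DataFrame({
    "treated": [1, 0, 1, 0, 0, 1],
    "pscore":  [0.75, 0.72, 0.80, 0.78, 0.30, 0.55],
})
print(nearest_neighbor_match(df, caliper=0.1))
```

A widely cited rule of thumb sets the caliper at about 0.2 standard deviations of the logit of the propensity score, though the appropriate width ultimately depends on the data.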

Assessing the Quality of the Match

The final step in PSM is to assess the quality of the match. This involves checking whether the distribution of covariates is similar between the treated and untreated groups after matching.

A common method for assessing balance is to calculate the standardized difference in means for each covariate before and after matching. If the matching process is successful, the standardized differences should be small (typically less than 0.1) for all covariates after matching.
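
For a single covariate, this calculation can be sketched as follows, using the conventional pooled standard deviation of the two groups in the denominator; it would be applied to each covariate once before and once after matching:

```python
import numpy as np

def standardized_difference(x_treated, x_control):
    """Standardized difference in means for a single covariate."""
    mean_t, mean_c = np.mean(x_treated), np.mean(x_control)
    var_t, var_c = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
    return (mean_t - mean_c) / np.sqrt((var_t + var_c) / 2)
```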

In addition, graphical methods can be used to visually assess balance. For example, one can plot the distributions of propensity scores or covariates for the treated and untreated groups before and after matching and compare the degree of overlap.

If the covariates are not balanced after matching, it may be necessary to adjust the matching process (for example, by changing the caliper in caliper matching) or to include additional covariates in the propensity score model. The process of defining the propensity score, matching participants, and assessing the quality of the match may need to be iterated several times to achieve satisfactory balance.

Limitations and Pitfalls of PSM

Despite its numerous applications and advantages, PSM comes with certain limitations and pitfalls. Understanding these is crucial for correct interpretation of results and for knowing when and how to use PSM.

  • Hidden Bias due to Unmeasured Confounders
    One of the main limitations of PSM is the possibility of hidden bias due to unmeasured confounders. PSM can balance observed covariates between the treatment and control groups, but if there are unobserved covariates that affect both the treatment assignment and the outcome, there can still be bias in the estimated treatment effect. This issue is especially problematic as the ignorability assumption, which assumes no unmeasured confounders, is untestable.

  • Over-reliance on the Propensity Score Model
    The propensity score model, often a logistic regression, is used to estimate the probability of treatment assignment based on observed covariates. If this model is misspecified – if important interaction terms or non-linear relationships are overlooked, for instance – the resulting propensity scores may be biased, leading to biased estimates of the treatment effect.

  • Reduction of Data Size
    Another potential pitfall is the reduction of data size due to matching. When treated units are matched to untreated units, it often results in excluding unmatched units from the analysis, especially when exact or caliper matching is used. This not only reduces the sample size, but can also introduce bias if the excluded units are systematically different from the included ones. In the worst-case scenario, the overlap assumption may be violated, meaning there are treated units for which there are no similar untreated units, or vice versa.

  • Sensitivity to the Choice of Matching Algorithm
    The choice of matching algorithm can have a significant impact on the estimated treatment effect. Different algorithms (nearest neighbor, caliper, kernel, etc.) have different strengths and weaknesses, and there is no universally "best" choice. The choice of algorithm should be guided by the specifics of the study, and researchers should ideally check the robustness of their results to the choice of algorithm.

Case Study of PSM

Let's consider a hypothetical case study in the context of educational research.

Case Background

Suppose we are interested in evaluating the effect of a new teaching strategy (treatment) on students' final exam scores. The teaching strategy was implemented in some classrooms (treatment group), but not in others (control group). Given that this was not a randomized trial, there might be confounding factors, such as students' previous academic performance and socio-economic status. Our dataset contains information about students' final exam scores, whether they were exposed to the new teaching strategy, their grade point average (GPA) in the previous year, and an index of socio-economic status.

The following table illustrates a small subset of our data:

Student ID | Treatment Status | Previous GPA | Socio-Economic Index | Final Exam Score
-----------|------------------|--------------|----------------------|-----------------
1          | 1                | 3.5          | 7                    | 85
2          | 0                | 3.2          | 5                    | 80
3          | 1                | 3.8          | 8                    | 88
4          | 0                | 3.0          | 6                    | 78
5          | 0                | 3.1          | 6                    | 81
6          | 1                | 3.7          | 7                    | 87

Propensity Score Estimation

We start by estimating the propensity scores using a logistic regression model, where the dependent variable is the treatment status, and the independent variables are the students' previous GPA and the socio-economic index. The propensity score for each student represents the predicted probability of receiving the new teaching strategy given their covariates.

Matching

Next, we match students in the treatment group with students in the control group based on their propensity scores. In this case, we'll use nearest neighbor matching without replacement, meaning each student in the control group can only be matched to one student in the treatment group.

Checking the Balance

After matching, we check the balance of covariates in the treatment and control groups. If the standardized differences in means for the covariates are small (typically less than 0.1), we can conclude that the matching was successful.

Estimating the Treatment Effect

Finally, we estimate the treatment effect by comparing the average final exam scores in the treatment and control groups. This difference in means gives us an estimate of the average effect of the new teaching strategy on students' final exam scores.
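
Putting the four steps together on the toy student data above, a minimal end-to-end sketch might look like the following. It uses logistic regression for the propensity score, greedy nearest neighbor matching without replacement, a standardized-difference balance check, and a matched-pair difference in means; with only six students this is purely illustrative, not a real analysis:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data from the case study table
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "treated":    [1, 0, 1, 0, 0, 1],
    "gpa":        [3.5, 3.2, 3.8, 3.0, 3.1, 3.7],
    "ses":        [7, 5, 8, 6, 6, 7],
    "score":      [85, 80, 88, 78, 81, 87],
})

# Step 1: estimate propensity scores
model = LogisticRegression()
model.fit(df[["gpa", "ses"]], df["treated"])
df["pscore"] = model.predict_proba(df[["gpa", "ses"]])[:, 1]

# Step 2: greedy 1:1 nearest neighbor matching without replacement
treated = df[df["treated"] == 1]
controls = df[df["treated"] == 0].copy()
pairs = []
for i, score in treated["pscore"].items():
    if controls.empty:
        break
    j = (controls["pscore"] - score).abs().idxmin()
    pairs.append((i, j))
    controls = controls.drop(j)

# Step 3: check covariate balance on the matched sample
t_idx = [t for t, _ in pairs]
c_idx = [c for _, c in pairs]
for cov in ["gpa", "ses"]:
    d = (df.loc[t_idx, cov].mean() - df.loc[c_idx, cov].mean()) / np.sqrt(
        (df.loc[t_idx, cov].var() + df.loc[c_idx, cov].var()) / 2)
    print(f"standardized difference for {cov}: {d:.2f}")

# Step 4: estimate the treatment effect as the mean matched-pair difference
effect = np.mean([df.loc[t, "score"] - df.loc[c, "score"] for t, c in pairs])
print(f"estimated effect on final exam score: {effect:.2f}")
```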

Ryusei Kakujo
