
Propensity Score Matching

When too few exact matches can be found, which becomes increasingly common as the number of covariates increases, we need a way to identify matches that are "close." In this situation, matching on the estimated propensity score is a useful alternative. The propensity score is the probability that a unit receives treatment, given its covariates. To conduct propensity score matching using real earnings in 1974 and 1975 as the pre-treatment covariates:

> foo2 <- matchit(treat ~ re74 + re75, data=lalonde)
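Roughly speaking, a call like this does two things. It estimates each unit's propensity score, and then it pairs every treated unit with a nearby control. The base-R sketch below shows both steps on simulated data. It assumes a logistic regression (logit) for the score and greedy one-to-one nearest-neighbor matching without replacement, with treated units taken in data order. It is not MatchIt's implementation, and MatchIt's defaults may differ. The names `estimate_pscore` and `greedy_nn_match` and the simulated covariates `x1` and `x2` are hypothetical.

```r
# Sketch only: logit propensity score + greedy 1:1 nearest-neighbor matching.
# Not MatchIt's code; function and variable names are hypothetical.

# Estimate P(treat = 1 | covariates) with a logistic regression.
estimate_pscore <- function(formula, data) {
  fit <- glm(formula, data = data, family = binomial())
  fitted(fit)  # predicted probabilities on the response scale
}

# For each treated unit (in data order), pick the unused control whose
# score is closest; each control is used at most once (no replacement).
greedy_nn_match <- function(pscore, treat) {
  treated  <- which(treat == 1)
  controls <- which(treat == 0)
  if (length(controls) < length(treated)) stop("not enough controls")
  matched <- integer(length(treated))
  for (k in seq_along(treated)) {
    d <- abs(pscore[controls] - pscore[treated[k]])
    best <- which.min(d)
    matched[k] <- controls[best]
    controls <- controls[-best]  # drop the chosen control from the pool
  }
  data.frame(treated = treated, control = matched)
}

# Simulated data standing in for lalonde (hypothetical covariates x1, x2).
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + 0.8 * x1 - 0.5 * x2))
sim <- data.frame(treat, x1, x2)

ps    <- estimate_pscore(treat ~ x1 + x2, sim)
pairs <- greedy_nn_match(ps, sim$treat)
# Mean score among matched treated vs. matched controls (should be close).
c(treated = mean(ps[pairs$treated]), control = mean(ps[pairs$control]))
```

Greedy matching is order-dependent: a different ordering of the treated units can produce different pairs. That sensitivity is one motivation for the optimal matching approach.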

You may again check basic statistics of the MATCHIT object with the print command:

> print(foo2)
 
Assignment model specification:
matchit(formula = treat ~ re74 + re75, data = lalonde)
 
Summary of propensity score for full and matched samples:
 
        Means Treated Means Control      SD     T-stat       Bias
Full           0.3519        0.2795 0.11817  8.2436469  8.202e-01
Matched        0.3519        0.3520 0.08806 -0.0003493 -3.624e-05
 
Sample sizes:
 
        Treated Control Total
Full        185     429   614
Matched     185     185   370

We see that 185 control units were matched to the 185 treated units (a "1-1," or one-to-one, match). The average propensity scores in the matched treated and control groups are much more similar than in the original groups. In the matched samples both groups have a mean propensity score of roughly 0.35 (0.3519 versus 0.3520), compared with 0.35 versus 0.28 in the full sample. The summary command gives further information on the original and matched samples:

> summary(foo2)
 
Assignment model specification:
matchit(formula = treat ~ re74 + re75, data = lalonde)
 
Summary of covariates for all data:
 
       Means Treated Means Control        SD T-stat    Bias
pscore        0.3519        0.2795    0.1182  8.244  0.8202
re74       2095.5737     5619.2365 6477.9645 -7.246 -0.7211
re75       1532.0553     2466.4844 3295.6790 -3.278 -0.2903
 
Summary of covariates for matched data:
 
       Means Treated Means Control        SD     T-stat       Bias Reduction
pscore        0.3519        0.3520 8.806e-02 -0.0003493 -3.624e-05         1
re74       2095.5737     2040.3275 4.697e+03  0.1129664  1.131e-02         1
re75       1532.0553     1436.3806 2.849e+03  0.3225721  2.972e-02         1
 
Sample sizes:
 
        Treated Control Total
Full        185     429   614
Matched     185     185   370
 
Problematic covariates:
Number of units discarded:   0

This reveals simple statistics for the propensity score and for the covariates used in the propensity score specification, in both the full and matched samples. These include t-statistics and standardized bias statistics, which are used to assess whether matching reduced the imbalance in each covariate. All three variables (propensity score, 1974 income, and 1975 income) show reduced bias after matching. For example, the original bias in 1974 income was -0.72 standard deviations, but it is only 0.011 standard deviations in the matched samples. More specifically, job training participants on average earned roughly $3,524 less in 1974 and $934 less in 1975 than non-participants. These differences are significant, with t-statistics of -7.25 and -3.28, respectively. In the matched sample, the earnings difference is only $55 (t-statistic = 0.11) in 1974 and $96 (t-statistic = 0.32) in 1975. This one-to-one matching algorithm has thus chosen 185 control individuals who look very similar to the treated group on the covariates used in the matching process (1974 and 1975 income).
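The arithmetic behind these comparisons is easy to reproduce. In the full-sample rows, the 1974 gap is 2095.57 - 5619.24 = -3523.66 dollars, and dividing it by the reported bias of -0.7211 implies a denominator of about 4,886. The matched-sample row (55.25 / 0.01131) implies nearly the same denominator, about 4,885. That is consistent with standardizing by the treated group's standard deviation, since this matching leaves the treated group unchanged. Treat this as an inference from the tables, not a documented formula. The R sketch below computes a standardized bias on that assumption, together with a Welch two-sample t-statistic from `t.test()`. Neither is guaranteed to match MatchIt's exact computation, and `balance_stats` is a hypothetical helper.

```r
# Hypothetical helper: balance statistics for one covariate.
# Assumption: bias = (mean treated - mean control) / sd(treated).
balance_stats <- function(x, treat) {
  xt <- x[treat == 1]
  xc <- x[treat == 0]
  diff <- mean(xt) - mean(xc)
  c(mean_treated = mean(xt),
    mean_control = mean(xc),
    t_stat = unname(t.test(xt, xc)$statistic),  # Welch t-test (R's default)
    bias = diff / sd(xt))
}

# Worked check against the re74 rows of the summary tables above.
full_gap    <- 2095.5737 - 5619.2365   # -3523.663
matched_gap <- 2095.5737 - 2040.3275   #    55.2462
full_gap / -0.7211       # implied denominator, about 4886
matched_gap / 1.131e-02  # implied denominator, about 4885
```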

The summary command also reports three further items. It gives (a) the original call that created the MATCHIT object and (b) any "problematic covariates" that may still be imbalanced under the assignment model (see footnote 6). It also gives (c) the number of units removed by the discard option (described below). In this case no units were discarded and no covariates were flagged as problematic.

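For further information on balance in the full and matched samples, we can use the verbose=T option with summary. This additionally reports balance for all squares and pairwise interactions of the variables used in the matching procedure, including the propensity score. It helps diagnose whether the matched treated and control groups are balanced beyond their means. Significant differences in these higher-order terms are usually a good indication that the assignment model needs to be respecified, as discussed in Section 2.17. In the output below, every term has a Reduction of 1 except re75xre75. The absolute bias of re75xre75 grows slightly (from 0.076 to 0.084) but remains small, with a t-statistic of 1.07.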

> summary(foo2, verbose=T)
 
Assignment model specification:
matchit(formula = treat ~ re74 + re75, data = lalonde)
 
Summary of covariates and interactions for all data:
 
              Means Treated Means Control        SD  T-stat     Bias
pscore            3.519e-01     2.795e-01 1.182e-01  8.2436  0.82019
re74              2.096e+03     5.619e+03 6.478e+03 -7.2456 -0.72108
re75              1.532e+03     2.466e+03 3.296e+03 -3.2776 -0.29026
pscorexpscore     1.316e-01     9.312e-02 5.896e-02  8.5621  0.82170
pscorexre74       3.284e+02     7.620e+02 6.627e+02 -8.1426 -0.74246
pscorexre75       3.819e+02     4.960e+02 6.953e+02 -1.8020 -0.15444
re74xre74         2.814e+07     7.756e+07 1.353e+08 -4.5738 -0.43306
re74xre75         1.312e+07     2.543e+07 5.354e+07 -2.6991 -0.24252
re75xre75         1.265e+07     1.690e+07 4.478e+07 -0.9365 -0.07568
 
Summary of covariates and interactions for matched data:
 
              Means Treated Means Control        SD     T-stat       Bias Reduction
pscore            3.519e-01     3.520e-01 8.806e-02 -0.0003493 -3.624e-05         1
re74              2.096e+03     2.040e+03 4.697e+03  0.1129664  1.131e-02         1
re75              1.532e+03     1.436e+03 2.849e+03  0.3225721  2.972e-02         1
pscorexpscore     1.316e-01     1.316e-01 4.675e-02  0.0139071  1.444e-03         1
pscorexre74       3.284e+02     3.340e+02 5.772e+02 -0.0927353 -9.542e-03         1
pscorexre75       3.819e+02     3.959e+02 6.999e+02 -0.1916425 -1.891e-02         1
re74xre74         2.814e+07     2.442e+07 1.001e+08  0.3570086  3.261e-02         1
re74xre75         1.312e+07     8.919e+06 4.064e+07  0.9939819  8.270e-02         1
re75xre75         1.265e+07     7.942e+06 4.229e+07  1.0720362  8.410e-02         0
 
Sample sizes:
 
        Treated Control Total
Full        185     429   614
Matched     185     185   370
 
Problematic covariates:
Number of units discarded:   0
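The extra rows in the verbose table correspond to every pairwise product of the variables, squares included. The Reduction column is consistent with a 0/1 flag that marks whether the absolute bias shrank after matching. The sketch below builds such terms and such a flag. It assumes the standardized bias is the treated-minus-control mean difference divided by the treated group's standard deviation. `add_interactions`, `std_bias`, and `bias_reduction` are hypothetical helpers, not MatchIt functions.

```r
# Hypothetical helper: append squares and pairwise products of the columns,
# named "axb" to mirror the verbose summary labels (e.g. "re74xre75").
add_interactions <- function(df) {
  vars <- names(df)
  out <- df
  for (i in seq_along(vars)) {
    for (j in i:length(vars)) {
      out[[paste0(vars[i], "x", vars[j])]] <- df[[vars[i]]] * df[[vars[j]]]
    }
  }
  out
}

# Standardized bias, assuming the treated-group SD as the denominator.
std_bias <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

# 1 if |bias| shrank after matching, 0 otherwise (one reading of "Reduction").
bias_reduction <- function(full, treat_full, matched, treat_matched) {
  sapply(names(full), function(v) {
    as.integer(abs(std_bias(matched[[v]], treat_matched)) <
               abs(std_bias(full[[v]], treat_full)))
  })
}

# Usage on a toy data frame with two covariates a and b.
toy <- add_interactions(data.frame(a = c(1, 2, 3, 4), b = c(2, 2, 5, 1)))
names(toy)
```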

We can also check the distributions of the propensity score and the covariates with diagnostic plots, which are depicted in Figure 1. These plot functions are interactive. For example, the first menu asks whether you would like to see density estimates of the propensity scores, and entering 1 yields the top panel of Figure 1.

> plot(foo2)
  Choices
0 No     
1 Yes    
Would you like to see density estimates of the propensity scores?

The density curves overlay the control and treated units for the full and matched samples. Next, the menu asks whether you would like to see a jitter plot of the propensity scores.

Would you like to see density estimates of the propensity scores?1
  Choices
0 No     
1 Yes    
Would you like to see a jitterplot of the propensity scores?

Entering a 1 also reveals instructions on how to interactively identify particular units, which may be useful for identifying particular outliers:

[1] "To identify the units, use first mouse button; to stop, use second."

Clicking the first mouse button near a point displays that observation's name, as specified in the data frame. Clicking the second mouse button ends identification.
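A similar jitter-and-identify workflow can be approximated in base R with `jitter()`, `plot()`, and `identify()`. The sketch below computes jittered plotting coordinates, placing treated units on one row and controls on another, and labels points by the data frame's row names. It assumes a numeric score and a 0/1 treatment indicator. `jitter_coords` is hypothetical, and the `identify()` call appears only as a comment because it requires an interactive graphics session.

```r
# Hypothetical helper: jittered coordinates for a one-dimensional "jitterplot".
jitter_coords <- function(score, treat, labels = names(score)) {
  if (is.null(labels)) labels <- as.character(seq_along(score))
  data.frame(x = score,
             y = jitter(ifelse(treat == 1, 1, 0), amount = 0.15),
             label = labels,
             stringsAsFactors = FALSE)
}

# Simulated scores with row names standing in for observation names.
set.seed(3)
sim <- data.frame(score = runif(20), treat = rep(c(1, 0), each = 10),
                  row.names = paste0("unit", 1:20))
xy <- jitter_coords(sim$score, sim$treat, labels = rownames(sim))
plot(xy$x, xy$y, yaxt = "n", xlab = "Propensity score", ylab = "")
axis(2, at = c(0, 1), labels = c("Control", "Treated"))
# In an interactive session, click near points to show their row names;
# use the second mouse button to stop:
# identify(xy$x, xy$y, labels = xy$label)
```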

Lastly, the plot command allows you to plot density estimates for any of the covariates (see footnote 7):

  Choices         
0 No              
1 Yes :  pscore   
2 Yes :  re74     
3 Yes :  re75     
Would you like to see density estimates of any other covariates?
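The same kind of overlaid density comparison can be drawn non-interactively for the propensity score or any covariate with base R's `density()`. The sketch assumes a numeric variable and a 0/1 treatment indicator (simulated here). `overlay_densities` is a hypothetical helper, and MatchIt's own plot method may use different smoothing choices.

```r
# Hypothetical helper: overlay kernel density estimates for treated and control.
overlay_densities <- function(x, treat, main = "Propensity score") {
  d_t <- density(x[treat == 1])
  d_c <- density(x[treat == 0])
  ylim <- range(0, d_t$y, d_c$y)  # make room for both curves
  plot(d_t, main = main, xlab = "value", ylim = ylim, lty = 1)
  lines(d_c, lty = 2)
  legend("topright", legend = c("Treated", "Control"), lty = c(1, 2))
  invisible(list(treated = d_t, control = d_c))
}

# Example with simulated scores; call it once for the full sample and once
# for the matched sample to compare the two.
set.seed(2)
treat <- rbinom(200, 1, 0.4)
score <- plogis(rnorm(200, mean = ifelse(treat == 1, 0.3, -0.3)))
dens  <- overlay_densities(score, treat, main = "Full sample")
```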

Figure 1: Sample interactive diagnostic graphs (two panels)

Examining these graphs in Figure 1, we see that the matched samples are very well matched on the propensity score, with very similar distributions in the matched treated and control groups.
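A single number can complement this visual comparison. One option, which is not part of the MatchIt output shown here, is the two-sample Kolmogorov-Smirnov statistic from base R's `ks.test()`. It measures the largest vertical gap between the two empirical distribution functions, and values near 0 indicate similar distributions. The sketch uses simulated scores and a hypothetical helper, `ks_gap`.

```r
# Hypothetical helper: two-sample KS statistic between treated and control scores.
ks_gap <- function(score, treat) {
  unname(ks.test(score[treat == 1], score[treat == 0])$statistic)
}

set.seed(4)
treat <- rep(c(1, 0), each = 100)
# "Full" sample: control scores drawn from a lower-centered distribution.
full_score    <- c(rbeta(100, 4, 6), rbeta(100, 3, 7))
# "Matched" sample: both groups drawn from the same distribution.
matched_score <- c(rbeta(100, 4, 6), rbeta(100, 4, 6))
gaps <- c(full = ks_gap(full_score, treat), matched = ks_gap(matched_score, treat))
gaps
```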


Gary King 2005-03-09