All Matching Methods
- formula: the formula used to calculate the distance measure for matching.
It takes the usual R formula syntax, for example treat
~ x1 + x2, where treat is a binary treatment indicator
and x1 and x2 are pre-treatment covariates. Both the
treatment indicator and the pre-treatment covariates must be contained
in the same data frame, which is specified as data (see
below). All of the usual R formula syntax works here. For
example, x1:x2 represents the first-order interaction term
between x1 and x2, and I(x1 ^ 2) represents the
square term of x1. See help(formula) for details.
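The formula syntax described above can be checked with base R alone. The following is a minimal sketch on simulated data; the data frame dat and its values are made up for illustration, and only the variable names treat, x1, and x2 come from the text.

```r
# Simulated data (hypothetical); names follow the formula in the text.
set.seed(1)
dat <- data.frame(
  treat = rbinom(20, 1, 0.5),  # binary treatment indicator
  x1    = rnorm(20),           # pre-treatment covariate
  x2    = rnorm(20)            # pre-treatment covariate
)

# Main effects, a first-order interaction, and a squared term.
f <- treat ~ x1 + x2 + x1:x2 + I(x1^2)

# model.matrix() expands the right-hand side into design-matrix columns,
# which shows how R interprets each term.
X <- model.matrix(f, data = dat)
term_names <- colnames(X)
print(term_names)
```

The x1:x2 column is the elementwise product of x1 and x2, and the I(x1^2) column is x1 squared; without I(), the ^ operator would be read as formula crossing rather than arithmetic.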
- data: the data frame containing the variables
called in formula. You may find it helpful for the
diagnostics to specify observation names in the data frame (see
Section 5.2.2).
- method: the matching method (default=nearest). Currently,
exact (exact matching), full (full matching),
nearest (nearest neighbor matching), optimal
(optimal matching), subclass (subclassification), and genetic
(genetic matching) are
available. Note that within each of
these matching methods, MATCHIT offers a variety of options. See
Section 3 for more details.
- distance: the method used to estimate the
distance measure (default=logistic regression, logit).
Before using any of these techniques, it is best to understand
their theoretical grounding and to evaluate the
results. Most of these methods (such as logistic or probit
regression) estimate the propensity score, defined as the
probability of receiving treatment, conditional on the covariates
(Rosenbaum & Rubin (1983)). For these methods, the distance measure is the predicted
probability from the model (the propensity score). Currently,
the following methods are available:
- mahalanobis computes the Mahalanobis distance measure
(mahalanobis() in the stats package).
- binomial generalized linear models with various links (glm() in the stats package): logit (logistic
link), linear.logit (logistic link with linear propensity
score), probit (probit
link), linear.probit (probit link with linear propensity
score), cloglog (complementary log-log link), linear.cloglog (complementary log-log link with linear
propensity score), log (log link), linear.log (log
link with linear propensity score), cauchit (Cauchy CDF
link), linear.cauchit (Cauchy CDF link with linear
propensity score).
- binomial generalized additive models with various links (gam() in the mgcv package): GAMlogit (logistic
link), GAMlinear.logit (logistic link with linear propensity
score), GAMprobit (probit link), GAMlinear.probit
(probit link with linear propensity score), GAMcloglog
(complementary log-log link), GAMlinear.cloglog
(complementary log-log link with linear propensity score), GAMlog (log link), GAMlinear.log (log link with linear
propensity score), GAMcauchit (Cauchy CDF link), GAMlinear.cauchit (Cauchy CDF link with linear propensity
score). Beck & Jackman (1998), Hastie & Tibshirani (1990), and many others discuss
generalized additive models.
- nnet, neural network model (nnet() in the nnet package).
King & Zeng (2002); Zeng (1999); Beck, King & Zeng (2000); White (1992); Bishop (1995) among many
others discuss neural networks.
- rpart, classification trees (rpart() in the
rpart package). Ruger et al. (2003); Breiman et al. (1984) and many
others discuss classification trees.
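To make the difference between a propensity score and a "linear" propensity score concrete, here is a minimal base R sketch on simulated data. It assumes that the linear variants correspond to the linear predictor of the fitted model (the log-odds for a logistic link); it is not MATCHIT's internal code, and the object names are hypothetical.

```r
# Simulated data (hypothetical) with treatment depending on x1 and x2.
set.seed(2)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x1 - 0.4 * dat$x2))

# Binomial GLM with a logistic link, as in the "logit" option.
fit <- glm(treat ~ x1 + x2, data = dat, family = binomial(link = "logit"))

# Propensity score: predicted probability of receiving treatment.
pscore <- predict(fit, type = "response")

# Assumed analogue of "linear.logit": the linear predictor (log-odds).
linear_pscore <- predict(fit, type = "link")

summary(pscore)
summary(linear_pscore)
```

The two scales rank units identically, but the linear predictor is unbounded, so distances between units near 0 or 1 on the probability scale are not compressed.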
- distance.options specifies the optional arguments that
are passed to the model for estimating the distance measure. The
input to this argument should be a list. For example, if the
distance measure is estimated with a logistic regression, users can
increase the maximum IWLS iterations by distance.options =
list(maxit = 5000).
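The following sketch illustrates how a list such as list(maxit = 5000) can be forwarded to glm(). It assumes the list is simply appended to the arguments of the model call, which may differ from what MATCHIT does internally; the data and object names are hypothetical.

```r
# Simulated data (hypothetical).
set.seed(3)
n <- 100
dat <- data.frame(x1 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(0.5 * dat$x1))

# Optional arguments for the distance model, supplied as a list.
distance_options <- list(maxit = 5000)

# Arguments the model call would receive anyway.
base_args <- list(
  formula = treat ~ x1,
  family  = binomial(link = "logit"),
  data    = dat
)

# Append the optional arguments and call glm(); glm() passes maxit
# on to glm.control(), which sets the maximum number of IWLS iterations.
fit <- do.call(glm, c(base_args, distance_options))
fit$control$maxit
```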
- discard: whether to discard units that fall
outside some measure of support of the distance score before
matching, and not allow them to be used at all in the matching
procedure (default=none). Note that discarding units may change the quantity of
interest being estimated.
- none (default) discards no units before matching.
Use this option when the units to be matched are substantially
similar, such as in the case of matching treatment and control
units from a field experiment that was close to (but not fully)
randomized (e.g., Imai 2005), when caliper matching will
restrict the donor pool, or when you do not wish to change the
quantity of interest and the parametric methods to be used
post-matching can be trusted to extrapolate.
- both discards all units (treated and control) that
are outside the support of the distance measure. Use this option
when the units to be matched are substantially different (when
there is a large degree of non-overlapping support on the distance
score), such as in the case of measuring the effect of democracy
on economic growth.
- control discards only control units outside the
support of the distance measure of the treated units. Use this
option when the average treatment effect on the treated is of most
interest and when you are unwilling to discard non-overlapping treated
units (which would change the quantity of interest), such as
possibly in the case of the effect of job training on those
individuals who actually participated in a job evaluation program
or a drug study where interest is in all patients treated with the
drug.
- treat discards only treated units outside the support
of the distance measure of the control units. Use this option
when the average treatment effect on the control units is of most
interest and when you are unwilling to discard control units.
- convex.hull discards control units not within the
convex hull of the treated units, using the method developed by
King & Zeng (2005).
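The control, treat, and both rules above can be sketched in base R. This sketch assumes that "support" means the minimum-to-maximum range of the other group's distance measure; that reading, the simulated data, and all object names are assumptions for illustration, not MATCHIT's own code. The convex.hull option is not covered.

```r
# Simulated data (hypothetical) with limited overlap between groups.
set.seed(4)
n <- 300
dat <- data.frame(x1 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(1.5 * dat$x1))

# Distance measure: propensity score from a logistic regression.
fit <- glm(treat ~ x1, data = dat, family = binomial)
d <- predict(fit, type = "response")
is_treated <- dat$treat == 1

# Range of the distance measure within each group.
rng_t <- range(d[is_treated])
rng_c <- range(d[!is_treated])

# TRUE where a value lies outside the given [min, max] range.
outside <- function(x, rng) x < rng[1] | x > rng[2]

drop_control <- !is_treated & outside(d, rng_t)  # like discard = "control"
drop_treat   <-  is_treated & outside(d, rng_c)  # like discard = "treat"
drop_both    <- drop_control | drop_treat        # like discard = "both"

# Number of units each rule would discard.
c(control = sum(drop_control), treat = sum(drop_treat), both = sum(drop_both))
```

Comparing the three counts on your own data shows how much of each group lies off the region of overlap, and therefore how much the quantity of interest would change under each option.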
- reestimate: whether the model for distance
measure should be re-estimated after units are discarded (default=FALSE). The input
must be a logical value.
Re-estimation may be desirable for efficiency reasons, especially if
many units were discarded and so the post-discard samples are quite
different from the original samples.
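As a rough illustration of what re-estimation involves, the sketch below fits a logistic regression, discards control units outside the range of the treated units' scores, and then refits the model on the remaining units. The discard rule, the simulated data, and the object names are assumptions for illustration only.

```r
# Simulated data (hypothetical).
set.seed(5)
n <- 300
dat <- data.frame(x1 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(1.5 * dat$x1))
is_treated <- dat$treat == 1

# Step 1: estimate the distance measure on all units.
fit_all <- glm(treat ~ x1, data = dat, family = binomial)
d_all <- fitted(fit_all)

# Step 2: keep all treated units, and controls inside the treated range.
rng_t <- range(d_all[is_treated])
keep <- is_treated | (d_all >= rng_t[1] & d_all <= rng_t[2])

# Step 3: re-estimate the distance model on the retained units only.
fit_kept <- glm(treat ~ x1, data = dat[keep, ], family = binomial)
d_kept <- fitted(fit_kept)

# Compare coefficients before and after discarding.
rbind(all_units = coef(fit_all), kept_units = coef(fit_kept))
```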
- verbose: whether or not to print out comments
indicating the status of the matching (default=FALSE). The input must be a logical
value.
Gary King
2005-09-26