| Previous: Quick Overview | Up: Conducting Analyses after Matching | Next: Reference Manual |
We now give four examples using the Lalonde data. They are meant to be read sequentially. You can run these example commands by typing demo(analysis). Although we use the linear least squares model in these examples, a wide range of other models are available in Zelig (for the list of supported models, see http://gking.harvard.edu/zelig/docs/Models_Zelig_Can.html.
To load the Zelig package after installing it, type
> library(Zelig)
> m.out <- matchit(treat ~ age + educ + black + hispan + nodegree +
married + re74 + re75, method = "nearest", discard
= "hull.control", data = lalonde)
Then we check balance using the summary and plot procedures (which
we don't show here). When the best balance is achieved,
we run the parametric analysis:
> z.out <- zelig(re78 ~ treat + age + educ + black + hispan + nodegree +
married + re74 + re75, data = match.data(m.out),
model = "ls")
and then set the explanatory variables at their means (the default)
and change the treatment variable from a 0 to a 1:
> x.out <- setx(z.out, treat=0) > x1.out <- setx(z.out, treat=1)and finally compute the result and examine a summary:
> s.out <- sim(z.out, x = x.out, x1 = x1.out) > summary(s.out)
We begin by conducting nearest neighbor matching with a logistic regression-based propensity score:
> m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree +
married + re74 + re75, method = "nearest", data = lalonde)
Then we check balance using the summary and plot procedures (which
we don't show here). We reestimate the matching procedure until we
achieve the best balance possible. (The running examples here are
meant merely to illustrate, not to suggest that we've achieved the
best balance.) Then we go to Zelig, and in this case choose to fit
a linear least squares model to the control group only:
> z.out1 <- zelig(re78 ~ age + educ + black + hispan + nodegree +
married + re74 + re75, data = match.data(m.out1, "control"),
model = "ls")
where the "control" option in match.data() extracts only
the matched control units and ls specifies least squares
regression. In a smaller data set, this example should probably be
changed to include all the data in this estimation (using
data = match.data(m.out1) for the data) and by including
the treatment indicator (which is excluded in the example since its
a constant in the control group.) Next, we use the coefficients
estimated in this way from the control group, and combine them with
the values of the covariates set to the values of the treated units.
We do this by choosing conditional prediction (which means use the
observed values) in setx(). The sim() command does
the imputation.
> x.out1 <- setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE) > s.out1 <- sim(z.out1, x = x.out1)Finally, we obtain a summary of the results by
> summary(s.out1)
> z.out2 <- zelig(re78 ~ age + educ + black + hispan + nodegree +
married + re74 + re75, data = match.data(m.out1, "treat"),
model = "ls")
We then conduct the same simulation procedure in order to impute the
counterfactual outcome for the control group,
> x.out2 <- setx(z.out2, data = match.data(m.out1, "control"), cond = TRUE) > s.out2 <- sim(z.out2, x = x.out2)In this calculation, Zelig is computing the difference between observed and the expected values. This means that the treatment effect for the control units is the effect of control (observed control outcome minus the imputed outcome under treatment from the model). Hence, to combine treatment effects just reverse the signs of the estimated treatment effect of controls.
> ate.all <- c(s.out1$qi$att.ev, -s.out2$qi$att.ev)The point estimate, its standard error, and the
> mean(ate.all) > sd(ate.all) > quantile(ate.all, c(0.025, 0.975))
We begin this example by conducting subclassification with four subclasses,
> m.out2 <- matchit(treat ~ age + educ + black + hispan + nodegree +
married + re74 + re75, data = lalonde,
method = "subclass", subclass = 4)
When balance is as good as we can get it, we then fit a linear
regression within each subclass by controlling for the estimated
propensity score (called distance) and other covariates.
In most software, this would involve running four separate
regressions and then combining the results. In Zelig, however, all
we need to do is to use the by option:
> z.out3 <- zelig(re78 ~ re74 + re75 + distance,
data = match.data(m.out2, "control"),
model = "ls", by = "subclass")
The same set of commands as in the first example are used to do the
imputation of the counterfactual outcomes for the treated units:
> x.out3 <- setx(z.out3, data = match.data(m.out2, "treat"), fn = NULL,
cond = TRUE)
> s.out3 <- sim(z.out3, x = x.out3)
> summary(s.out3)
It is also possible to get the summary result for each subclass. For
example, the following command summarizes the result for the second
subclass.
> summary(s.out3, subset = 2)
> m.out <- matchit(treat ~ educ + black + hispan, data = lalonde,
method = "exact")
> m.data <- match.data(m.out)
> ## weighted diff in means
> weighted.mean(mdata$re78[mdata$treat == 1], mdata$weights[mdata$treat==1])
- weighted.mean(mdata$re78[mdata$treat==0], mdata$weights[mdata$treat==0])
[1] 807
> ## weighted least squares without covariates
> zelig(re78 ~ treat, data = m.data, model = "ls", weights = "weights")
Call:
zelig(formula = re78 ~ treat, model = "ls", data = m.data, weights =
"weights")
Coefficients:
(Intercept) treat
5524 807
> ## weighted least squares with covariates
> zelig(re78 ~ treat + black + hispan + educ, data = m.data, model = "ls",
weights = "weights")
Call:
zelig(formula = re78 ~ treat + black + hispan + educ, model = "ls",
data = m.data, weights = "weights")
Coefficients:
(Intercept) treat black hispan educ
314 807 -1882 258 657