Multiple imputation involves imputing $m$ values for each missing cell
in your data matrix and creating $m$ ``completed'' data sets. (Across
these completed data sets, the observed values are the same, but the
missing values are filled in with different imputations that reflect
our uncertainty about the missing data.) After imputation with our
EMis algorithm, the program saves the $m$ data sets. You then
apply whatever statistical method you would have used if there had
been no missing values to each of the $m$ data sets, and use a simple
procedure, described in the next paragraph, to combine the results.
(If you use the Stata package for statistical analysis, you may be
interested in our MI procedures, or the
CLARIFY
package, both of which can
combine the results automatically.) Under normal circumstances, you
only need to impute once and can then analyze the
imputed data
sets as many times and for as many purposes as you wish. The
advantage of this program is that it combines the comparative speed and
ease-of-use of our EMis algorithm with the power of multiple
imputation, to let you focus on your substantive research questions
rather than spending time developing complex application-specific
models for nonresponse in each new data set. Unless the rate of
missingness is exceptionally high, the program's default number of
imputed data sets is probably adequate.
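
The workflow described above (impute once, then run the same complete-data analysis on every completed data set) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `ols_slope` and `analyze_each` are hypothetical, the ``completed'' data sets are simulated stand-ins rather than output from the imputation program, and a simple regression slope is used as the example analysis.

```python
import numpy as np

def ols_slope(data):
    """Regress column 1 on column 0; return (slope, standard error)."""
    x, y = data[:, 0], data[:, 1]
    X = np.column_stack([np.ones_like(x), x])        # intercept plus predictor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of beta
    return float(beta[1]), float(np.sqrt(cov[1, 1]))

def analyze_each(completed_data_sets, analysis=ols_slope):
    """Apply the same complete-data analysis to every completed data set."""
    return [analysis(d) for d in completed_data_sets]

# Hypothetical stand-ins for three completed data sets: the observed
# cells are identical across sets, and only the one cell we pretend
# was missing differs from set to set.
rng = np.random.default_rng(0)
observed = rng.normal(size=(20, 2))
completed = []
for _ in range(3):
    d = observed.copy()
    d[0, 1] = rng.normal()   # a different "imputation" in each set
    completed.append(d)

results = analyze_each(completed)   # one (estimate, std. error) pair per set
```

Each element of `results` is then an input to the combining step; nothing in the analysis function needs to know that the data were imputed.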
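
To combine the results, first decide on the quantity of
interest to compute, such as a univariate mean, regression
coefficient, predicted probability, or first difference. The multiple
imputation estimate of this parameter, $\bar{q}$, is the average of the
separate estimates, $q_j$, one from each completed data set
($j = 1, \ldots, m$, where $m$ is the number of completed data sets):
\[
\bar{q} = \frac{1}{m} \sum_{j=1}^{m} q_j .
\]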
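
As a rough illustration of the combining step, the sketch below averages the separate point estimates and also pools their standard errors. The variance formula is not spelled out in this section; the code assumes the standard combining rule due to Rubin, in which the total variance is the average within-imputation variance plus the between-imputation variance inflated by a factor of $(1 + 1/m)$. The function name `combine_estimates` and the example numbers are hypothetical.

```python
import math

def combine_estimates(estimates, std_errors):
    """Combine per-data-set estimates and standard errors.

    Assumes Rubin's standard rules: the point estimate is the plain
    average; the variance is within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need matching lists with at least two entries")
    q_bar = sum(estimates) / m                                   # average estimate
    within = sum(se ** 2 for se in std_errors) / m               # mean squared SE
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1) # sample variance
    total_var = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total_var)

# Made-up estimates and standard errors from three completed data sets.
q_bar, se = combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing data: if all the separate estimates agree, it vanishes and the combined standard error reduces to the ordinary complete-data one.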

If, instead of point estimates and standard errors, simulations of
the quantity of interest are desired (as would be used to compute
quantities of interest directly; see King, Tomz, and Wittenberg,
2000), use each completed data set to create $1/m$ of the needed
number of simulations, where $m$ is the number of completed data sets,
and then combine them all into one set of simulations.
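
The simulation-based alternative can be sketched along the following lines. This is a minimal illustration, not the procedure of any particular package: it assumes, purely for the example, that simulations from each completed data set are drawn from a normal distribution centered on that data set's estimate with its standard error as the spread. The function name `pooled_simulations` and the inputs are hypothetical.

```python
import numpy as np

def pooled_simulations(estimates, std_errors, n_sims, seed=0):
    """Draw an equal share of n_sims from each completed data set's
    (assumed normal) sampling distribution and pool the draws."""
    rng = np.random.default_rng(seed)
    m = len(estimates)
    # Split n_sims as evenly as possible; early sets absorb any remainder.
    counts = [n_sims // m + (1 if j < n_sims % m else 0) for j in range(m)]
    draws = [
        rng.normal(loc=q, scale=se, size=n)
        for q, se, n in zip(estimates, std_errors, counts)
    ]
    return np.concatenate(draws)   # one combined set of simulations

# Made-up estimates and standard errors from three completed data sets.
sims = pooled_simulations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], n_sims=1000)
```

Summaries such as the mean or percentile intervals can then be computed from `sims` directly, since the pooled draws already mix the estimation uncertainty within each data set with the variation across imputations.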

See especially pp. 57-58 of our article for a variety of practical suggestions on making imputations, such as which variables to include in the imputation stage and how to keep imputations within logically possible ranges.