
What $ {\mathfrak{A}melia}$ Does

Multiple imputation involves imputing $ m$ values for each missing cell in your data matrix and creating $ m$ ``completed'' data sets. (Across these completed data sets, the observed values are the same, but the missing values are filled in with different imputations that reflect our uncertainty about the missing data.) After imputation with our EMis algorithm, $ {\mathfrak{A}melia}$ saves the $ m$ data sets. You then apply to each of the $ m$ data sets whatever statistical method you would have used had there been no missing values, and use a simple procedure, described below, to combine the results. (If you use the Stata package for statistical analysis, you may be interested in our MI procedures, or the CLARIFY package, both of which can combine the results automatically.) Under normal circumstances, you only need to impute once and can then analyze the $ m$ imputed data sets as many times and for as many purposes as you wish.

The advantage of $ {\mathfrak{A}melia}$ is that it combines the comparative speed and ease of use of our EMis algorithm with the power of multiple imputation, letting you focus on your substantive research questions rather than spending time developing complex application-specific models for nonresponse in each new data set. Unless the rate of missingness is exceptionally high, $ m=5$ (the program default) is probably adequate.

To combine the results, first decide on the quantity of interest to compute, such as a univariate mean, regression coefficient, predicted probability, or first difference. The multiple imputation estimate of this quantity, $ \bar q$, is the average of the $ m$ separate estimates, $ q_j$ $ (j=1,\dots,m)$:

$\displaystyle \bar q = \frac{1}{m}\sum_{j=1}^m q_j.$ (1)

The variance of the point estimate is the average of the estimated variances from within each completed data set, plus the sample variance in the point estimates across the data sets (multiplied by a factor that corrects for bias because $ m<\infty$). Let SE$ (q_j)^2$ denote the estimated variance (squared standard error) of $ q_j$ from data set $ j$, and let $ S^2_q=\sum_{j=1}^m (q_j-\bar q)^2/(m-1)$ be the sample variance across the $ m$ point estimates. Then the standard error of the multiple imputation point estimate is the square root of

$\displaystyle \mathrm{SE}(q)^2 = \frac{1}{m}\sum_{j=1}^m \mathrm{SE}(q_j)^2 + S^2_q\left(1+1/m\right).$ (2)
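
As a rough illustration of equations (1) and (2), the combination step might be sketched in Python as follows. The function name `combine_estimates` and the input numbers are hypothetical and are not part of $ {\mathfrak{A}melia}$; the sketch assumes you have already obtained one point estimate and one standard error from each of the $ m$ completed data sets.

```python
import math


def combine_estimates(estimates, std_errors):
    """Combine m point estimates and standard errors (hypothetical helper).

    Returns the multiple imputation point estimate from equation (1)
    and its standard error, the square root of equation (2).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need m >= 2 estimates with matching standard errors")

    # Equation (1): average of the m separate estimates.
    q_bar = sum(estimates) / m

    # First term of equation (2): average within-data-set variance.
    within = sum(se ** 2 for se in std_errors) / m

    # S^2_q: sample variance across the m point estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)

    # Equation (2): within variance plus between variance times (1 + 1/m).
    total_variance = within + between * (1 + 1 / m)
    return q_bar, math.sqrt(total_variance)


# Made-up estimates and standard errors from m = 3 completed data sets.
q_bar, se_q = combine_estimates([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(q_bar, se_q)
```

With these made-up inputs, the within-data-set variance is 1, $ S^2_q$ is 1, and the combined variance is $ 1 + 1\cdot(1+1/3) = 7/3$.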

If, instead of point estimates and standard errors, simulations of $ q$ are desired (as would be used to compute quantities of interest directly; see King, Tomz, and Wittenberg, 2000), use each completed data set to create $ 1/m$ of the needed number of simulations and then combine them all into one set of simulations.
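
The pooling of simulations described above could look roughly like the following Python sketch. It is only an illustration: the helper name `pooled_simulations` is hypothetical, and drawing each simulation from a normal distribution centered on $ q_j$ with standard deviation SE$ (q_j)$ is an assumption made here for simplicity, not something the text prescribes. In practice you would substitute whatever simulation procedure suits your model.

```python
import random


def pooled_simulations(estimates, std_errors, total_sims=1000, seed=0):
    """Draw total_sims // m simulations per data set and pool them.

    Hypothetical helper. ASSUMPTION: each simulation of q is drawn from
    a normal distribution with mean q_j and standard deviation SE(q_j).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty estimates and standard errors")

    rng = random.Random(seed)
    m = len(estimates)

    # Each completed data set supplies 1/m of the needed simulations
    # (any remainder from the integer division is dropped).
    per_data_set = total_sims // m

    sims = []
    for q_j, se_j in zip(estimates, std_errors):
        sims.extend(rng.gauss(q_j, se_j) for _ in range(per_data_set))
    return sims


# Made-up estimates and standard errors from m = 5 completed data sets.
sims = pooled_simulations(
    [0.9, 1.1, 1.0, 1.2, 0.8], [0.2, 0.2, 0.2, 0.2, 0.2], total_sims=1000
)
print(len(sims))
```

The pooled list can then be summarized directly, for example by taking its mean or percentiles.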

Users should see especially pp. 57-58 of our article for a variety of practical suggestions on making imputations, such as which variables to include in the imputation stage and how to keep imputations within logically possible ranges.


Gary King 2003-07-25