Sometimes imputations from the normal model are not appropriate to the user's analysis or fail known bounds or identities of the data. For the special case of ordinal and nominal variables, there is an internal module to produce valid imputations from the normal model (see Section 7.2).
Two other simple approaches are either censoring or truncating the
posterior density. Consider a variable
with values known to be
between zero and one. Normally, we recommend taking transformations
(such as
) before Amelia and then untransforming
afterwards (with
), but in some cases (such as when
sometimes takes on the exact values of 0 or 1), it is better to
put the untransformed
directly into Amelia. Of course, in this
situation, a multivariate normal imputation model will sometimes
impute a missing value below zero or above one, which is logically
impossible. If there are only a few out of bounds predictions, a
reasonable strategy is to round the imputations below zero up to zero
and likewise those above one down to one. In other situations with
more out of bounds imputations, it is better to create a different set
of imputations altogether by discarding imputations that do not
satisfy the constraint (fall within the unit interval) and impute
again until a value between zero and one is reached.
implements a rejection sampler that allows constraints to
be implemented at the end of the simulation process in this way.
The AMrs routine in Amelia, together with a user written
constraint checking procedure, implements the rejection sampler. The
call to AMrs is made after the call to
,
as in the following example.
library amelia;
dataset=loadd("ukpoll");
dbuf=amelia(dataset);
call amrs(dbuf,dataset,&myconstraint);
The argument myconstraint is the name of a user written
procedure, myconstraint.g, that you must place in the current
directory. This procedure identifies which observations in some data
set meet the user-defined constraint. This procedure should accept as
arguments the data buffer, the original data set, and the imputed data
set. It then returns a column vector, flag, of ones and
zeros with length equal to the number of observations in the dataset.
Each value in this vector flag
corresponds to observation
in the imputed dataset. A one should be assigned if the
observation fails the constraint and a zero if the observation meets
the constraint. For example, the following constraint checks that
every observation of the second variable is not greater than ten.
proc myconstraint(dbuf,xorig,ximp); local flag; flag=ximp[.,2].>10; retp(flag); endp;
Not all constraints require the original data or the information in
the buffer. The above example uses neither but only refers to values
in the imputed dataset. However, they are provided as they are often
useful. The rejection sampler will cycle through each imputed dataset
and make sure that every observation passes the user provided
constraint. If an observation in a dataset fails the constraint, that
observation will be reimputed using the sufficient statistics
used to originally impute that particular
dataset. Then the dataset is checked again and this whole cycle is
iterated until all observations pass the user provided constraint.
The procedure AMrs can only be called once in a program, since in later constraints imputations may be created that would fail an earlier constraint. Multiple constraints, or constraints on multiple variables should be written by running a series of constraints within the same procedure and combining results, as in the next example which checks that the fifth variable is less than the fourth and the ninth variable is non-negative:
proc myconstraint(dbuf,xorig,ximp); local flag1, flag2, allflags, flag; flag1=ximp[.,5].>ximp[.,4]; flag2=ximp[.,9].<=0; allflags= flag1+flag2; flag=allflags.>0; retp(flag); endp;
Screen output is provided if _AMprt=1 (default) allows printing to the screen. Example screen output looks like:
* Running Rejection Sampler * Rejected 0 >> Dataset 1 meets constraint. Rejected 11 , 3 , 0 >> Dataset 3 meets constraint. Rejected 1 , 0 >> Dataset 2 meets constraint. Rejected 0 >> Dataset 4 meets constraint. Rejected 5 , 1 , 1 , 1 , 1 , 1 , 0 >> Dataset 5 meets constraint.
Each line corresponds to one imputed dataset from
being tested. In the above example, five datasets were imputed by
(as governed by _AMnds). The second
dataset initially failed the user's constraint on eleven observations.
After these eleven were reimputed, three continued to fail the
constraint and were again reimputed in the next round, after which all
observations passed the constraint. Since only failing observations
are reimputed the number of failing observations is equal or
decreasing in each round. In the fifth dataset we see an observation
where very little of the probability density in the unconstrained
model is within the bounds of the constraint and thus takes six draws
to pass.
There is an upper limit to the number of times the rejection sampler will resample in a dataset, set with the global _AMrs; its default is 20. After the maximum has been reached a message will appear like the one below, and the observations which continue to fail the user constraint are identified so the user can check them in the dataset. These may indicate a problem with the model not fitting the data, an outlier in the data, or a bug in the the constraint procedure.
Maximium number of 20 iterations reached. In dataset 8 the following observations currently fail user constraint: 29 118
Since only the missing data are being reimputed, the user should take care that none of the observed data fails the constraint as then that particular observation will never pass.