Amelia can impute time series (cross sectional) data with an
autoregressive distributed lag, ADL(1,1), model. To use this feature,
you must set the global _AMts to the column in the data
which indexes time, and you must define the global _AMlagvs
to be the set of variables for which a lag is to be included in the
imputation model. Finally, you must set the global _AMtstep
to be equal to the difference in time between any two sequential
observations. If there is additionally a cross section to the time
series, the global _AMcs should be set to the location of
the cross sectional variable. Although the data do not have to be
sorted by cross-section, they do have to be sorted by time within each
cross section. If you want to include the time trend in the
imputation model, set the global _AMusets equal to 1,
otherwise this global will be 0 and the imputation will not use this
variable. An analogous global _AMusecs exists for including
the cross section variable in the imputation model. If the cross
sectional variable is included, it will be declared as nominal and
treated as explained in Section 7.2.2. Users with a broad
cross section should understand the demands this places on the
imputation model, as a k-category cross-sectional variable will be
properly treated as k-1 dummy variables by the imputation model
(e.g., a dataset with a cross section of forty countries would result
in adding 39 dummy variables to the imputation model).
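The two data requirements in the paragraph above (observations sorted by time within each cross section, and a k-category cross-sectional variable entering the model as k-1 dummies) can be illustrated with a short Python sketch. It is not Amelia's GAUSS code: the function names are hypothetical, and using the first category in order of appearance as the omitted baseline is an assumption of the sketch.

```python
def sorted_within_cross_sections(times, units):
    """Return True if each unit's observations appear in increasing time
    order. Units may be interleaved: only the order within a unit matters."""
    last_seen = {}
    for t, u in zip(times, units):
        if u in last_seen and t <= last_seen[u]:
            return False
        last_seen[u] = t
    return True


def cs_dummies(units):
    """Expand a k-category nominal variable into k-1 0/1 dummy columns.

    The first category (in order of appearance) is the omitted baseline;
    this choice is an assumption made for the sketch.
    """
    categories = list(dict.fromkeys(units))  # unique levels, order kept
    kept = categories[1:]                     # k-1 dummy columns
    return kept, [[int(u == c) for c in kept] for u in units]


countries = ["England"] * 3 + ["Scotland"] * 3 + ["Wales"] * 4
decades = [1960, 1970, 1980, 1960, 1970, 1980, 1930, 1940, 1960, 1970]

print(sorted_within_cross_sections(decades, countries))  # True
names, rows = cs_dummies(countries)
print(names)    # ['Scotland', 'Wales']
print(rows[0])  # [0, 0]  (England is the baseline)
print(len(cs_dummies([f"country{i}" for i in range(40)])[0]))  # 39
```

The sortedness check deliberately ignores the order of the units themselves, matching the statement that the data need not be sorted by cross section.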
For example, consider the following fictional time-series cross-sectional dataset:
Post WWII Decade Country x1 x2 x3 x4
1 1960 England 1.6 3 7.2 5.8
1 1970 England 1.8 2 6.5 2.1
1 1980 England 1.9 3 6.7 1.6
1 1960 Scotland 0.9 1 8.2 1.3
1 1970 Scotland 4 9.4 2.4
1 1980 Scotland 1.1 4 9.0 5.9
0 1930 Wales 0.8 1 7.0 1.5
0 1940 Wales 0.7 2 6.0
1 1960 Wales 1.9 1
1 1970 Wales 2.1 2 0.6
One possible implementation of the globals would be:
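_AMcs=3;        /* Country is column 3 */
_AMts=2;        /* Decade is column 2 */
_AMtstep=10;    /* consecutive observations are ten years apart */
_AMusecs=1;     /* include the cross section (as dummies) */
_AMusets=0;     /* leave the time trend out of the model */
_AMlagvs={4,6}; /* add lags of x1 (column 4) and x3 (column 6) */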
Here the first two globals have correctly located the positions of the
cross-section and time variables. Notice that in the dataset the
cross-section variable is a text variable and not a numeric variable.
Either is fine; however, if a dataset has both a text identifier and a
numeric identifier, only one of them should be included in the dataset
passed for imputation. Also, the user has set the time step to 10,
since that is the number of units on the time scale between two
consecutive observations. Finally, _AMlagvs={4,6} requests lags of
x1 and x3, the variables in columns 4 and 6.
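How the time step enters the lag construction can be sketched in Python, assuming (as the lagged dataset shown later in this section suggests) that the lag of an observation is the value recorded for the same unit exactly one time step earlier, and is missing when that period is absent or its value is missing. `lag_by_timestep` is a hypothetical helper, not part of Amelia, and `None` marks a missing value.

```python
def lag_by_timestep(values, times, units, tstep):
    """Return the one-period lag of `values` (None marks a missing value).

    For row i, the lag is the value of the same unit at time
    times[i] - tstep. If that period is absent from the data, or its
    value is missing, the lag is missing.
    """
    lookup = {(u, t): v for v, t, u in zip(values, times, units)}
    return [lookup.get((u, t - tstep)) for t, u in zip(times, units)]


# x1 from the example dataset (None = missing), with _AMtstep = 10.
x1 = [1.6, 1.8, 1.9, 0.9, None, 1.1, 0.8, 0.7, 1.9, 2.1]
countries = ["England"] * 3 + ["Scotland"] * 3 + ["Wales"] * 4
decades = [1960, 1970, 1980, 1960, 1970, 1980, 1930, 1940, 1960, 1970]

print(lag_by_timestep(x1, decades, countries, 10))
# [None, 1.6, 1.8, None, 0.9, None, None, 0.8, None, 1.9]
```

Because the lookup is by time rather than by row position, Wales 1960 receives a missing lag even though it directly follows Wales 1940 in the data.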
The dataset that will be used in the imputation model is represented below. Dummies have been created for each of the cross-sectional elements because this is a nominal variable and the user requested its use by setting _AMusecs=1. Lagged values of the variables named in _AMlagvs have been added to the dataset. Notice that these carry not only the missing values in the original dataset but also some structurally missing values, since the lagged value of the first observation in any series is necessarily missing (as is the lag for Wales in 1960, because there is no 1950 observation). Also, the time indicator variable, although necessary for constructing the lagged variables, has been removed from the imputation model because the user set _AMusets=0.
Post WWII Engl. Scot. Wales x1 x2 x3 x4 lagx1 lagx3
1 1 0 0 1.6 3 7.2 5.8
1 1 0 0 1.8 2 6.5 2.1 1.6 7.2
1 1 0 0 1.9 3 6.7 1.6 1.8 6.5
1 0 1 0 0.9 1 8.2 1.3
1 0 1 0 4 9.4 2.4 0.9 8.2
1 0 1 0 1.1 4 9.0 5.9 9.4
0 0 0 1 0.8 1 7.0 1.5
0 0 0 1 0.7 2 6.0 0.8 7.0
1 0 0 1 1.9 1
1 0 0 1 2.1 2 0.6 1.9
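The column bookkeeping behind this transformed dataset can be sketched in Python. The function below is hypothetical: it takes the original header plus the globals, with 1-based column positions as in the GAUSS example, and returns the column names the imputation model would see. Two simplifications are assumed: when _AMusets=1 the raw time column is included as-is, and the cross section becomes one dummy per category except a baseline, consistent with forty countries yielding 39 dummies (the illustrative table above shows a column for every country).

```python
def imputation_layout(header, categories, cs, ts, lagvs, usecs, usets):
    """Column names seen by the imputation model (illustrative only).

    header       -- original column names
    categories   -- levels of the cross-sectional variable, baseline first
    cs, ts       -- 1-based positions of the cross-section and time columns
    lagvs        -- 1-based positions of the variables to lag
    usecs, usets -- 1 to include the cross section / time trend, else 0
    """
    layout = []
    for pos, name in enumerate(header, start=1):
        if pos == cs:
            if usecs:
                layout.extend(categories[1:])  # k-1 dummy columns
        elif pos == ts:
            if usets:
                layout.append(name)            # raw time column (assumed)
        else:
            layout.append(name)
    layout.extend("lag" + header[p - 1] for p in lagvs)
    return layout


header = ["Post WWII", "Decade", "Country", "x1", "x2", "x3", "x4"]
print(imputation_layout(header, ["England", "Scotland", "Wales"],
                        cs=3, ts=2, lagvs=[4, 6], usecs=1, usets=0))
# ['Post WWII', 'Scotland', 'Wales', 'x1', 'x2', 'x3', 'x4', 'lagx1', 'lagx3']
```

Dummies are placed where the cross-section column stood, which reproduces the column order of the table above apart from the omitted baseline.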
At the end of the imputation, when all missing values have been
imputed, the dataset returned to the user is transformed back to the
original form submitted by the user. That is, the dataset will look
exactly like the dataset at the beginning of our example: countries
are identified by their proper names, the time indicator is present,
and no lags exist. It will not look like the dataset immediately above.
Amelia always returns imputed datasets in the same form as the
original dataset passed to the program, regardless of what
transformations take place internally.
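One way to picture this back-transformation is the following Python sketch. It assumes that rows keep their order and that the imputed matrix stores the original variables under their own names; the function name and the dict-based records are hypothetical, and the imputed value in the usage example is a placeholder rather than real Amelia output. Observed values and identifiers such as Decade and Country are kept as submitted, while dummy and lag columns are simply dropped.

```python
def restore_original_form(original, imputed, variables):
    """Return a copy of `original` (a list of dicts) whose missing entries
    (None) in `variables` are filled from the matching row of `imputed`.

    Rows are assumed to be in the same order in both inputs; any extra
    columns in `imputed` (dummies, lags) are ignored.
    """
    restored = []
    for orig_row, imp_row in zip(original, imputed):
        row = dict(orig_row)  # keeps Decade, Country, etc. untouched
        for var in variables:
            if row[var] is None:
                row[var] = imp_row[var]
        restored.append(row)
    return restored


original = [{"Decade": 1970, "Country": "Scotland", "x1": None, "x2": 4}]
# 1.0 is a placeholder imputation, not a real result.
imputed = [{"Scotland": 1, "x1": 1.0, "x2": 4, "lagx1": 0.9}]
print(restore_original_form(original, imputed, ["x1", "x2"]))
# [{'Decade': 1970, 'Country': 'Scotland', 'x1': 1.0, 'x2': 4}]
```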
Some additional points to be aware of when using the time series routines: