
Time Series, or Time Series Cross Sectional Data

Amelia can impute time series and time series cross sectional data with an autoregressive distributed lag, ADL(1,1), model. To use this feature, set the global _AMts to the column of the data that indexes time, and set the global _AMlagvs to the set of variables whose lags are to be included in the imputation model. You must also set the global _AMtstep to the difference in time between any two sequential observations. If the time series also has a cross section, set the global _AMcs to the column of the cross sectional variable. The data do not have to be sorted by cross section, but they do have to be sorted by time within each cross section.

To include the time trend in the imputation model, set the global _AMusets to 1; otherwise this global is 0 and the imputation does not use the time variable. An analogous global, _AMusecs, controls whether the cross section variable is included in the imputation model. If the cross sectional variable is included, it is declared as nominal and treated as explained in Section 7.2.2. Users with a broad cross section should understand the demands this places on the imputation model: a $p$-category cross sectional variable is treated as $p-1$ dummy variables, so a dataset with a cross section of forty countries adds 39 dummy variables to the imputation model.
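To make the $p-1$ count concrete, the following Python sketch expands a nominal variable into indicator columns. It is only an illustration, not Amelia's GAUSS code: the function name dummy_columns is hypothetical, and the choice of the first category in sorted order as the omitted reference level is an assumption.

```python
def dummy_columns(labels):
    """Expand a nominal variable into p - 1 indicator columns.

    Assumption for this sketch: the first category in sorted order is the
    reference level and gets no column of its own.
    """
    categories = sorted(set(labels))
    kept = categories[1:]  # drop the reference category
    rows = [[1 if label == c else 0 for c in kept] for label in labels]
    return kept, rows


countries = ["England", "Scotland", "Wales", "England"]
names, rows = dummy_columns(countries)
print(names)  # ['Scotland', 'Wales']
print(rows)   # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

With forty distinct labels the same function returns 39 column names, which is the cost the paragraph above warns about.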

For example, consider the following fictional time series cross sectional dataset:

Post WWII Decade Country      x1   x2   x3   x4
      1   1960   England     1.6   3   7.2  5.8
      1   1970   England     1.8   2   6.5  2.1
      1   1980   England     1.9   3   6.7  1.6
      1   1960   Scotland    0.9   1   8.2  1.3
      1   1970   Scotland          4   9.4  2.4
      1   1980   Scotland    1.1   4   9.0  5.9
      0   1930   Wales       0.8   1   7.0  1.5
      0   1940   Wales       0.7   2   6.0   
      1   1960   Wales       1.9   1         
      1   1970   Wales       2.1   2        0.6

One possible implementation of the globals would be:

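_AMcs=3;          /* cross section variable (Country) is column 3 */
_AMts=2;          /* time variable (Decade) is column 2 */
_AMtstep=10;      /* ten units of time between consecutive observations */
_AMusecs=1;       /* include the cross section variable in the imputation model */
_AMusets=0;       /* do not include the time variable in the imputation model */
_AMlagvs={4,6};   /* include lags of columns 4 (x1) and 6 (x3) */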

Here the first two globals correctly locate the positions of the cross section and time variables. Notice that in the dataset the cross section variable is a text variable rather than a numeric variable. Either is fine; however, if a dataset has both a text identifier and a numeric identifier for the cross section, only one of them should be included in the dataset for imputation. The user has also set the time step to $10$, since that is the number of units in the time scale between two consecutive observations.
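The ordering requirement and the time step can be checked before the data are passed to the program. The Python sketch below is a hypothetical pre-check, not part of Amelia: the function name check_time_order is invented for illustration, and the rule that every gap must be a whole multiple of the time step is an assumption drawn from the description of _AMtstep.

```python
def check_time_order(rows, ts_col, cs_col, tstep):
    """Return a list of (row_number, message) problems; empty if none found.

    ts_col and cs_col are 1-based column positions, mirroring _AMts and _AMcs.
    """
    last_time = {}  # most recent time seen for each cross sectional unit
    problems = []
    for i, row in enumerate(rows, start=1):
        unit, t = row[cs_col - 1], row[ts_col - 1]
        if unit in last_time:
            gap = t - last_time[unit]
            if gap <= 0:
                problems.append((i, "time does not increase within " + str(unit)))
            elif gap % tstep != 0:
                # Assumption: gaps are allowed only in whole time steps.
                problems.append((i, "gap is not a multiple of the time step"))
        last_time[unit] = t
    return problems


rows = [
    (1, 1960, "England"),
    (1, 1970, "England"),
    (0, 1930, "Wales"),
    (0, 1940, "Wales"),
    (1, 1960, "Wales"),  # a twenty-year gap, which is allowed
]
print(check_time_order(rows, ts_col=2, cs_col=3, tstep=10))  # []
```

Because the check tracks the last time seen per unit, it does not require the rows to be sorted by cross section, only by time within each one.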

The dataset that will be used in the imputation model is represented below. Dummies have been created for each of the cross sectional elements because this is a nominal variable and the user chose to include it by setting _AMusecs=1. Lagged values of the variables listed in _AMlagvs have been added to the dataset. Notice that these lags carry not only the missing values in the original dataset but also some structurally missing values, since the lagged value of the first observation in any series is necessarily missing. Finally, the time indicator variable, although necessary for constructing the lagged variables, has been removed from the imputation model because the user set _AMusets=0.

Post WWII Engl. Scot. Wales   x1   x2   x3   x4  lagx1  lagx3
      1     1     0     0    1.6   3   7.2  5.8    
      1     1     0     0    1.8   2   6.5  2.1   1.6    7.2
      1     1     0     0    1.9   3   6.7  1.6   1.8    6.5
      1     0     1     0    0.9   1   8.2  1.3    
      1     0     1     0          4   9.4  2.4   0.9    8.2
      1     0     1     0    1.1   4   9.0  5.9          9.4
      0     0     0     1    0.8   1   7.0  1.5   
      0     0     0     1    0.7   2   6.0        0.8    7.0
      1     0     0     1    1.9   1                  
      1     0     0     1    2.1   2        0.6   1.9
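The lagged columns in the table above can be reproduced with a short Python sketch. This is an illustration only and not Amelia's GAUSS implementation: the function name add_lags and the use of None for a missing value are assumptions, while the column positions and time step are those of the example.

```python
def add_lags(rows, ts_col, cs_col, lag_cols, tstep):
    """Append one lagged value per column in lag_cols to every row.

    Column positions are 1-based, mirroring _AMts, _AMcs and _AMlagvs.
    None marks a missing value.
    """
    # Index every row by (unit, time) so the previous period can be looked up.
    by_key = {(r[cs_col - 1], r[ts_col - 1]): r for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r[cs_col - 1], r[ts_col - 1] - tstep))
        # No row one time step earlier means a structurally missing lag.
        lags = [prev[c - 1] if prev is not None else None for c in lag_cols]
        out.append(list(r) + lags)
    return out


data = [
    (1, 1960, "England", 1.6, 3, 7.2, 5.8),
    (1, 1970, "England", 1.8, 2, 6.5, 2.1),
    (1, 1980, "England", 1.9, 3, 6.7, 1.6),
    (1, 1960, "Scotland", 0.9, 1, 8.2, 1.3),
    (1, 1970, "Scotland", None, 4, 9.4, 2.4),
    (1, 1980, "Scotland", 1.1, 4, 9.0, 5.9),
    (0, 1930, "Wales", 0.8, 1, 7.0, 1.5),
    (0, 1940, "Wales", 0.7, 2, 6.0, None),
    (1, 1960, "Wales", 1.9, 1, None, None),
    (1, 1970, "Wales", 2.1, 2, None, 0.6),
]
lagged = add_lags(data, ts_col=2, cs_col=3, lag_cols=[4, 6], tstep=10)
for row in lagged:
    print(row[2], row[1], row[-2:])  # country, decade, [lagx1, lagx3]
```

The Wales 1960 row receives no lags because there is no Wales row for 1950, and the Scotland 1980 row inherits the missing x1 from 1970, matching the table.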

At the end of the imputation, when all missing values have been imputed, the dataset returned to the user is transformed back to the original form submitted by the user. That is, the dataset will look exactly like the dataset at the beginning of our example, with countries identified by their proper names, the time indicator present, and no lags, rather than like the dataset immediately above. Amelia always returns imputed datasets in the same form as the original dataset passed to the program, regardless of what transformations take place internally.

Some additional points to be aware of when using the time series routines:

  1. You cannot presently create lags of variables declared as ordinal or nominal. You can still declare some variables as ordinal or nominal in a time series dataset; you just cannot use lags of them.
  2. You cannot presently implement any prior distribution other than the ridge prior if you use these globals.
  3. You can have gaps in your time series. For example, you can have data in the sixties and seventies and then the nineties, and the gaps need not be the same for every cross sectional unit. Completely missing observations will not be imputed, but partially observed observations will be. In this example, missing values in the sixties, seventies, and nineties would be imputed, but the eighties, which are completely missing from the original dataset, would not be imputed from scratch.
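To see which periods fall into such gaps for a given cross sectional unit, a small Python sketch follows. It is a hypothetical helper, not something Amelia provides: the name absent_periods is invented, and it assumes the time values are integers spaced in whole time steps.

```python
def absent_periods(times, tstep):
    """List the time points between the first and last observation
    for which the unit has no row at all."""
    observed = set(times)
    return [t for t in range(min(times), max(times) + tstep, tstep)
            if t not in observed]


print(absent_periods([1960, 1970, 1990], tstep=10))        # [1980]
print(absent_periods([1930, 1940, 1960, 1970], tstep=10))  # [1950]
```

The periods listed are the completely missing observations that would not be imputed from scratch; rows that exist but contain some missing cells are unaffected.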


Gary King 2003-07-25