
Model and Data

formula
A standard R formula of the form y ~ x1 + x2, with one exception: an explanatory variable is included for a particular cross-section only if it is both listed in the formula and available in that cross-section's data set (see dataobj). Explanatory variables listed in the formula but not available for a cross-section are excluded for that cross-section. Variables present in a cross-sectional dataset but not listed in the formula are also excluded. (For mortality forecasting, the specification looks like log(deaths/population) ~ x1 + x2, with deaths and population stored as separate variables in each dataframe.) (May be set to NULL if savetmp was set to TRUE on the last run, in which case the value of formula is taken from the saved file.)
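The rule for choosing the explanatory variables in each cross-section is an ordered set intersection. The Python sketch below illustrates only that rule. It is not YourCast code, and the function name select_covariates and the variables x1, x2, x3, x4 are hypothetical.

```python
def select_covariates(formula_vars, available_columns):
    """Return the explanatory variables used for one cross-section.

    A variable is kept only if it is listed in the formula AND present in
    that cross-section's data; all other variables are dropped. The result
    keeps the order in which variables appear in the formula.
    """
    available = set(available_columns)
    return [v for v in formula_vars if v in available]

# Hypothetical formula y ~ x1 + x2 + x3 and two cross-sections whose
# dataframes contain different columns.
formula_vars = ["x1", "x2", "x3"]
print(select_covariates(formula_vars, ["y", "x1", "x3", "x4"]))  # ['x1', 'x3']
print(select_covariates(formula_vars, ["y", "x2"]))              # ['x2']
```

In the first cross-section, x4 is in the data but not in the formula, so it is dropped. x2 is in the formula but not in the data, so it is dropped as well.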

model
A string indicating the forecasting method. The options include Bayes maximum a posteriori ("MAP"), Bayes with Gibbs sampling ("Bayes"), Ordinary Least Squares ("OLS"), Poisson ("POISSON"), and Lee-Carter ("LC"). Default: "OLS". (We usually recommend "MAP".)

YourCast also includes a procedure that sets the sigma parameters below automatically. It applies when model="MAP" and smoothing is over age, time, or both age and time, and it works for only one country. To use it, first run YourCast as a preprocessing step with this parameter set to "EBAYES". For that run, use either the data to be analyzed or a larger data set that is likely to have similar or related parameter values. When "EBAYES" is chosen, the YourCast output object contains only the parameter values to feed into the next run of YourCast.

dataobj
Four types of input are allowed. If dataobj is (1) an object in working memory or (2) a string naming a file on disk, the object must contain a list with the following items (the first two of which are required):
index.code
A string indicating how the index variable is coded in the input data. The string uses between 0 and 4 of each of the following characters, in this order: - to ignore a character, G for the geographic index (such as country), A for a grouped continuous variable such as an age group, and T for the time period. For example, --GGGGAATTTT means parse 920004172005 by ignoring 92, using 0004 as the country code, 17 as the age group, and 2005 as the year. Default: -GGGGAATTTT.
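The following Python sketch shows how an index.code pattern splits an index value position by position. The function parse_index is hypothetical and is not part of YourCast. Dropping the two leading digits 92 of the 12-character value 920004172005 requires two '-' characters, so the pattern used here is --GGGGAATTTT.

```python
def parse_index(index_code, value):
    """Split an index value according to an index.code pattern.

    '-' marks a character to ignore; 'G', 'A', and 'T' mark the geographic,
    age-group, and time-period characters, respectively.
    """
    if len(index_code) != len(value):
        raise ValueError("index.code and value must have the same length")
    parts = {"G": "", "A": "", "T": ""}
    for code_char, value_char in zip(index_code, value):
        if code_char == "-":
            continue  # ignored character
        if code_char not in parts:
            raise ValueError(f"unknown index.code character: {code_char!r}")
        parts[code_char] += value_char
    return parts

print(parse_index("--GGGGAATTTT", "920004172005"))
# {'G': '0004', 'A': '17', 'T': '2005'}
```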
data
A list of dataframes, one for each cross-sectional unit. List elements are named by the combined geographic-area and age-group index GGGGAA, and dataframe rows are labeled by time period TTTT (using the notation of index.code). For example, one item in the list might be labeled 432101 for geographic area 4321 and age group 01. Columns must include at least one variable common to all dataframes, to be used as the dependent variable. The explanatory variables may differ from one cross-sectional unit to another.
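As a rough illustration of the data layout, the Python sketch below splits a GGGGAA element name into its two codes. It also finds the columns shared by every cross-section, which are the candidates for the dependent variable. Each dataframe is reduced to the set of its column names. The names data and split_cross_section, and the codes and variables shown, are hypothetical.

```python
# Hypothetical stand-in for dataobj$data: keys are GGGGAA cross-section
# names, and each "dataframe" is reduced to the set of its column names.
data = {
    "432101": {"deaths", "population", "gdp", "tobacco"},
    "432102": {"deaths", "population", "gdp"},
    "510001": {"deaths", "population", "fat"},
}

def split_cross_section(name, g_width=4, a_width=2):
    """Split a GGGGAA name into (geographic code, age group)."""
    if len(name) != g_width + a_width:
        raise ValueError(f"unexpected cross-section name: {name!r}")
    return name[:g_width], name[g_width:]

# Columns present in every cross-section: candidates for the dependent
# variable (or, for mortality, the variables that build it).
common = set.intersection(*data.values())

print(split_cross_section("432101"))  # ('4321', '01')
print(sorted(common))                 # ['deaths', 'population']
```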
G.names, A.names, T.names
Optional two-column dataframes that list all valid numerical codes and, optionally, the corresponding alphanumeric names. Codes go in the first column, labeled code, and names in the second column, labeled name. G.names covers the geographic areas, A.names the age groups, and T.names the time periods. Alphanumeric names are most commonly used only for geographic areas, since numerical values for age groups and time periods are usually meaningful on their own.
proximity
Codes from which yourcast() constructs the symmetric matrix (geographic region by geographic region) of proximity scores used for geographic smoothing by methods MAP and Bayes. The larger an element of this matrix, the more proximate the corresponding pair of countries is in the prior; a zero element means the two geographic areas are unrelated (the diagonal is ignored). Each row of proximity has three entries. The first two are the geographic codes of two countries, which index both the row-column and the column-row positions of the symmetric matrix to be built. The third is a score indicating how proximate or similar the two geographic regions are. Each row therefore specifies one element of the symmetric matrix together with its mirror image. For convenience, pairs of geographic regions that are unrelated (and so have zero entries in the matrix) may be omitted from proximity. In addition, proximity may include rows for geographic regions not in the present analysis.
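The construction rules above (mirror each row, default to zero, skip regions outside the analysis, ignore the diagonal) can be sketched in Python as follows. This is an illustration under those stated rules, not yourcast()'s actual code. The function build_proximity and the codes 4321, 5100, 6200, and 9999 are hypothetical. If a pair appears twice, the later row overwrites the earlier one here; the text does not say how YourCast treats duplicates.

```python
def build_proximity(rows, geo_codes):
    """Build a symmetric proximity matrix from (code1, code2, score) rows.

    - Each row fills both [i][j] and [j][i].
    - Pairs that are not listed stay 0 (unrelated).
    - Rows naming a code outside geo_codes are skipped.
    - The diagonal is left at 0 because it is ignored.
    """
    index = {code: k for k, code in enumerate(geo_codes)}
    n = len(geo_codes)
    prox = [[0.0] * n for _ in range(n)]
    for code1, code2, score in rows:
        if code1 not in index or code2 not in index or code1 == code2:
            continue
        i, j = index[code1], index[code2]
        prox[i][j] = prox[j][i] = float(score)
    return prox

geo_codes = [4321, 5100, 6200]
rows = [
    (4321, 5100, 2.0),
    (5100, 6200, 0.5),
    (4321, 9999, 1.0),  # 9999 is not in this analysis, so it is skipped
]
print(build_proximity(rows, geo_codes))
# [[0.0, 2.0, 0.0], [2.0, 0.0, 0.5], [0.0, 0.5, 0.0]]
```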
(3) If dataobj is a string naming a directory on disk, each element of the list above should be stored as a file in that directory. The exception is the element data, which is stored as a subdirectory containing separate ASCII data files. (If this option is chosen, a complete data object called dataobj.Rdata is stored in the named directory and loaded automatically if YourCast is run again with this option.) (4) Finally, dataobj may be set to NULL. This can be useful if savetmp was set to TRUE on the last run, in which case the value of dataobj is taken from the saved file.

sample.frame
A four-element vector containing, in order, the start and end time periods of the observed data to be used and the start and end time periods to be forecast. Years listed here that are not available for a cross-section are ignored. Default: c(1950,2000,2001,2020). (This makes it easy to reserve a range of values of the dependent variable for out-of-sample forecast evaluation. Our summary() and plot() functions make these comparisons automatically if the out-of-sample data are included.)
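The Python sketch below shows how a sample.frame vector partitions one cross-section's years. It includes the out-of-sample case, where the forecast period overlaps years that were actually observed. The tuple (1950, 1990, 1991, 2020) stands in for the R vector c(1950,1990,1991,2020). The function split_sample_frame is hypothetical and is not part of YourCast.

```python
def split_sample_frame(sample_frame, available_years):
    """Split one cross-section's years according to a sample.frame vector.

    sample_frame is (obs_start, obs_end, fc_start, fc_end). Returns the
    years used for fitting (observed-period years that are available), all
    forecast years, and the forecast years that also have observed data and
    so can serve for out-of-sample evaluation.
    """
    obs_start, obs_end, fc_start, fc_end = sample_frame
    available = set(available_years)
    fit_years = [y for y in range(obs_start, obs_end + 1) if y in available]
    forecast_years = list(range(fc_start, fc_end + 1))
    holdout_years = [y for y in forecast_years if y in available]
    return fit_years, forecast_years, holdout_years

# Data observed for 1950-2000, but fit only through 1990 so that the
# forecasts for 1991-2000 can be compared with what actually happened.
years = range(1950, 2001)
fit, fc, holdout = split_sample_frame((1950, 1990, 1991, 2020), years)
print(fit[0], fit[-1])          # 1950 1990
print(fc[0], fc[-1])            # 1991 2020
print(holdout[0], holdout[-1])  # 1991 2000
```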



Gary King 2009-07-13