

Preparing Data

JudgeIt requires all your independent and dependent variables to be in one or more Gauss-format datasets. If your data are not in a Gauss dataset, you can create one from an ASCII dataset using the program ATOG (which stands for ASCII To Gauss), distributed with the JudgeIt program. (You can tell if you have data in an ASCII file by typing it to the screen from the DOS prompt (e.g. type filename.asc); if the file is readable, then it is probably in ASCII.) If your data are already in the format of some other statistical, database, or spreadsheet program, you might consider purchasing a program such as DBMS/Copy, which will convert datasets between Gauss format and numerous others. (DBMS/Copy is available from SPSS Inc., 444 Michigan Avenue, Chicago, Illinois 60611; 312-329-3410.)

The columns in your ASCII dataset must represent variables (for example, Democratic proportion of the two-party vote for the legislature). The rows must contain individual observations (usually election districts). All data must be numbers; letters and most special symbols are not permitted. You may have up to 2,000 variables and any number of observations in a single dataset, and as many datasets as desired for each JudgeIt run. The only limit on the amount of data you can use for each model estimated is the amount of RAM (or virtual RAM) in your computer system. Keeping track of your ASCII files is easier if they have a common extension, such as ".ASC" (for example, congress.asc).

Your ASCII data file may be in one of two formats: freefield or fixedfield.

For freefield ASCII data files, all numbers must have blanks, commas, or carriage returns between them. Zeros must be written out as 0 or 0.00, not left blank. Rows may be greater than 80 characters, and each observation may take up more than a single row in the data file (as long as each observation has the same number of elements). Make sure that you have a carriage return at the end of the last line.

For fixedfield ASCII data files, every record must have a fixed length. (The term "record" refers to a row in the data file, consisting of a series of "fields." Each field is set aside for a single variable value. Individual variables are identified by the column numbers they occupy.)

Missing values must appear as a period ("."), with blanks, commas, or carriage returns on either side, for freefield, and as blank or a period for fixedfield. If you prefer to use a missing value symbol other than a period for freefield data, you can set the new symbol using the MSYM command in your ATOG command file. For example, if you prefer the ampersand, add this line: MSYM &;. (During any procedure, JudgeIt will skip each observation with a missing value for one of the requested variables, a process called listwise deletion.)
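The freefield rules above can be checked before running ATOG. The following is a minimal Python sketch, not part of JudgeIt or ATOG; the function names parse_freefield and listwise_delete and the sample data are hypothetical. It assumes that a run of blanks and commas counts as a single separator, which may differ from ATOG's own handling, and it drops any observation with a missing value, whereas JudgeIt considers only the requested variables.

```python
import re

def parse_freefield(text, n_vars, missing="."):
    """Split freefield text into observations of n_vars values each.

    Values are separated by blanks, commas, or line breaks. The missing
    value symbol becomes None. A non-numeric value raises ValueError,
    since letters and most special symbols are not permitted.
    """
    tokens = [t for t in re.split(r"[\s,]+", text) if t]
    if len(tokens) % n_vars != 0:
        raise ValueError(
            f"{len(tokens)} values cannot be split into rows of {n_vars}"
        )
    values = [None if t == missing else float(t) for t in tokens]
    return [values[i:i + n_vars] for i in range(0, len(values), n_vars)]

def listwise_delete(rows):
    """Keep only the observations that have no missing value."""
    return [row for row in rows if None not in row]

# Hypothetical data: three variables, the third observation spans two rows.
sample = "0.52 1200 1\n0.47, ., 0\n0.61 800\n1\n"
rows = parse_freefield(sample, n_vars=3)
complete = listwise_delete(rows)
print(len(rows), "observations read,", len(complete), "complete")
```

Passing missing="&" would mimic the effect of an MSYM &; line in the command file.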

ATOG will take your ASCII file and convert it to a pair of Gauss data files. If your ASCII file is congress.ASC, ATOG will create congress.DAT, with the data, and congress.DHT, with the "header" information, including variable names and locations (both in non-readable, binary format).

To use ATOG, you must create a command file containing commands for ATOG to execute. This file must be an ASCII text file. To enter commands into a text file, you can use EDLIN, which comes with DOS 1-4, or EDIT, which comes with DOS 5. The WordPerfect program editor, XyWrite, or WordStar in nondocument mode all work fine. In general, you can use any word processing program that will save text in an ASCII file without special codes for underlining, formatting, etc. Your ATOG command file needs only three commands. All end in semicolons, and case is not significant. The first two commands are the same for freefield and fixedfield data.

INPUT filename.asc;
specifies the name of your ASCII data file.
Example: input c:\congress.asc;

OUTPUT filename;
specifies the name of your Gauss output file (with no extension). To help keep track of files, we recommend that you use the same name as your input file (but without the ".ASC" extension).
Example: output c:\congress;

The last required command in an ATOG command file is invar, which identifies the input variable names. Variable names may be up to eight characters and should include only letters and numbers, with the first character a letter.

For freefield ASCII files, the invar command is as follows:

INVAR variables;
where variables is a list of variables with blanks between them. Example: invar vote88 money88 incumbnt;
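As an illustration of the naming rule and the three commands, here is a small Python sketch that checks variable names and assembles the text of a freefield command file. It is not distributed with JudgeIt; the helper names is_valid_name and freefield_command_file are hypothetical, and the file names are taken from the examples in this section.

```python
import re

# Naming rule from the text: at most eight characters, letters and digits
# only, with the first character a letter.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")

def is_valid_name(name):
    """Return True if name satisfies the variable naming rule."""
    return NAME_RE.fullmatch(name) is not None

def freefield_command_file(ascii_path, output_name, names):
    """Build the text of a three-command ATOG file for freefield data."""
    bad = [n for n in names if not is_valid_name(n)]
    if bad:
        raise ValueError("invalid variable names: " + ", ".join(bad))
    lines = [
        f"input {ascii_path};",
        f"output {output_name};",
        "invar " + " ".join(names) + ";",
    ]
    return "\n".join(lines) + "\n"

text = freefield_command_file(
    "congress.asc", "congress", ["vote88", "money88", "incumbnt"]
)
print(text)
```

The returned text can be saved to a plain ASCII file with any name you choose and used as the ATOG command file.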

For fixedfield ASCII files, the invar command has a different format:

INVAR RECORD=reclen (format) variables (format) variables;
where reclen is the total record length in characters, including the final carriage return/line feed (two extra characters) if applicable. Each record must be of the same length. format is of the form (start,length.prec), where start is the starting position of the field in the record; 1 is the first position and the default. length is the length of the field in characters (the default is 8). prec, optional, is an integer indicating the number of places from the RIGHT edge of the field where a decimal point should be inserted automatically. variables is a list of variables with blanks between them. If several variables are listed after a format definition, each succeeding field will be assumed to start immediately after the preceding field and to be the same length. If an asterisk is used to specify the starting position, the current logical default will be assumed. An asterisk in the length position will select the current default for both length and prec. Any data in the record not defined is ignored.

For example, suppose one line of your input ASCII data file has the seventeen characters 12345678.0.2345<cr><lf> and you use this invar statement: invar record=17 (2,1) age race (4,2) party (6,3) district (9,1) sex (10,6) demvote;. Your resulting variables would have the following values for this record: age=2, race=3, party=45, district=678, sex=missing value, and demvote=0.2345.
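The way the format definitions carve up a record can be traced with a short Python sketch. This is only an illustration of the field arithmetic, not ATOG's actual implementation; the function parse_fixed_record is hypothetical. Each field is written out by hand as (name, start, length, prec), so race, which inherits its position from age, is listed as starting at column 3. The sketch also assumes that prec applies only when the field contains no explicit decimal point.

```python
def parse_fixed_record(record, fields):
    """Extract values from one fixedfield record.

    fields is a list of (name, start, length, prec) tuples, where start
    is 1-based and prec is None when no implied decimal point is wanted.
    A blank field or a lone period is returned as None (missing).
    """
    values = {}
    for name, start, length, prec in fields:
        raw = record[start - 1:start - 1 + length].strip()
        if raw in ("", "."):
            values[name] = None
        elif prec is not None and "." not in raw:
            # Insert the decimal point prec places from the right edge.
            values[name] = int(raw) / 10 ** prec
        else:
            values[name] = float(raw)
    return values

# The seventeen-character record from the text, including <cr><lf>.
record = "12345678.0.2345\r\n"
fields = [
    ("age", 2, 1, None),
    ("race", 3, 1, None),      # follows age, same length
    ("party", 4, 2, None),
    ("district", 6, 3, None),
    ("sex", 9, 1, None),
    ("demvote", 10, 6, None),
]
print(len(record), parse_fixed_record(record, fields))
```

Under the same assumption, a four-character field containing 1234 read with prec set to 2 would yield 12.34.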

An example of an entire command file for freefield input is,

input snct.asc;
output snct;
invar v1 v2 vote88 incum88;
and for fixedfield input is
input snct.asc;
output snct;
invar record=954 (22,4) drugs (52,5) sex money
                 (95,3) rock roll;

To run your command file and convert your ASCII data to Gauss format, type the following from the DOS prompt: ATOG atogfile;, where atogfile is the name of your ATOG command file. If it works, you will be returned to the DOS prompt with no message, and your two Gauss data files will have been stored.

If an error shows up, here are a few likely culprits. With freefield data, you may not have the same number of data items in each row. With fixedfield data, you may have miscounted the number of columns in your dataset (or not counted the carriage return and linefeed as two extra columns).
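For the fixedfield case, one way to find the right value for record= and to spot records of uneven length is to measure every line of the file, line terminators included. The Python sketch below is a diagnostic aid under that assumption and is not part of ATOG; the function record_lengths and the sample bytes are hypothetical. To check a real file, read it in binary mode, for example with open("snct.asc", "rb").read(), so that the carriage return and line feed are preserved.

```python
def record_lengths(data):
    """Return the sorted distinct record lengths found in data (bytes).

    Each length counts the line terminator, so a 15-character row ending
    in carriage return/line feed is reported as 17.
    """
    return sorted({len(line) for line in data.splitlines(keepends=True)})

# Hypothetical file contents: two well-formed records, then a short one.
good = b"12345678.0.2345\r\n87654321.0.5432\r\n"
bad = good + b"1234\r\n"

print(record_lengths(good))   # a single length means the records are uniform
print(record_lengths(bad))    # more than one length signals a problem
```

If the function reports exactly one length, that number is the value to use for record=; more than one length points to the rows that need repair.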



Gary King 2006-01-07