
Output:

The PLOT version of DIST produces a scatterplot of the observed district vote (vertical axis) against the predicted district vote (horizontal axis). A perfect model, which is of course not attainable either in practice or in theory, would place every point on a 45-degree line; that is, every predicted value would equal its observed value. This line is drawn on all DIST plots as a standard for comparison. The vertical distance between a point and the 45-degree line is the error in the model's prediction for that district. For example, if the dependent variable is the Democratic proportion of the district vote, points below the line voted more Republican than predicted. The plot provides a quick, intuitive way to gauge whether a predictive model is subject to systematic error: if points fall consistently below the line, the model tends to overpredict the Democratic vote (equivalently, to underpredict the Republican vote). Finally, the DIST plot is divided into four quadrants, which can be used to count the incorrect predictions made by the model. All observations in the first (top left) quadrant are districts predicted Republican that went Democratic, and all observations in the fourth (bottom right) quadrant are districts predicted Democratic that went Republican, so we can see at a glance both the number of incorrect predictions and whether they favored one party or the other.
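As a minimal sketch of how the plot is read, the Python snippet below classifies (predicted, observed) pairs by quadrant and by their position relative to the 45-degree line. The function names `classify` and `side_of_line`, and the 0.5 cut point for a Democratic win, are assumptions made for illustration; they are not part of DIST. The sample pairs are districts 1 through 4 of the Michigan printout shown later in this section.

```python
from collections import Counter

def classify(predicted, observed, cut=0.5):
    """Return the kind of call made for one district (Democratic vote share)."""
    if predicted < cut and observed > cut:
        return "top left"      # predicted Republican, went Democratic
    if predicted > cut and observed < cut:
        return "bottom right"  # predicted Democratic, went Republican
    return "correct"           # same side of the cut (ties are not counted as wrong)

def side_of_line(predicted, observed):
    """Position relative to the 45-degree line, observed vote on the vertical axis."""
    if observed < predicted:
        return "below"  # voted more Republican than predicted
    if observed > predicted:
        return "above"  # voted more Democratic than predicted
    return "on"

# (predicted, observed) pairs for districts 1-4 of the Michigan printout
pairs = [(0.9038, 0.9006), (0.3799, 0.4095), (0.6285, 0.6037), (0.3903, 0.3718)]

calls = Counter(classify(p, o) for p, o in pairs)
sides = Counter(side_of_line(p, o) for p, o in pairs)
print(calls)
print(sides)
```

A large count in one off-diagonal quadrant, or a strong imbalance between "above" and "below", is the kind of systematic error the plot is meant to reveal.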

The printout that follows, which uses the same Michigan congressional district data as is used in Section 9, is one example of the LIST version of a DIST command used for Evaluation:

District-level Analyses
YVOTE:  uc86  Year:  1986   Lambda=0.2199   Sigma=0.0545   Sims=100  N=18
YVOTE2: uc88
XVARS:  const inc86 pr84 us84 ag86 go86
XVARS2: inc88 unc!uc88
DISTS:  cd

            Observed  Expected  Standard
  Number     Vote      Vote      Error     Pr(Vote>.5)
   1.0000    0.9006    0.9038    0.0613    1.0000
   2.0000    0.4095    0.3799    0.0561    0.0161
   3.0000    0.6037    0.6285    0.0575    0.9873
   4.0000    0.3718    0.3903    0.0624    0.0394
   5.0000    0.2876    0.2585    0.0590    0.0000
   6.0000    0.5667    0.6401    0.0556    0.9942
   7.0000    0.8029    0.7685    0.0567    1.0000
   8.0000    0.7264    0.7061    0.0572    0.9998
   9.0000    0.3558    0.3474    0.0577    0.0041
  10.0000    0.4874    0.4411    0.0583    0.1559
  11.0000    0.3660    0.4539    0.0602    0.2218
  12.0000    0.6635    0.6526    0.0572    0.9962
  13.0000    0.8605    0.8639    0.0610    1.0000
  14.0000    0.7318    0.7371    0.0573    1.0000
  15.0000    0.7566    0.7590    0.0591    1.0000
  16.0000    0.7782    0.7547    0.0574    1.0000
  17.0000    0.7728    0.7401    0.0579    1.0000
  18.0000    0.2623    0.2786    0.0615    0.0002

E(Average District Vote) =    0.5947
Average Standard Error   =    0.0586

The text at the top of the printout merely reports information specified as input: the dependent and independent variables, the number of observations (N), the number of simulations (SIMS), and the values of LAMBDA and SIGMA as determined by preliminary analysis. Be careful in interpreting the value at the bottom of the printout, the Expected Average District Vote. Sometimes, as in this case, it contains information: the average district vote predicted by the model. If, on the other hand, the DIST command sets either VBAR or DELTA, the value reported is either the number input for VBAR or the expected vote that would be required to produce the chosen DELTA. In that case the value carries no new information; it only echoes something the user has provided.
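The two summary lines can be checked against the columns above them. The sketch below is an inference from the printed numbers, not documented behavior: in this printout the reported E(Average District Vote) agrees with the simple mean of the Expected Vote column, and the reported Average Standard Error is close to both the simple mean and the root mean square of the Standard Error column. The variable names are illustrative.

```python
import math
from statistics import mean

# Expected Vote and Standard Error columns copied from the printout above
expected = [0.9038, 0.3799, 0.6285, 0.3903, 0.2585, 0.6401, 0.7685, 0.7061,
            0.3474, 0.4411, 0.4539, 0.6526, 0.8639, 0.7371, 0.7590, 0.7547,
            0.7401, 0.2786]
std_err = [0.0613, 0.0561, 0.0575, 0.0624, 0.0590, 0.0556, 0.0567, 0.0572,
           0.0577, 0.0583, 0.0602, 0.0572, 0.0610, 0.0573, 0.0591, 0.0574,
           0.0579, 0.0615]

avg_vote = mean(expected)                           # simple mean of the predictions
mean_se = mean(std_err)                             # simple mean of the standard errors
rms_se = math.sqrt(mean(s * s for s in std_err))    # root mean square (assumed alternative)

print(f"mean expected vote      = {avg_vote:.4f}")
print(f"mean standard error     = {mean_se:.4f}")
print(f"RMS of standard errors  = {rms_se:.4f}")
```

Because the printed columns are rounded to four decimals, small discrepancies in the last digit are to be expected under either reading.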

The first column of output lists the district numbers, taken from the variable the user supplies when calling the command (in this case, cd). The second column lists the actual values of the dependent variable (YVOTE). The third column gives the model's district-by-district vote predictions, allowing you to evaluate the predictive model one district at a time. The list can be useful in identifying particular factors that might explain errors. For example, suppose three lawmakers were caught accepting bribes before the general election you are evaluating. Checking this list lets you see whether those three districts account for some of the prediction error. The fourth column reports the standard error of each prediction. Finally, the fifth column reports the probability that the vote exceeds .5, which in this case is the probability that a Democrat wins the district. Values near zero or one mark uncompetitive districts that are very likely to elect a Republican or a Democrat, respectively. As the output above shows, the Michigan congressional districts used to generate this output are not very competitive at all.
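To see how the third, fourth, and fifth columns relate, the sketch below computes the probability that a normally distributed vote with the printed expected value and standard error exceeds .5. The normal approximation and the function name `prob_above_half` are assumptions made for illustration; this section does not say how DIST computes the column, although the approximation comes close to the printed values for the rows checked here.

```python
import math

def prob_above_half(expected_vote, std_error):
    """P(vote > 0.5) for a normal variable with the given mean and sd."""
    z = (expected_vote - 0.5) / std_error
    # Standard normal CDF evaluated at z, written with the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (district, expected vote, standard error, printed Pr(Vote>.5)) from the printout
rows = [
    (2, 0.3799, 0.0561, 0.0161),
    (3, 0.6285, 0.0575, 0.9873),
    (10, 0.4411, 0.0583, 0.1559),
    (11, 0.4539, 0.0602, 0.2218),
]

for district, vote, se, printed in rows:
    approx = prob_above_half(vote, se)
    print(f"district {district:2d}: approx {approx:.4f}  printed {printed:.4f}")
```

Districts 10 and 11 are the most competitive in this printout, and they are also the rows where the expected vote lies within about one standard error of .5.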

The output produced by the DIST command is somewhat different for prediction, as can be seen by the example below:

District-level Analyses
YVOTE:  Prediction  Year: 1988   Lambda=0.2199   Sigma=0.0545   Sims=100  N=18
YVOTE2: uc88
XNEW:   const inc88 pr88 us88 ag86 go86
DISTS:  cd

 District    Next    Predicted  Standard
  Number     Vote      Vote      Error     Pr(Vote>.5)
   1.0000    0.9209    0.8942    0.0693    1.0000
   2.0000    0.4501    0.3440    0.0794    0.0247
   3.0000    0.5734    0.6039    0.0845    0.8905
   4.0000    0.2915    0.2620    0.0660    0.0002
   5.0000    0.2740    0.2313    0.0779    0.0003
   6.0000    0.5979    0.6301    0.0821    0.9435
   7.0000    0.7622    0.7160    0.0966    0.9873
   8.0000    0.7208    0.6521    0.1122    0.9124
   9.0000    0.3054    0.3143    0.0702    0.0041
  10.0000    0.2663    0.3884    0.0896    0.1065
  11.0000    0.4013    0.4377    0.1093    0.2843
  12.0000    0.5410    0.6111    0.1114    0.8405
  13.0000    0.8832    0.8596    0.0684    1.0000
  14.0000    0.6329    0.7033    0.0851    0.9916
  15.0000    0.6474    0.7185    0.0832    0.9957
  16.0000    1.0000    0.7051    0.0911    0.9878
  17.0000    0.7105    0.7065    0.0831    0.9935
  18.0000    0.2276    0.2572    0.0922    0.0042

E(Average District Vote) =    0.5575
Average Standard Error   =    0.0873

The main difference is in the second and third columns. The second column, now called "Next Vote," prints the variable named in YVOTE2, if any. The third column prints the vote predictions computed from the explanatory variables given by XNEW rather than XVARS.
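When YVOTE2 is supplied, the Next Vote and Predicted Vote columns can be compared directly. The sketch below ranks a few districts from the prediction printout by the size of their miss and expresses each miss in units of the printed standard error. Dividing by the standard error is a rough yardstick chosen for illustration, not something DIST reports, and the variable names are hypothetical.

```python
# (district, next vote, predicted vote, standard error) from the prediction printout
rows = [
    (8, 0.7208, 0.6521, 0.1122),
    (10, 0.2663, 0.3884, 0.0896),
    (16, 1.0000, 0.7051, 0.0911),
    (17, 0.7105, 0.7065, 0.0831),
]

# Largest absolute difference between next vote and predicted vote first
ranked = sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)

for district, next_vote, predicted, se in ranked:
    err = next_vote - predicted  # positive: more Democratic than predicted
    print(f"district {district:2d}: error {err:+.4f} ({err / se:+.2f} standard errors)")
```

District 16 stands out here: its Next Vote is 1.0000 while the prediction is 0.7051, which is the kind of case worth checking by hand for an explanation the model could not know about.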



Gary King 2006-01-07