
The Model.

JudgeIt is based on a random components regression model:
\begin{displaymath}
v_i = X_i\beta + \gamma_i + \epsilon_i
\end{displaymath} (2)

where $ v_i$ is (for example) the Democratic proportion of the two-party vote for district $ i$ ($ i=1,\ldots,n$ legislative districts), $ X_i$ is a set of explanatory variables (such as vote in the last election, incumbency status, partisan control, campaign spending, etc.) and $ \beta$ is a vector of regression coefficients, such that $ X_i\beta=\beta_0+\beta_1X_{1i}+\beta_2X_{2i}+\cdots$. The parameter $ \gamma_i$ represents the part of the district vote that is not explained by $ X_i$ but is still a systematic feature of the electoral system and therefore persists over time. For each $ i$, the error terms have independent normal distributions, $ \gamma_i$ with mean zero and variance $ \sigma_{\gamma}^2$ and $ \epsilon_i$ with mean zero and variance $ \sigma_{\epsilon}^2$. We also define $ \sigma^2=\sigma^2_\gamma+\sigma^2_\epsilon$ and $ \lambda=\sigma^2_\gamma/\sigma^2$.
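As a rough illustration of equation (2), the sketch below draws one election from the model for fixed parameter values. It is not JudgeIt code: the function name `simulate_votes`, the covariates, and the numerical values of $\beta$, $\sigma^2$ and $\lambda$ are all made up for the example, and simulated votes are not constrained to lie between 0 and 1.

```python
import random
import statistics

def simulate_votes(X, beta, sigma2, lam, seed=0):
    """Draw one election from v_i = X_i beta + gamma_i + epsilon_i."""
    rng = random.Random(seed)
    sd_gamma = (lam * sigma2) ** 0.5          # sigma_gamma^2 = lambda * sigma^2
    sd_eps = ((1.0 - lam) * sigma2) ** 0.5    # sigma_epsilon^2 = (1 - lambda) * sigma^2
    gamma, votes = [], []
    for x_i in X:
        mean_i = sum(b * x for b, x in zip(beta, x_i))  # X_i beta
        g_i = rng.gauss(0.0, sd_gamma)                  # persistent component
        e_i = rng.gauss(0.0, sd_eps)                    # election-specific error
        gamma.append(g_i)
        votes.append(mean_i + g_i + e_i)
    return votes, gamma

# Hypothetical inputs: intercept, lagged vote, incumbency coded -1, 0, 1.
X = [[1.0, 0.40 + 0.002 * i, (i % 3) - 1] for i in range(100)]
beta = [0.10, 0.80, 0.05]
votes, gamma = simulate_votes(X, beta, sigma2=0.005, lam=0.6)

# Share of the unexplained variance that is persistent; with only 100
# districts this is a noisy sample analogue of lambda.
residuals = [v - sum(b * x for b, x in zip(beta, x_i)) for v, x_i in zip(votes, X)]
lam_hat = statistics.pvariance(gamma) / statistics.pvariance(residuals)
```

Here `gamma` is returned only because this is a simulation; in real data $\gamma_i$ is never observed separately from $\epsilon_i$.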

We define hypothetical election results as the set of all possible election outcomes that could have occurred if all political conditions up to the start of the campaign were held constant and the campaign were run again. The vector $v^{\hyp}$ of hypothetical vote proportions is determined by an analogous probability model:

\begin{displaymath}
v^{\hyp} = X^{\hyp}\beta + \delta^{\hyp} + \gamma + \epsilon^{\hyp},
\end{displaymath} (3)

where $\epsilon^{\hyp}$ is a new vector of $ n$ independent error terms with variance $ \sigma^2_{\epsilon}$, and $\delta^{\hyp}$ is a known constant used to model statewide partisan swing. The parameter $\delta^{\hyp}$ in this model allows us to easily vary the average district vote in a hypothetical (or predicted) election, without affecting the relative positions of the districts. This partitioning reflects the common result that it is often quite easy to predict which districts will vote more Republican than others, but it is harder to forecast exactly what the average vote would be across districts.
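The role of $\delta^{\hyp}$ in equation (3) can be seen in a small sketch. The function `hypothetical_votes` and all numbers below are illustrative assumptions, not part of JudgeIt, and $X^{\hyp}\beta$ and $\gamma$ are treated as known. Reusing the same random seed holds $\epsilon^{\hyp}$ fixed, so the two runs differ only in $\delta^{\hyp}$.

```python
import random

def hypothetical_votes(xb_hyp, gamma, delta, sd_eps, seed=0):
    """One draw of v_hyp = X_hyp beta + delta + gamma + eps_hyp."""
    rng = random.Random(seed)
    return [m + delta + g + rng.gauss(0.0, sd_eps)
            for m, g in zip(xb_hyp, gamma)]

xb_hyp = [0.35, 0.48, 0.52, 0.66]    # illustrative values of X_hyp beta
gamma = [0.02, -0.01, 0.03, -0.04]   # illustrative persistent district effects

base = hypothetical_votes(xb_hyp, gamma, delta=0.0, sd_eps=0.03, seed=1)
swung = hypothetical_votes(xb_hyp, gamma, delta=0.05, sd_eps=0.03, seed=1)

# Every district moves by the same amount, so differences between
# districts (and hence their ordering) are untouched by the swing.
shifts = [s - b for s, b in zip(swung, base)]
```

Each entry of `shifts` equals the swing of 0.05 up to floating-point rounding, which is the sense in which the swing leaves the relative positions of the districts unchanged.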

The hypothetical outcome, $v^{\hyp}$, differs from the actual $ v$ in three ways:

  1. The matrix, $ X$, of explanatory variables is replaced by $X^{\hyp}$, to recognize that we may wish to specify different conditions under which the hypothetical election may be run (such as no incumbents running).

  2. A constant, $\delta^{\hyp}$, is added, to allow a statewide partisan swing to be specified. One can specify either $\delta^{\hyp}$ or a corresponding value for the expected average district vote, $E(\overline{v}^{\hyp})$, since $\delta^{\hyp}=E(\overline{v}^{\hyp})-\frac{1}{n}\sum_{i=1}^nX^{\hyp}_i\beta$.

  3. The new error term, $\epsilon^{\hyp}$, models the fact that, even if the variables in $ X$ were unchanged, we would not expect $v^{\hyp}$ to be identical to $ v$. Across many hypothetical elections, $ \gamma$ remains unchanged, while $ \epsilon$ varies.

The stochastic model is interpreted slightly differently for prediction and for evaluation. For prediction, we ask how many seats the Democrats will win with (say) an average of 45% of the votes; for evaluation, we ask how many seats they would have won if essentially the same election campaign had been run again. The only difference between the two is that in evaluation we observe one of the possible hypothetical election outcomes, whereas in prediction we observe none.
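To make the "how many seats at 45% of the votes" question concrete, the sketch below chooses $\delta^{\hyp}$ from a target average vote using the identity in item 2, then averages the seat share over repeated draws of $\epsilon^{\hyp}$. The function names and inputs are hypothetical, and $\beta$ and $\gamma$ are treated as known, which the full procedure does not assume.

```python
import random

def delta_for_target(xb_hyp, target_mean):
    """delta_hyp = E(mean of v_hyp) - (1/n) * sum_i X_hyp_i beta."""
    return target_mean - sum(xb_hyp) / len(xb_hyp)

def expected_seat_share(xb_hyp, gamma, target_mean, sd_eps,
                        n_sims=2000, seed=0):
    """Average Democratic seat share over simulated hypothetical elections."""
    rng = random.Random(seed)
    delta = delta_for_target(xb_hyp, target_mean)
    n = len(xb_hyp)
    total = 0.0
    for _ in range(n_sims):
        # gamma is held fixed across replications; only eps_hyp is redrawn.
        wins = sum(1 for m, g in zip(xb_hyp, gamma)
                   if m + delta + g + rng.gauss(0.0, sd_eps) > 0.5)
        total += wins / n
    return total / n_sims

xb_hyp = [0.35, 0.48, 0.52, 0.66]    # illustrative values of X_hyp beta
gamma = [0.02, -0.01, 0.03, -0.04]   # illustrative persistent effects
share_at_45 = expected_seat_share(xb_hyp, gamma, target_mean=0.45, sd_eps=0.03)
```

Because `gamma` is fixed at particular values here, the simulated average vote is the target plus the mean of `gamma`; the identity for $\delta^{\hyp}$ holds in expectation over $\gamma$, which has mean zero.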

The parameters of this model to be estimated--$ \sigma^2$, $ \lambda$, and $ \beta$--are not usually of primary interest in evaluating electoral systems and redistricting plans (although $ \beta$ is in some cases of interest in estimating causal effects). Instead, we define all the quantities of interest, including the seats-votes curve, district vote predictions, etc., in terms of the posterior distribution of hypothetical election outcomes $v^{\hyp}$, given the average district vote $ \overline{v}$ or the actual election outcomes $ v$ when available (which, in turn, depend on the parameters). From this, we can easily calculate estimates and standard errors of any quantity of interest (such as those listed in our summary of prior research).
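As a sketch of what this conditioning involves, suppose for illustration that $\beta$, $\sigma^2$ and $\lambda$ were known; this ignores the estimation uncertainty that the full procedure must carry through. Since $\gamma_i$ and $\epsilon_i$ are independent normals, the standard conditional-normal result applied to equation (2) gives
\begin{displaymath}
\gamma_i \mid v_i \sim N\left(\lambda(v_i - X_i\beta),\; \lambda(1-\lambda)\sigma^2\right).
\end{displaymath}

Substituting into equation (3) and adding the independent variance $\sigma^2_\epsilon=(1-\lambda)\sigma^2$ of $\epsilon^{\hyp}_i$ gives, for evaluation,
\begin{displaymath}
v^{\hyp}_i \mid v_i \sim N\left(X^{\hyp}_i\beta + \delta^{\hyp} + \lambda(v_i - X_i\beta),\; (1-\lambda^2)\sigma^2\right),
\end{displaymath}

whereas for prediction, with no outcome observed, $\gamma_i$ keeps its prior distribution and
\begin{displaymath}
v^{\hyp}_i \sim N\left(X^{\hyp}_i\beta + \delta^{\hyp},\; \sigma^2\right).
\end{displaymath}

Under these assumptions, observing $ v$ shifts each district's expected hypothetical vote by a fraction $ \lambda$ of its residual and shrinks the variance from $ \sigma^2$ to $ (1-\lambda^2)\sigma^2$.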

[Table t:hyp: column headings "Model Structure", "Actual", "Hypothetical Replications"; table body missing.]


Gary King 2006-01-07