
Compositional Data, such as Multiparty Votes (Gauss version only)

This section describes the approach by Honaker, Katz, and King (2000). In ``A Statistical Model for Multiparty Electoral Systems,'' Katz and King (2002; hereafter KK) developed a model for analyzing multiparty electoral data. In general, multiparty electoral data are a special case of compositional data, where the set of variables falls on the simplex. That is, each vote proportion $ V_{ij}$, for party $ j$ in district $ i$, falls between 0 and 1,

$\displaystyle V_{ij} \in [0,1]$   for all $ i$ and $ j$ (3)

and the set of votes in a district sum to 1,

$\displaystyle \sum_{j=1}^J V_{ij} = 1$   for all $ i$. (4)

Following Aitchison (1986), KK induce these constraints by modeling the $ J-1$ log-ratios of the vote variables, $ Y_{ij}=\ln(V_{ij}/V_{iJ})$ for $ j=1,\dots,J-1$. The advantage of this approach is that the $ Y_{ij}$ variables are individually and collectively unconstrained.
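
The log-ratio transformation and its inverse can be sketched in a few lines. This is a minimal illustration in Python, not the Gauss code used by Amelia; the function names to_log_ratios and from_log_ratios are hypothetical, and the sketch assumes every share in the district is strictly positive (a zero share, as for a party that did not run, has no finite log-ratio).

```python
import math

def to_log_ratios(shares):
    """Map J strictly positive vote shares that sum to 1 to the
    J-1 log-ratios Y_j = ln(V_j / V_J), using the last party as the base."""
    base = shares[-1]
    return [math.log(v / base) for v in shares[:-1]]

def from_log_ratios(log_ratios):
    """Invert the transformation (the additive logistic map):
    V_j = exp(Y_j) / (1 + sum_k exp(Y_k)) and V_J = 1 / (1 + sum_k exp(Y_k))."""
    expd = [math.exp(y) for y in log_ratios]
    denom = 1.0 + sum(expd)
    return [e / denom for e in expd] + [1.0 / denom]
```

For example, to_log_ratios([0.5, 0.3, 0.2]) gives approximately [0.916, 0.405], and passing that result to from_log_ratios recovers the original shares up to floating-point error. Any real-valued vector of log-ratios maps back to shares that lie in [0,1] and sum to 1, which is why the $ Y_{ij}$ can be modeled without constraints.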

KK depart from Aitchison's approach (of modeling the $ Y$'s via a multivariate normal) in two important ways. First, they use a multivariate $ t$ distribution to model the log-ratios. They showed that this model, which becomes the additive logistic $ t$ on the scale of the $ V$'s, fits the data far better than the normal. Second, they added a component to the model to cope with partially contested or uncontested district elections. They set the goal of the analysis as predicting or explaining the effective vote, that is, the values of $ V_{ij}$ we would observe if all $ J$ parties contested every district. Implementing this procedure required a special-purpose computer program.

We have modified Amelia to implement the easier approach of Honaker, Katz, and King (2000), who show that the recovery of the effective vote can be treated as a missing data problem. To implement the approach in that paper:

  1. The variables containing the vote share data need to be identified. The vote share variables should be the leading variables in the dataset. That is, if there are $ J$ parties, the first $ J$ variables should be the vote shares of these parties. The global _AMkknp should be set equal to the number of parties. These variables will be transformed into the $ J-1$ log-ratios of vote shares for the purpose of the imputation model, but will be transformed back to the original vote shares in the imputed datasets. In addition, each pattern of contestation will result in new variables being added to the imputation model, to avoid imposing assumptions of independence of irrelevant alternatives. Thus vote shares when all parties contest are treated as different variables than the vote shares when the first party does not contest, and are different still from those under any other pattern. In general there are $ \sum_{k=1}^J n_k (k-1)$ vote share variables, where $ n_k$ is the number of different patterns in the dataset in which exactly $ k$ parties contest a district. When there are many patterns of contestation, and thus many variables in the imputation model, the usual considerations for large models apply: the prior becomes more useful (see Section 7.1) and time to completion increases (see Section 9).

  2. Parties that did not run should have their vote share coded as zero. Parties that did run, but whose vote share is unknown or missing, may be coded with a missing value.

  3. Currently in Amelia, one of the $ J$ parties must contest every district. While this is not theoretically necessary for the model, it is an artifact of the present code. Set _AMkkpfo equal to the position of the party that is fully observed. In almost all countries there is a national party that contests all districts, so this is rarely a problem in practice. However, if this restriction poses a problem for your research, feel free to contact one of the Amelia authors, as we are working on making this more general, or check that you have the most recent copy of Amelia.

  4. To use the multivariate $ t$ distribution instead of the multivariate normal, as KK suggest, set the global _AMemt=1. This implements the $ t$-distributed ECME algorithm used by Honaker, Katz, and King (2000); the default, _AMemt=0, uses the multivariate normal EM algorithm. See Section 7.7.

  5. Appropriate constraints on the effective vote can be implemented. In the original model, KK impose the constraint that ``the noncontesting parties would have received fewer votes than the parties which did nominate candidates''. The analyst can impose constraints of this kind, derived from substantive knowledge, using the rejection sampler. See Section 7.8.
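
Two of the steps above can be illustrated with a short sketch. It is written in Python for exposition only and is not part of Amelia; the function names and the use of None for a missing value are assumptions made for this example. The first function applies the coding rule in step 2 (zero means the party did not run, missing means it ran but its share is unknown) to count the vote share variables given by the formula in step 1. The second checks KK's constraint from step 5 for a single district, which is the kind of test a rejection sampler would apply to a candidate draw.

```python
def contestation_pattern(row):
    """True for each party that contested the district. A share coded 0
    means the party did not run; None means it ran but the share is unknown."""
    return tuple(share is None or share != 0 for share in row)

def count_vote_share_variables(rows):
    """Sum (k - 1) over the distinct contestation patterns, where k is the
    number of parties contesting under that pattern. This equals
    sum_k n_k (k - 1) from step 1."""
    patterns = {contestation_pattern(row) for row in rows}
    return sum(sum(pattern) - 1 for pattern in patterns)

def satisfies_kk_constraint(effective_votes, contested):
    """True if every noncontesting party's effective vote is below that of
    every party that did contest the district."""
    ran = [v for v, c in zip(effective_votes, contested) if c]
    absent = [v for v, c in zip(effective_votes, contested) if not c]
    if not ran or not absent:
        return True  # nothing to compare
    return max(absent) < min(ran)

# Hypothetical data: three parties, four districts.
districts = [
    [0.50, 0.30, 0.20],   # all three parties contest
    [0.60, 0.40, 0.00],   # third party did not run
    [0.70, 0.00, 0.30],   # second party did not run
    [0.55, None, 0.00],   # second party ran, share unknown; third did not run
]
n_variables = count_vote_share_variables(districts)
```

Here the second and fourth districts share one pattern, so there is one pattern with three parties and two patterns with two parties, giving 1(2) + 2(1) = 4 vote share variables. For the constraint, a draw of (0.50, 0.35, 0.15) for a district where the third party did not run would be accepted, while (0.50, 0.20, 0.30) would be rejected because the noncontesting party's effective vote exceeds that of a contesting party.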


Gary King 2003-07-25