This section describes the approach of Honaker, Katz, and King
(2000), hereafter HKK. In ``A Statistical Model for
Multiparty Electoral Systems,'' Katz and King (2002), hereafter KK,
developed a model for analyzing multiparty electoral data.
In general, multiparty electoral data are a special case of
compositional data, where the set of $J$ variables falls on the
simplex: each vote proportion $V_{ij}$ (for party $j$ in district
$i$) falls between 0 and 1, and the votes in a district sum to 1,
$\sum_{j=1}^{J} V_{ij} = 1$.
Following Aitchison (1986), KK induce these constraints by modeling the
log ratios of the vote variables,
$Y_{ij} = \ln(V_{ij}/V_{iJ})$, for $j = 1, \ldots, J-1$.
The advantage of this approach is that the set of $J-1$
variables is individually and collectively unconstrained.
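The log-ratio transformation and its inverse can be sketched in a few lines. This is a minimal illustration in Python, not Amelia's own code; the function names are hypothetical, and it assumes every share is strictly positive and that the last party serves as the reference.

```python
import math

def to_log_ratios(shares):
    """Map J vote shares (all > 0, summing to 1) to J-1 log ratios.

    The last party is used as the reference category (an assumption
    made for this sketch).
    """
    base = shares[-1]
    return [math.log(v / base) for v in shares[:-1]]

def to_shares(log_ratios):
    """Invert the transformation (the additive logistic map)."""
    exps = [math.exp(y) for y in log_ratios] + [1.0]  # reference party
    total = sum(exps)
    return [e / total for e in exps]

district = [0.5, 0.3, 0.2]      # three parties in one district
y = to_log_ratios(district)     # two unconstrained real numbers
back = to_shares(y)             # recovers the original shares
```

Any pair of real numbers fed to `to_shares` yields shares that lie between 0 and 1 and sum to 1, which is why the log ratios can be modeled without constraints.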
KK depart from Aitchison's approach (of modeling the log ratios
$Y$ via a multivariate normal) in two important ways. First, they use a
multivariate $t$ distribution to model the log ratios. They showed
that this model, which becomes the additive logistic $t$ on the scale
of the vote shares $V$, fits the data far better than the normal.
Second, they added a component of the model to cope with partially
contested or uncontested district elections. They set the goal of the
analysis as predicting or explaining the effective vote, the values of
$V$ we would observe if all parties were contesting all districts.
Implementing this procedure required a special-purpose computer
program. We have modified Amelia to implement the easier approach of
Honaker, Katz, and King (2000), who show that the recovery of the
effective vote can be treated as a missing data problem. To implement
the approach in that paper:
- The variables containing the vote share data need to be
identified. The vote share variables should be the leading
variables in the dataset. That is, if there are $J$ parties, the
first $J$ variables should be the vote shares of these parties. The
global _AMkknp should be set equal to the number of
parties. These variables will be transformed into log ratios of
vote shares for the purpose of the imputation model, but will be
transformed back to the original vote shares in the imputed
datasets. Also, each pattern of contestation will result in new
variables being added to the imputation model to avoid imposing
assumptions of independence of irrelevant alternatives. Thus vote
shares when all parties contest are treated as different variables
than the vote shares when the first party does not contest, and are
different still from any other pattern. In general, the number of
vote share variables in the imputation model grows with the number
of different patterns of contestation that exist in the dataset.
With many patterns of contestation, and thus many variables in the
imputation model, the usual advice applies: a prior is likely to be
useful (see Section 7.1), and time to completion will increase
(see Section 9).
- Parties that did not run should have their vote share coded as
zero. Parties that did run, but whose vote share is unknown or
missing, may be coded with a missing value.
- Currently in Amelia, one of the $J$ parties must contest every
district. While this is not theoretically necessary for the model,
it is an artifact of the present code. Set _AMkkpfo equal
to the position of the party that is fully observed. In almost all
countries there is a national party that contests all districts, so
this is rarely a problem in practice. However, if this restriction
poses a problem for your research, feel free to contact one of the
Amelia authors, as we are working on making this more general, or
check that you have the most recent copy of Amelia.
- To use the multivariate $t$ distribution instead of the
multivariate normal, as KK suggest, set the global
_AMemt=1. This implements the $t$-distributed ECME
algorithm used by HKK (the default, _AMemt=0, uses the
multivariate normal EM algorithm). See Section 7.7.
- Appropriate constraints on the effective vote can be
implemented. In the original model KK impose the constraint that
``the noncontesting parties would have received fewer votes than the
parties which did nominate candidates''. Appropriate constraints
derived from substantive knowledge can be imposed by the analyst
using the rejection sampler. See Section 7.8.
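Before running the imputation, it can help to check the data against the steps above. The following Python sketch is not part of Amelia, and its function names and the use of `None` as the missing-value marker are assumptions made for illustration. It tabulates the patterns of contestation, finds the parties that contest every district (candidates for _AMkkpfo, assuming positions are counted from 1), and tests whether a set of effective votes satisfies the KK constraint quoted above.

```python
from collections import Counter

MISSING = None  # hypothetical marker for "ran, but share unknown"

def contestation_pattern(row):
    """True where a party contested: share is positive or missing.
    A share coded as zero means the party did not run."""
    return tuple(v is MISSING or v > 0 for v in row)

def summarize(rows):
    """Count each pattern and list parties (1-based) that always contest."""
    patterns = Counter(contestation_pattern(r) for r in rows)
    n_parties = len(rows[0])
    always = [j + 1 for j in range(n_parties)
              if all(p[j] for p in patterns)]
    return patterns, always

def satisfies_kk_constraint(effective, pattern):
    """Every noncontesting party's effective share must fall below
    the share of every party that did nominate a candidate."""
    contesting = [v for v, c in zip(effective, pattern) if c]
    absent = [v for v, c in zip(effective, pattern) if not c]
    return not absent or max(absent) < min(contesting)

rows = [
    [0.5, 0.3, 0.2],          # all three parties contest
    [0.6, 0.0, 0.4],          # party 2 did not run
    [0.7, 0.0, 0.3],
    [1.0, 0.0, 0.0],          # uncontested district
    [0.4, MISSING, MISSING],  # all ran, two shares unknown
]
patterns, always = summarize(rows)
ok = satisfies_kk_constraint([0.55, 0.10, 0.35], (True, False, True))
bad = satisfies_kk_constraint([0.45, 0.40, 0.15], (True, False, True))
```

Here `patterns` holds three distinct patterns and `always` contains only party 1. A draw like the one tested in `bad` is the kind a rejection step would discard, since the absent party's share exceeds that of a contesting party.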
Gary King
2003-07-25