I appreciate the editor’s invitation to reply to Freedman et al.’s (1998) review of "A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data" (Princeton University Press.) I welcome this scholarly critique and JASA’s decision to publish in this field. Ecological inference is a large and very important area for applications that is especially rich with open statistical questions. I hope this discussion stimulates much new scholarship. Freedman et al. raise several interesting issues, but also misrepresent or misunderstand the prior literature, my approach, and their own empirical analyses, and compound the problem, by refusing requests from me and the editor to make their data and software available for this note. Some clarification is thus in order.
In 1990, Budge and Hofferbert (B&H) claimed that they had found solid evidence that party platforms cause U.S. budgetary priorities, and thus concluded that mandate theory applies in the United States as strongly as it does elsewhere. The implications of this stunning conclusion would mean that virtually every observer of the American party system in this century has been wrong. King and Laver (1993) reanalyzed B&H’s data and demonstrated in two ways that there exists no evidence for a causal relationship. First, accepting their entire statistical model, and correcting only an algebraic error (a mistake in how they computed their standard errors), we showed that their hypothesized relationship holds up in fewer than half the tests they reported. Second, we showed that their statistical model includes a slightly hidden but politically implausible assumption that a new party achieves every budgetary desire immediately upon taking office. We then specified a model without this unrealistic assumption and we found that the assumption was not supported, and that all evidence in the data for platforms causing government budgets evaporated. In their published response to our article, B&H withdrew their key claim and said they were now (in 1993) merely interested in an association and not causation. That is how it was left in 1993—a perfectly amicable resolution as far as we were concerned—since we have no objection to the claim that there is a non-causal or chance association between any two variables. Of course, we see little reason to be interested in non-causal associations in this area any more than in the chance correlation that exists between the winner of the baseball World Series and the party winning the U.S. presidency. Since party mandate theory only makes sense as a causal theory, the conventional wisdom about America’s porous, non-mandate party system stands.
The directional and proximity models offer dramatically different theories for how voters make decisions and fundamentally divergent views of the supposed microfoundations on which vast bodies of literature in theoretical rational choice and empirical political behavior have been built. We demonstrate here that the empirical tests in the large and growing body of literature on this subject amount to theoretical debates about which statistical assumption is right. The key statistical assumptions have not been empirically tested and, indeed, turn out to be effectively untestable with exiting methods and data. Unfortunately, these assumptions are also crucial since changing them leads to different conclusions about voter processes.
We propose a comprehensive statistical model for analyzing multiparty, district-level elections. This model, which provides a tool for comparative politics research analagous to that which regression analysis provides in the American two-party context, can be used to explain or predict how geographic distributions of electoral results depend upon economic conditions, neighborhood ethnic compositions, campaign spending, and other features of the election campaign or aggregate areas. We also provide new graphical representations for data exploration, model evaluation, and substantive interpretation. We illustrate the use of this model by attempting to resolve a controversy over the size of and trend in electoral advantage of incumbency in Britain. Contrary to previous analyses, all based on measures now known to be biased, we demonstrate that the advantage is small but meaningful, varies substantially across the parties, and is not growing. Finally, we show how to estimate the party from which each party’s advantage is predominantly drawn.
The authors develop binomial-beta hierarchical models for ecological inference using insights from the literature on hierarchical models based on Markov chain Monte Carlo algorithms and King’s ecological inference model. The new approach reveals some features of the data that King’s approach does not, can easily be generalized to more complicated problems such as general R x C tables, allows the data analyst to adjust for covariates, and provides a formal evaluation of the significance of the covariates. It may also be better suited to cases in which the observed aggregate cells are estimated from very few observations or have some forms of measurement error. This article also provides an example of a hierarchical model in which the statistical idea of "borrowing strength" is used not merely to increase the efficiency of the estimates but to enable the data analyst to obtain estimates.
We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked or different sampling methods are used in different strata or clusters. Our method involves multiply imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights are based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed data to nonimputed data. We illustrate with the example that motivated this project: a study of pre-election public opinion polls in which not all the questions of interest are asked in all the surveys, so that it is infeasible to impute within each survey separately.