I appreciate the editor’s invitation to reply to Freedman et al.’s (1998) review of "A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data" (Princeton University Press.) I welcome this scholarly critique and JASA’s decision to publish in this field. Ecological inference is a large and very important area for applications that is especially rich with open statistical questions. I hope this discussion stimulates much new scholarship. Freedman et al. raise several interesting issues, but also misrepresent or misunderstand the prior literature, my approach, and their own empirical analyses, and compound the problem, by refusing requests from me and the editor to make their data and software available for this note. Some clarification is thus in order.
In 1990, Budge and Hofferbert (B&H) claimed that they had found solid evidence that party platforms cause U.S. budgetary priorities, and thus concluded that mandate theory applies in the United States as strongly as it does elsewhere. The implications of this stunning conclusion would mean that virtually every observer of the American party system in this century has been wrong. King and Laver (1993) reanalyzed B&H’s data and demonstrated in two ways that there exists no evidence for a causal relationship. First, accepting their entire statistical model, and correcting only an algebraic error (a mistake in how they computed their standard errors), we showed that their hypothesized relationship holds up in fewer than half the tests they reported. Second, we showed that their statistical model includes a slightly hidden but politically implausible assumption that a new party achieves every budgetary desire immediately upon taking office. We then specified a model without this unrealistic assumption and we found that the assumption was not supported, and that all evidence in the data for platforms causing government budgets evaporated. In their published response to our article, B&H withdrew their key claim and said they were now (in 1993) merely interested in an association and not causation. That is how it was left in 1993—a perfectly amicable resolution as far as we were concerned—since we have no objection to the claim that there is a non-causal or chance association between any two variables. Of course, we see little reason to be interested in non-causal associations in this area any more than in the chance correlation that exists between the winner of the baseball World Series and the party winning the U.S. presidency. Since party mandate theory only makes sense as a causal theory, the conventional wisdom about America’s porous, non-mandate party system stands.
The directional and proximity models offer dramatically different theories for how voters make decisions and fundamentally divergent views of the supposed microfoundations on which vast bodies of literature in theoretical rational choice and empirical political behavior have been built. We demonstrate here that the empirical tests in the large and growing body of literature on this subject amount to theoretical debates about which statistical assumption is right. The key statistical assumptions have not been empirically tested and, indeed, turn out to be effectively untestable with exiting methods and data. Unfortunately, these assumptions are also crucial since changing them leads to different conclusions about voter processes.
We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked or different sampling methods are used in different strata or clusters. Our method involves multiply imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights are based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed data to nonimputed data. We illustrate with the example that motivated this project: a study of pre-election public opinion polls in which not all the questions of interest are asked in all the surveys, so that it is infeasible to impute within each survey separately.
We propose a comprehensive statistical model for analyzing multiparty, district-level elections. This model, which provides a tool for comparative politics research analagous to that which regression analysis provides in the American two-party context, can be used to explain or predict how geographic distributions of electoral results depend upon economic conditions, neighborhood ethnic compositions, campaign spending, and other features of the election campaign or aggregate areas. We also provide new graphical representations for data exploration, model evaluation, and substantive interpretation. We illustrate the use of this model by attempting to resolve a controversy over the size of and trend in electoral advantage of incumbency in Britain. Contrary to previous analyses, all based on measures now known to be biased, we demonstrate that the advantage is small but meaningful, varies substantially across the parties, and is not growing. Finally, we show how to estimate the party from which each party’s advantage is predominantly drawn.
Researchers sometimes argue that statisticians have little to contribute when few realizations of the process being estimated are observed. We show that this argument is incorrect even in the extreme situation of estimating the probabilities of events so rare that they have never occurred. We show how statistical forecasting models allow us to use empirical data to improve inferences about the probabilities of these events. Our application is estimating the probability that your vote will be decisive in a U.S. presidential election, a problem that has been studied by political scientists for more than two decades. The exact value of this probability is of only minor interest, but the number has important implications for understanding the optimal allocation of campaign resources, whether states and voter groups receive their fair share of attention from prospective presidents, and how formal "rational choice" models of voter behavior might be able to explain why people vote at all. We show how the probability of a decisive vote can be estimated empirically from state-level forecasts of the presidential election and illustrate with the example of 1992. Based on generalizations of standard political science forecasting models, we estimate the (prospective) probability of a single vote being decisive as about 1 in 10 million for close national elections such as 1992, varying by about a factor of 10 among states. Our results support the argument that subjective probabilities of many types are best obtained through empirically based statistical prediction models rather than solely through mathematical reasoning. We discuss the implications of our findings for the types of decision analyses used in public choice studies.
A set of Gauss programs and datasets (annotated for pedagogical purposes) to implement many of the maximum likelihood-based models I discuss in Unifying Political Methodology: The Likelihood Theory of Statistical Inference, Ann Arbor: University of Michigan Press, 1998, and use in my class. All datasets are real, not simulated.
Andrew Gelman and Gary King. 1996. “Advantages of Conflictual Redistricting.” In Fixing the Boundary: Defining and Redefining Single-Member Electoral Districts, edited by Iain McLean and David Butler, Pp. 207–218. Aldershot, England: Dartmouth Publishing Company.Abstract
This article describes the results of an analysis we did of state legislative elections in the United States, where each state is required to redraw the boundaries of its state legislative districts every ten years. In the United States, redistrictings are sometimes controlled by the Democrats, sometimes by the Republicans, and sometimes by bipartisan committees, but never by neutral boundary commissions. Our goal was to study the consequences of redistricting and at the conclusion of this article, we discuss how our findings might be relevant to British elections.
We use an analogy with the normal distribution and linear regression to demonstrate the need for the Generalize Event Count (GEC) model. We then show how the GEC provides a unified framework within which to understand a diversity of distributions used to model event counts, and how to express the model in one simple equation. Finally, we address the points made by Christopher Achen, Timothy Amato, and John Londregan. Amato's and Londregan's arguments are consistent with ours and provide additional interesting information and explanations. Unfortunately, the foundation on which Achen built his paper turns out to be incorrect, rendering all his novel claims about the GEC false (or in some cases irrelevant).
Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Existing methods of ecological inference generate very inaccurate conclusions about the empirical world- which thus gives rise to the ecological inference problem. Most scholars who analyze aggregate data routinely encounter some form of this problem. EI (by Gary King) and EzI (by Kenneth Benoit and Gary King) are freely available software that implement the statistical and graphical methods detailed in Gary King’s book A Solution to the Ecological Inference Problem. These methods make it possible to infer the attributes of individual behavior from aggregate data. EI works within the statistics program Gauss and will run on any computer hardware and operating system that runs Gauss (the Gauss module, CML, or constrained maximum likelihood- by Ronald J. Schoenberg- is also required). EzI is a menu-oriented stand-alone version of the program that runs under MS-DOS (and soon Windows 95, OS/2, and HP-UNIX). EI allows users to make ecological inferences as part of the powerful and open Gauss statistical environment. In contrast, EzI requires no additional software, and provides an attractive menu-based user interface for non-Gauss users, although it lacks the flexibility afforded by the Gauss version. Both programs presume that the user has read or is familiar with A Solution to the Ecological Inference Problem.
In this chapter, we study standards of racial fairness in legislative redistricting- a field that has been the subject of considerable legislation, jurisprudence, and advocacy, but very little serious academic scholarship. We attempt to elucidate how basic concepts about "color-blind" societies, and similar normative preferences, can generate specific practical standards for racial fairness in representation and redistricting. We also provide the normative and theoretical foundations on which concepts such as proportional representation rest, in order to give existing preferences of many in the literature a firmer analytical foundation.
This paper is an invited comment on a paper by John Agnew. I largely agree with Agnew’s comments and thus focus on remaining areas wehre an alternative perspective might be useful. My argument is that political geographers should not be so concerned with demonstrating that context matters. My reasoning is based on three arguments. First, in fact context rarely counts (Section 1) and, second, the most productive practical goal for political researchers should be to show that it does not count (Section 2). Finally, a disproportionate focus on ‘context counting’ can lead, and has led, to some seriosu problems in practical research situations, such as attempting to give theoretical answers to empirical questions (Section 3) and empirical answers to theoretical questions (Section 4).
We demonstrate that the expected value and variance commonly given for a well-known probability distribution are incorrect. We also provide corrected versions and report changes in a computer program to account for the known practical uses of this distribution.
Receiving five serious reviews in this symposium is gratifying and confirms our belief that research design should be a priority for our discipline. We are pleased that our five distinguished reviewers appear to agree with our unified approach to the logic of inference in the social sciences, and with our fundamental point: that good quantitative and good qualitative research designs are based fundamentally on the same logic of inference. The reviewers also raised virtually no objections to the main practical contribution of our book– our many specific procedures for avoiding bias, getting the most out of qualitative data, and making reliable inferences. However, the reviews make clear that although our book may be the latest word on research design in political science, it is surely not the last. We are taxed for failing to include important issues in our analysis and for dealing inadequately with some of what we included. Before responding to the reviewers’ more direct criticisms, let us explain what we emphasize in Designing Social Inquiry and how it relates to some of the points raised by the reviewers.
Before every presidential election, journalists, pollsters, and politicians commission dozens of public opinion polls. Although the primary function of these surveys is to forecast the election winners, they also generate a wealth of political data valuable even after the election. These preelection polls are useful because they are conducted with such frequency that they allow researchers to study change in estimates of voter opinion within very narrow time increments (Gelman and King 1993). Additionally, so many are conducted that the cumulative sample size of these polls is large enough to construct aggregate measures of public opinion within small demographic or geographical groupings (Wright, Erikson, and McIver 1985).
These advantages, however, are mitigated by the decentralized origin of the many preelection polls. The surveys are conducted by diverse private enterprises with procedures that differ significantly. Moreover, important methodological detail does not appear in the public record. Codebooks provided by the survey organizations are all incomplete; many are outdated and most are at least partly inaccurate. The most recent treatment in the academic literature, by Brady and Orren (1992), discusses the approach used by three companies but conceals their identities and omits most of the detail. ...
Political science is a community enterprise and the community of empirical political scientists need access to the body of data necessary to replicate existing studies to understand, evaluate, and especially build on this work. Unfortunately, the norms we have in place now do not encourage, or in some cases even permit, this aim. Following are suggestions that would facilitate replication and are easy to implement – by teachers, students, dissertation writers, graduate programs, authors, reviewers, funding agencies, and journal and book editors.