This article describes WhatIf: Software for Evaluating Counterfactuals, an R package that implements the methods for evaluating counterfactuals introduced in King and Zeng (2006a) and King and Zeng (2006b). It offers easy-to-use techniques for assessing a counterfactual’s model dependence without having to conduct sensitivity testing over specified classes of models. These same methods can be used to approximate the common support of the treatment and control groups in causal inference.
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.
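To make the role of the population incidence fraction concrete, here is a minimal sketch, not the authors' method, that applies Bayes' rule to a hypothetical cumulative case-control table. The function names, the counts, and the incidence fraction `tau` are all illustrative assumptions. The odds ratio is computable from the sample alone, but risks and risk differences require `tau`.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Sample odds ratio; identifiable from case-control data without auxiliary information."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def risks_from_case_control(exposed_cases, unexposed_cases,
                            exposed_controls, unexposed_controls, tau):
    """Recover P(Y=1 | exposed) and P(Y=1 | unexposed) via Bayes' rule.

    tau is the population incidence fraction P(Y=1). It is auxiliary
    information that the case-control sample itself cannot supply
    (hypothetical value supplied by the caller).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    # Exposure distributions within cases and within controls.
    p_exposed_given_case = exposed_cases / (exposed_cases + unexposed_cases)
    p_exposed_given_control = exposed_controls / (exposed_controls + unexposed_controls)

    def posterior(p_x_given_case, p_x_given_control):
        # P(Y=1 | X=x) = P(x | Y=1) tau / [P(x | Y=1) tau + P(x | Y=0) (1 - tau)]
        numerator = p_x_given_case * tau
        return numerator / (numerator + p_x_given_control * (1.0 - tau))

    risk_exposed = posterior(p_exposed_given_case, p_exposed_given_control)
    risk_unexposed = posterior(1.0 - p_exposed_given_case, 1.0 - p_exposed_given_control)
    return risk_exposed, risk_unexposed


# Hypothetical table: 60/40 exposed/unexposed cases, 30/70 exposed/unexposed controls.
table = (60, 40, 30, 70)
sample_or = odds_ratio(*table)
rare_risks = risks_from_case_control(*table, tau=0.01)
common_risks = risks_from_case_control(*table, tau=0.30)
```

With a small `tau` the risk ratio implied by `rare_risks` stays close to the odds ratio, which is the rare events assumption at work. With a larger `tau` the two diverge, and the risk difference changes substantially.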
Although not widely known until much later, Al Gore received 202 more votes than George W. Bush on election day in Florida. George W. Bush is president because he overcame his election day deficit with overseas absentee ballots that arrived and were counted after election day. In the final official tally, Bush received 537 more votes than Gore. These numbers are taken from the official results released by the Florida Secretary of State's office and so do not reflect overvotes, undervotes, unsuccessful litigation, butterfly ballot problems, recounts that might have been allowed but were not, or any other hypothetical divergence between voter preferences and counted votes. After the election, the New York Times conducted a six-month-long investigation and found that 680 of the overseas absentee ballots were illegally counted, and no partisan, pundit, or academic has publicly disagreed with their assessment. In this paper, we describe the statistical procedures we developed and implemented for the Times to ascertain whether disqualifying these 680 ballots would have changed the outcome of the election. The methods involve adding formal Bayesian model averaging procedures to King's (1997) ecological inference model. Formal Bayesian model averaging has not been used in political science but is especially useful when substantive conclusions depend heavily on apparently minor but indefensible model choices, when model generalization is not feasible, and when potential critics are more partisan than academic. We show how we derived the results for the Times so that other scholars can use these methods to make ecological inferences for other purposes.
We also present a variety of new empirical results that delineate the precise conditions under which Al Gore would have been elected president, and offer new evidence of the striking effectiveness of the Republican effort to convince local election officials to count invalid ballots in Bush counties and not count them in Gore counties.
Ecological Inference: New Methodological Strategies brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half decade has witnessed an explosion of research in ecological inference – the attempt to infer individual behavior from aggregate data. The uncertainties and the information lost in aggregation make ecological inference one of the most difficult areas of statistical inference, but such inferences are required in many academic fields, as well as by legislatures and the courts in redistricting, by businesses in marketing research, and by governments in policy analysis.
Andrew Gelman, Jonathan Katz, and Gary King. 2004. “Empirically Evaluating the Electoral College.” In Rethinking the Vote: The Politics and Prospects of American Electoral Reform, edited by Ann N Crigler, Marion R Just, and Edward J McCaffery, Pp. 75-88. New York: Oxford University Press.
The 2000 U.S. presidential election rekindled interest in possible electoral reform. While most of the popular and academic accounts focused on balloting irregularities in Florida, such as the now infamous "butterfly" ballot and mishandled absentee ballots, some also noted that this election marked only the fourth time in history that the candidate with a plurality of the popular vote did not also win the Electoral College. This "anti-democratic" outcome has fueled desire for reform or even outright elimination of the Electoral College. We show that after appropriate statistical analysis of the available historical electoral data, there is little basis to argue for reforming the Electoral College. We first show that while the Electoral College may once have been biased against the Democrats, the current distribution of voters advantages neither party. Further, the electoral vote will differ from the popular vote only when the average vote shares of the two major candidates are extremely close to 50 percent. As for individual voting power, we show that while there has been much temporal variation in relative voting power over the last several decades, the voting power of individual citizens would not likely increase under a popular vote system of electing the president.
We address two long-standing survey research problems: measuring complicated concepts, such as political freedom or efficacy, that researchers define best with reference to examples, and what to do when respondents interpret identical questions in different ways. Scholars have long addressed these problems with approaches to reduce incomparability, such as writing more concrete questions – with uneven success. Our alternative is to measure response category incomparability directly and to correct for it. We measure incomparability via respondents’ assessments, on the same scale as the self-assessments to be corrected, of hypothetical individuals described in short vignettes. Since actual levels of the vignettes are invariant over respondents, variability in vignette answers reveals incomparability. Our corrections require either simple recodes or a statistical model designed to save survey administration costs. With analysis, simulations, and cross-national surveys, we show how response incomparability can drastically mislead survey researchers and how our approach can correct the problem.
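A "simple recode" of this kind can be sketched as follows. This is an illustrative sketch, not the authors' implementation. It assumes each respondent rates the vignettes in the same strictly increasing order, with no ties or order violations, and the function name and ratings are hypothetical. The self-assessment is re-expressed relative to the respondent's own vignette ratings, so that it no longer depends on how that respondent uses the response scale.

```python
def recode_relative_to_vignettes(self_rating, vignette_ratings):
    """Place a self-assessment relative to the same respondent's vignette ratings.

    vignette_ratings must be listed from the lowest to the highest vignette
    and be strictly increasing (an assumption of this sketch). With J
    vignettes the result is an integer from 1 to 2J + 1:
      1 = below the first vignette, 2 = equal to the first vignette,
      3 = between the first and second vignettes, and so on.
    """
    pairs = zip(vignette_ratings, vignette_ratings[1:])
    if any(lower >= upper for lower, upper in pairs):
        raise ValueError("vignette ratings must be strictly increasing")

    category = 1
    for vignette in vignette_ratings:
        if self_rating < vignette:
            return category
        if self_rating == vignette:
            return category + 1
        category += 2  # the self-rating is above this vignette; move past it
    return category


# Two hypothetical respondents give the same raw self-rating of 3 on a 1-5 scale,
# but they use the scale very differently when rating the same two vignettes.
respondent_a = recode_relative_to_vignettes(3, [1, 2])  # rates both vignettes low
respondent_b = recode_relative_to_vignettes(3, [4, 5])  # rates both vignettes high
```

Although both raw answers are identical, respondent A places themselves above both vignettes and respondent B below both, so the recoded values differ. That difference is the incomparability that the raw responses hide.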
We thank Scott de Marchi, Christopher Gelpi, and Jeffrey Grynaviski (2003 and hereinafter dGG) for their careful attention to our work (Beck, King, and Zeng, 2000 and hereinafter BKZ) and for raising some important methodological issues that we agree deserve readers’ attention. We are pleased that dGG’s analyses are consistent with the theoretical conjecture about international conflict put forward in BKZ – "The causes of conflict, theorized to be important but often found to be small or ephemeral, are indeed tiny for the vast majority of dyads, but they are large, stable, and replicable whenever the ex ante probability of conflict is large" (BKZ, p.21) – and that dGG agree with our main methodological point that out-of-sample forecasting performance should always be one of the standards used to judge studies of international conflict, and indeed most other areas of political science. However, dGG frequently err when they draw methodological conclusions. Their central claim involves the superiority of logit over neural network models for international conflict data, as judged by forecasting performance and other properties such as ease of use and interpretation ("neural networks hold few unambiguous advantages... and carry significant costs" relative to logit; dGG, p.14). We show here that this claim, which would be regarded as stunning in any of the diverse fields in which both methods are more commonly used, is false. We also show that dGG’s methodological errors and the restrictive model they favor cause them to miss and mischaracterize crucial patterns in the causes of international conflict. We begin in the next section by summarizing the growing support for our conjecture about international conflict. The second section discusses the theoretical reasons why neural networks dominate logistic regression, correcting a number of methodological errors.
The third section then demonstrates empirically, in the same data as used in BKZ and dGG, that neural networks substantially outperform dGG’s logit model. We show that neural networks improve on the forecasts from logit as much as logit improves on a model with no theoretical variables. We also show how dGG’s logit analysis assumed, rather than estimated, the answer to the central question about the literature’s most important finding, the effect of democracy on war. Since this and other substantive assumptions underlying their logit model are wrong, their substantive conclusion about the democratic peace is also wrong. The neural network models we used in BKZ not only avoid these difficulties, but they, or one of the other methods available that do not make highly restrictive assumptions about the exact functional form, are just what is called for to study the observable implications of our conjecture.
What should a researcher do when statistical analysis software terminates before completion with a message that the Hessian is not invertible? The standard textbook advice is to respecify the model, but this is another way of saying that the researcher should change the question being asked. Obviously, however, computer programs should not be in the business of deciding what questions are worthy of study. Although noninvertible Hessians are sometimes signals of poorly posed questions, nonsensical models, or inappropriate estimators, they also frequently occur when information about the quantities of interest exists in the data, through the likelihood function. We explain the problem in some detail and lay out two preliminary proposals for ways of dealing with noninvertible Hessians without changing the question asked.
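A toy illustration of the last point, which is our own construction and not one of the two proposals: suppose each observation is drawn from a Normal distribution with mean a + b and variance 1. The data say nothing about a and b separately, so the Hessian with respect to (a, b) is singular, yet the likelihood pins down the quantity of interest a + b. The model, function name, and data below are all hypothetical.

```python
from statistics import mean


def analyze_sum_model(y):
    """Toy model: y_i ~ Normal(a + b, 1), with only the sum a + b identified.

    The negative log-likelihood is 0.5 * sum((y_i - a - b)**2) + constant,
    so every second derivative with respect to a and b equals n.
    """
    n = len(y)
    hessian = [[float(n), float(n)],
               [float(n), float(n)]]

    # A zero determinant means the 2x2 Hessian cannot be inverted, so the
    # usual variance matrix for (a, b) does not exist.
    determinant = hessian[0][0] * hessian[1][1] - hessian[0][1] * hessian[1][0]

    # The quantity of interest theta = a + b is still well determined:
    # its maximum likelihood estimate is the sample mean, with variance 1 / n.
    theta_hat = mean(y)
    theta_se = (1.0 / n) ** 0.5
    return determinant, theta_hat, theta_se


determinant, theta_hat, theta_se = analyze_sum_model([1.0, 2.0, 3.0, 2.0])
```

Here the determinant is zero while the estimate of a + b and its standard error are perfectly well defined, so a program that halts on the singular Hessian discards information the data do contain.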
YourCast is (open source and free) software that makes forecasts by running sets of linear regressions together in a variety of sophisticated ways. YourCast avoids the bias that results when stacking datasets from separate cross-sections and assuming constant parameters, and the inefficiency that results from running independent regressions in each cross-section.
Despite widespread recognition that aggregated summary statistics on international conflict and cooperation miss most of the complex interactions among nations, the vast majority of scholars continue to employ annual, quarterly, or occasionally monthly observations. Daily events data, coded from some of the huge volume of news stories produced by journalists, have not been used much for the last two decades. We offer some reason to change this practice, which we feel should lead to considerably increased use of these data. We address advances in event categorization schemes and software programs that automatically produce data by "reading" news stories without human coders. We design a method that makes it feasible for the first time to evaluate these programs when they are applied in areas with the particular characteristics of international conflict and cooperation data, namely event categories with highly unequal prevalences, and where rare events (such as highly conflictual actions) are of special interest. We use this rare events design to evaluate one existing program, and find it to be as good as trained human coders, but obviously far less expensive to use. For large scale data collections, the program dominates human coding. Our new evaluative method should be of use in international relations, as well as more generally in the field of computational linguistics, for evaluating other automated information extraction tools. We believe that the data created by programs similar to the one we evaluated should see dramatically increased use in international relations research. To facilitate this process, we are releasing with this article data on 4.3 million international events, covering the entire world for the last decade.
In every discipline in which "empirical research" has become commonplace, scholars have formed a subfield devoted to solving the methodological problems unique to that discipline’s data and theoretical questions. Although students of economics, political science, psychology, sociology, business, education, medicine, public health, and so on primarily focus on specific substantive questions, they cannot wait for those in other fields to solve their methodological problems or to teach them "new" methods, wherever they were initially developed. In "The Rules of Inference," we argued for the creation of an analogous methodological subfield devoted to legal scholarship. We also had two other objectives: (1) to adapt the rules of inference used in the natural and social sciences, which apply equally to quantitative and qualitative research, to the special needs, theories, and data in legal scholarship, and (2) to offer recommendations on how the infrastructure of teaching and research at law schools might be reorganized so that it could better support the creation of first-rate quantitative and qualitative empirical research without compromising other important objectives. Published commentaries on our paper, along with citations to it, have focused largely on the first: our application of the rules of inference to legal scholarship. Until now, discussions of our second goal – suggestions for the improvement of legal scholarship, as well as our argument for the creation of a group that would focus on methodological problems unique to law – have been relegated to less public forums, even though, judging from the volume of correspondence we have received, they seem to be no less extensive.
This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from https://gking.harvard.edu/clarify" at the Stata command line.
Winner of the Okidata Best Research Software Award. Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables.
Since Herron and Shotts (2003a and hereinafter HS), Adolph and King (2003 and hereinafter AK), and Herron and Shotts (2003b and hereinafter HS2), the four of us have iterated many more times, learned a great deal, and arrived at a consensus on this issue. This paper describes our joint recommendations for how to run second-stage ecological regressions, and provides detailed analyses to back up our claims.
Few would disagree that health policies and programmes ought to be based on valid, timely and relevant information, focused on those aspects of health development that are in greatest need of improvement. For example, vaccination programmes rely heavily on information on cases and deaths to document needs and to monitor progress on childhood illness and mortality. The same strong information basis is necessary for policies on health inequality. The reduction of health inequality is widely accepted as a key goal for societies, but any policy needs reliable research on the extent and causes of health inequality. Given that child deaths still constitute 19% of all deaths globally and 24% of all deaths in developing countries (1), reducing inequalities in child survival is a good beginning.
The between-group component of total health inequality has been studied extensively by numerous scholars. They have expertly analysed the causes of differences in health status and mortality across population subgroups, defined by income, education, race/ethnicity, country, region, social class, and other group identifiers (2–9).