# Methods

## Anchoring Vignettes (for interpersonal incomparability)

Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").

Gary King, Christopher J.L. Murray, Joshua A. Salomon, and Ajay Tandon. 2004. “Enhancing the Validity and Cross-cultural Comparability of Measurement in Survey Research.” American Political Science Review, 98, Pp. 191–207.Abstract

We address two long-standing survey research problems: measuring complicated concepts, such as political freedom or efficacy, that researchers define best with reference to examples and and what to do when respondents interpret identical questions in different ways. Scholars have long addressed these problems with approaches to reduce incomparability, such as writing more concrete questions – with uneven success. Our alternative is to measure directly response category incomparability and to correct for it. We measure incomparability via respondents’ assessments, on the same scale as the self-assessments to be corrected, of hypothetical individuals described in short vignettes. Since actual levels of the vignettes are invariant over respondents, variability in vignette answers reveals incomparability. Our corrections require either simple recodes or a statistical model designed to save survey administration costs. With analysis, simulations, and cross-national surveys, we show how response incomparability can drastically mislead survey researchers and how our approach can fix them.

Gary King and Jonathan Wand. 2007. “Comparing Incomparable Survey Responses: New Tools for Anchoring Vignettes.” Political Analysis, 15, Pp. 46-66.Abstract

When respondents use the ordinal response categories of standard survey questions in different ways, the validity of analyses based on the resulting data can be biased. Anchoring vignettes is a survey design technique, introduced by King, Murray, Salomon, and Tandon (2004), intended to correct for some of these problems. We develop new methods both for evaluating and choosing anchoring vignettes, and for analyzing the resulting data. With surveys on a diverse range of topics in a range of countries, we illustrate how our proposed methods can improve the ability of anchoring vignettes to extract information from survey data, as well as saving in survey administration costs.

Daniel Hopkins and Gary King. 2010. “Improving Anchoring Vignettes: Designing Surveys to Correct Interpersonal Incomparability.” Public Opinion Quarterly, Pp. 1-22.Abstract

We report the results of several randomized survey experiments designed to evaluate two intended improvements to anchoring vignettes, an increasingly common technique used to achieve interpersonal comparability in survey research.  This technique asks for respondent self-assessments followed by assessments of hypothetical people described in vignettes. Variation in assessments of the vignettes across respondents reveals interpersonal incomparability and allows researchers to make responses more comparable by rescaling them. Our experiments show, first, that switching the question order so that self-assessments follow the vignettes primes respondents to define the response scale in a common way.  In this case, priming is not a bias to avoid but a means of better communicating the question’s meaning.  We then demonstrate that combining vignettes and self-assessments in a single direct comparison induces inconsistent and less informative responses.  Since similar combined strategies are widely employed for related purposes, our results indicate that anchoring vignettes could reduce measurement error in many applications where they are not currently used.  Data for our experiments come from a national telephone survey and a separate on-line survey.

# Anchoring Vignettes Overview

This site offers survey researchers, and others, help in addressing two long-standing questions:

1. How can we do survey research if different respondents (perhaps from different cultures, countries, or ethnic groups) understand questions in completely different ways, or if investigators mean one thing and respondents think they mean something else?
2. How can we develop accurate measures of complicated concepts which we can define only by example ("you know it when you see it"), and when attempts to produce more concrete questions tend to be more concrete but...

## Automated Text Analysis

Automated and computer-assisted methods of extracting, organizing, understanding, conceptualizing, and consuming knowledge from massive quantities of unstructured text.

## Content Analysis

Justin Grimmer and Gary King. 2011. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences. Publisher's VersionAbstract

We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimize a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given data set (along with millions of other solutions we add based on combinations of existing clusterings), and enable a user to explore and interact with it, and quickly reveal or prompt useful or insightful conceptualizations. In addition, although uncommon in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than either expert human coders or many existing fully automated methods.

Methods to evaluate automated information extraction systems when coding rare events, the success of one such system, along with considerable data. Gary King and Will Lowe. 2003. “An Automated Information Extraction Tool For International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design.” International Organization, 57, Pp. 617-642.Abstract
Despite widespread recognition that aggregated summary statistics on international conflict and cooperation miss most of the complex interactions among nations, the vast majority of scholars continue to employ annual, quarterly, or occasionally monthly observations. Daily events data, coded from some of the huge volume of news stories produced by journalists, have not been used much for the last two decades. We offer some reason to change this practice, which we feel should lead to considerably increased use of these data. We address advances in event categorization schemes and software programs that automatically produce data by "reading" news stories without human coders. We design a method that makes it feasible for the first time to evaluate these programs when they are applied in areas with the particular characteristics of international conflict and cooperation data, namely event categories with highly unequal prevalences, and where rare events (such as highly conflictual actions) are of special interest. We use this rare events design to evaluate one existing program, and find it to be as good as trained human coders, but obviously far less expensive to use. For large scale data collections, the program dominates human coding. Our new evaluative method should be of use in international relations, as well as more generally in the field of computational linguistics, for evaluating other automated information extraction tools. We believe that the data created by programs similar to the one we evaluated should see dramatically increased use in international relations research. To facilitate this process, we are releasing with this article data on 4.3 million international events, covering the entire world for the last decade.
Gary King, Jennifer Pan, and Margaret E. Roberts. 2014. “Reverse-engineering censorship in China: Randomized experimentation and participant observation.” Science, 345, 6199, Pp. 1-10. Publisher's VersionAbstract

Existing research on the extensive Chinese censorship organization uses observational methods with well-known limitations. We conducted the first large-scale experimental study of censorship by creating accounts on numerous social media sites, randomly submitting different texts, and observing from a worldwide network of computers which texts were censored and which were not. We also supplemented interviews with confidential sources by creating our own social media site, contracting with Chinese firms to install the same censoring technologies as existing sites, and—with their software, documentation, and even customer support—reverse-engineering how it all works. Our results offer rigorous support for the recent hypothesis that criticisms of the state, its leaders, and their policies are published, whereas posts about real-world events with collective action potential are censored.

A method that gives unbiased estimates of the proportion of text documents in investigator-chosen categories, given only a small subset of hand-coded documents. Also includes the first correction for the far less-than-perfect levels of inter-coder reliability that typically characterize hand coding. Applications to sentiment detection about politicians in blog posts. Daniel Hopkins and Gary King. 2010. “A Method of Automated Nonparametric Content Analysis for Social Science.” American Journal of Political Science, 54, 1, Pp. 229–247.Abstract

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.

Justin Grimmer, Gary King, and Chiara Superti. 2014. “You Lie! Patterns of Partisan Taunting in the U.S. Senate (Poster).” In Society for Political Methodology. Athens, GA.Abstract

This is a poster that describes our analysis of "partisan taunting," the explicit, public, and negative attacks on another political party or its members, usually using vitriolic and derogatory language. We first demonstrate that most projects that hand code text in the social sciences optimize with respect to the wrong criterion, resulting in large, unnecessary biases. We show how to fix this problem and then apply it to taunting. We find empirically that, unlike most claims in the press and the literature, taunting is not inexorably increasing; it appears instead to be a rational political strategy, most often used by those least likely to win by traditional means -- ideological extremists, out-party members when the president is unpopular, and minority party members. However, although taunting appears to be individually rational, it is collectively irrational: Constituents may resonate with one cutting taunt by their Senator, but they might not approve if he or she were devoting large amounts of time to this behavior rather than say trying to solve important national problems. We hope to partially rectify this situation by posting public rankings of Senatorial taunting behavior.

A version of the previous article for a different audience: Will Lowe and Gary King. 2003. “Some Statistical Methods for Evaluating Information Extraction Systems.” Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, Pp. 19-26.Abstract

We present new statistical methods for evaluating information extraction systems. The methods were developed to evaluate a system used by political scientists to extract event information from news leads about international politics. The nature of this data presents two problems for evaluators: 1) the frequency distribution of event types in international event data is strongly skewed, so a random sample of newsleads will typically fail to contain any low frequency events. 2) Manual information extraction necessary to create evaluation sets is costly, and most effort is wasted coding high frequency categories . We present an evaluation scheme that overcomes these problems with considerably less manual effort than traditional methods, and also allows us to interpret an information extraction system as an estimator (in the statistical sense) and to estimate its bias.

Aykut Firat, Mitchell Brooks, Christopher Bingham, Amac Herdagdelen, and Gary King. 11/1/2016. “Systems and methods for calculating category proportions.” United States of America 9,483,544 (U.S. Patent and Trademark Office).Abstract

Systems and methods are provided for classifying text based on language using one or more computer servers and storage devices. A computer-implemented method includes receiving a training set of elements, each element in the training set being assigned to one of a plurality of categories and having one of a plurality of content profiles associated therewith; receiving a population set of elements, each element in the population set having one of the plurality of content profiles associated therewith; and calculating using at least one of a stacked regression algorithm, a bias formula algorithm, a noise elimination algorithm, and an ensemble method consisting of a plurality of algorithmic methods the results of which are averaged, based on the content profiles associated with and the categories assigned to elements in the training set and the content profiles associated with the elements of the population set, a distribution of elements of the population set over the categories.

Gary King, Jennifer Pan, and Margaret E Roberts. 2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review, 107, 2 (May), Pp. 1-18.Abstract

We offer the first large scale, multiple source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future --- and, as such, seem to clearly expose government intent.

Gary King, Daniel Hopkins, and Ying Lu. 11/17/2015. “System for Estimating a Distribution of Message Content Categories in Source Data (2nd).” United States of America US 9,189,538 B2 (U.S Patent and Trademark Office).Abstract
A method of computerized content analysis that gives "approximately unbiased and statistically consistent estimates" of a distribution of elements of structured, unstructured, and partially structured soruce data among a set of categories. In one embodiment, this is done by analyzing a distribution of small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be euqal to the distribution of labeled elements, nor constraining a content distribution of content of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
Daniel Hopkins, Gary King, and Ying Lu. 2012. “System for Estimating a Distribution of Message Content Categories in Source Data.” United States of America 8180717 (May 15).Abstract

A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, nor constraining a content distribution of content of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.

Gary King, Patrick Lam, and Margaret Roberts. 2017. “Computer-Assisted Keyword and Document Set Discovery from Unstructured Text.” American Journal of Political Science, 61, 4, Pp. 971-988. Publisher's VersionAbstract

The (unheralded) first step in many applications of automated text analysis involves selecting keywords to choose documents from a large text corpus for further study. Although all substantive results depend on this choice, researchers usually pick keywords in ad hoc ways that are far from optimal and usually biased. Paradoxically, this often means that the validity of the most sophisticated text analysis methods depend in practice on the inadequate keyword counting or matching methods they are designed to replace. Improved methods of keyword selection would also be valuable in many other areas, such as following conversations that rapidly innovate language to evade authorities, seek political advantage, or express creativity; generic web searching; eDiscovery; look-alike modeling; intelligence analysis; and sentiment and topic analysis. We develop a computer-assisted (as opposed to fully automated) statistical approach that suggests keywords from available text without needing structured data as inputs. This framing poses the statistical problem in a new way, which leads to a widely applicable algorithm. Our specific approach is based on training classifiers, extracting information from (rather than correcting) their mistakes, and summarizing results with Boolean search strings. We illustrate how the technique works with analyses of English texts about the Boston Marathon Bombings, Chinese social media posts designed to evade censorship, among others.

Gary King, Brian Lukoff, and Eric Mazur. 2014. “Participant Grouping for Enhanced Interactive Experience.” United States of America US 8,914,373 B2 (U.S. Patent and Trademark Office).Abstract

Representative embodiments of a method for grouping participants in an activity include the steps of: (i) defining a grouping policy; (ii) storing, in a database, participant records that include a participant identifer, a characteristic associated With the participant, and/or an identifier for a participant’s handheld device; (iii) defining groupings based on the policy and characteristics of the participants relating to the policy and to the activity; and (iv) communicating the groupings to the handheld devices to establish the groups.

Connor T. Jerzak, Gary King, and Anton Strezhnev. Forthcoming. “An Improved Method of Automated Nonparametric Content Analysis for Social Science.” Political Analysis.Abstract

Some scholars build models to classify documents into chosen categories. Others, especially social scientists who tend to focus on population characteristics, instead usually estimate the proportion of documents in each category -- using either parametric "classify-and-count" methods or "direct" nonparametric estimation of proportions without individual classification. Unfortunately, classify-and-count methods can be highly model dependent or generate more bias in the proportions even as the percent of documents correctly classified increases. Direct estimation avoids these problems, but can suffer when the meaning of language changes between training and test sets or is too similar across categories. We develop an improved direct estimation approach without these issues by including and optimizing continuous text features, along with a form of matching adapted from the causal inference literature. Our approach substantially improves performance in a diverse collection of 73 data sets. We also offer easy-to-use software software that implements all ideas discussed herein.

Gary King and Justin Grimmer. 2013. “Method and Apparatus for Selecting Clusterings to Classify A Predetermined Data Set.” United States of America 8,438,162 (May 7).Abstract

A method for selecting clusterings to classify a predetermined data set of numerical data comprises five steps. First, a plurality of known clustering methods are applied, one at a time, to the data set to generate clusterings for each method. Second, a metric space of clusterings is generated using a metric that measures the similarity between two clusterings. Third, the metric space is projected to a lower dimensional representation useful for visualization. Fourth, a “local cluster ensemble” method generates a clustering for each point in the lower dimensional space. Fifth, an animated visualization method uses the output of the local cluster ensemble method to display the lower dimensional space and to allow a user to move around and explore the space of clustering.

## Causal Inference

Methods for detecting and reducing model dependence (i.e., when minor model changes produce substantively different inferences) in inferring causal effects and other counterfactuals. Matching methods; "politically robust" and cluster-randomized experimental designs; causal bias decompositions.

## Evaluating Model Dependence

Evaluating whether counterfactual questions (predictions, what-if questions, and causal effects) can be reasonably answered from given data, or whether inferences will instead be highly model-dependent; also, a new decomposition of bias in causal inference. These articles overlap (and each as been the subject of a journal symposium):
For complete mathematical proofs, general notation, and other technical material, see: Gary King and Langche Zeng. 2006. “The Dangers of Extreme Counterfactuals.” Political Analysis, 14, Pp. 131–159.Abstract
We address the problem that occurs when inferences about counterfactuals – predictions, "what if" questions, and causal effects – are attempted far from the available data. The danger of these extreme counterfactuals is that substantive conclusions drawn from statistical models that fit the data well turn out to be based largely on speculation hidden in convenient modeling assumptions that few would be willing to defend. Yet existing statistical strategies provide few reliable means of identifying extreme counterfactuals. We offer a proof that inferences farther from the data are more model-dependent, and then develop easy-to-apply methods to evaluate how model-dependent our answers would be to specified counterfactuals. These methods require neither sensitivity testing over specified classes of models nor evaluating any specific modeling assumptions. If an analysis fails the simple tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence.
For more intuitive, but less general, notation, but with additional examples and more pedagogically oriented material, see: Gary King and Langche Zeng. 2007. “When Can History Be Our Guide? The Pitfalls of Counterfactual Inference.” International Studies Quarterly, Pp. 183-210.Abstract
Inferences about counterfactuals are essential for prediction, answering "what if" questions, and estimating causal effects. However, when the counterfactuals posed are too far from the data at hand, conclusions drawn from well-specified statistical analyses become based on speculation and convenient but indefensible model assumptions rather than empirical evidence. Unfortunately, standard statistical approaches assume the veracity of the model rather than revealing the degree of model-dependence, and so this problem can be hard to detect. We develop easy-to-apply methods to evaluate counterfactuals that do not require sensitivity testing over specified classes of models. If an analysis fails the tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence. We use these methods to evaluate the extensive scholarly literatures on the effects of changes in the degree of democracy in a country (on any dependent variable) and separate analyses of the effects of UN peacebuilding efforts. We find evidence that many scholars are inadvertently drawing conclusions based more on modeling hypotheses than on their data. For some research questions, history contains insufficient information to be our guide.

## Matching Methods

Stefano M Iacus, Gary King, and Giuseppe Porro. 2009. “CEM: Software for Coarsened Exact Matching.” Journal of Statistical Software, 30. Publisher's VersionAbstract

This program is designed to improve causal inference via a method of matching that is widely applicable in observational data and easy to understand and use (if you understand how to draw a histogram, you will understand this method). The program implements the coarsened exact matching (CEM) algorithm, described below. CEM may be used alone or in combination with any existing matching method. This algorithm, and its statistical properties, are described in Iacus, King, and Porro (2008).

A technical paper that describes a new class of matching methods, of which coarsened exact matching is an example: Stefano M Iacus, Gary King, and Giuseppe Porro. 2011. “Multivariate Matching Methods That are Monotonic Imbalance Bounding.” Journal of the American Statistical Association, 106, 493, Pp. 345-361.Abstract

We introduce a new "Monotonic Imbalance Bounding" (MIB) class of matching methods for causal inference with a surprisingly large number of attractive statistical properties. MIB generalizes and extends in several new directions the only existing class, "Equal Percent Bias Reducing" (EPBR), which is designed to satisfy weaker properties and only in expectation. We also offer strategies to obtain specific members of the MIB class, and analyze in more detail a member of this class, called Coarsened Exact Matching, whose properties we analyze from this new perspective. We offer a variety of analytical results and numerical simulations that demonstrate how members of the MIB class can dramatically improve inferences relative to EPBR-based matching methods.

A unified approach to matching methods as a way to reduce model dependence by preprocessing data and then using any model you would have without matching: Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart. 2007. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis, 15, Pp. 199–236.Abstract

Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author’s favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.

Matthew Blackwell, Stefano Iacus, Gary King, and Giuseppe Porro. 2009. “CEM: Coarsened Exact Matching in Stata.” The Stata Journal, 9, Pp. 524–546.Abstract
In this article, we introduce a Stata implementation of coarsened exact matching, a new method for improving the estimation of causal effects by reducing imbalance in covariates between treated and control groups. Coarsened exact matching is faster, is easier to use and understand, requires fewer assumptions, is more easily automated, and possesses more attractive statistical properties for many applications than do existing matching methods. In coarsened exact matching, users temporarily coarsen their data, exact match on these coarsened data, and then run their analysis on the uncoarsened, matched data. Coarsened exact matching bounds the degree of model dependence and causal effect estimation error by ex ante user choice, is monotonic imbalance bounding (so that reducing the maximum imbalance on one variable has no effect on others), does not require a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, balances all nonlinearities and interactions in sample (i.e., not merely in expectation), and works with multiply imputed datasets. Other matching methods inherit many of the coarsened exact matching method’s properties when applied to further match data preprocessed by coarsened exact matching. The cem command implements the coarsened exact matching algorithm in Stata.
Stefano M. Iacus, Gary King, and Giuseppe Porro. 2019. “A Theory of Statistical Inference for Matching Methods in Causal Research.” Political Analysis, 27, 1, Pp. 46-68.Abstract

Researchers who generate data often optimize efficiency and robustness by choosing stratified over simple random sampling designs. Yet, all theories of inference proposed to justify matching methods are based on simple random sampling. This is all the more troubling because, although these theories require exact matching, most matching applications resort to some form of ex post stratification (on a propensity score, distance metric, or the covariates) to find approximate matches, thus nullifying the statistical properties these theories are designed to ensure. Fortunately, the type of sampling used in a theory of inference is an axiom, rather than an assumption vulnerable to being proven wrong, and so we can replace simple with stratified sampling, so long as we can show, as we do here, that the implications of the theory are coherent and remain true. Properties of estimators based on this theory are much easier to understand and can be satisfied without the unattractive properties of existing theories, such as assumptions hidden in data analyses rather than stated up front, asymptotics, unfamiliar estimators, and complex variance calculations. Our theory of inference makes it possible for researchers to treat matching as a simple form of preprocessing to reduce model dependence, after which all the familiar inferential techniques and uncertainty calculations can be applied. This theory also allows binary, multicategory, and continuous treatment variables from the outset and straightforward extensions for imperfect treatment assignment and different versions of treatments.

A simple and powerful method of matching: Stefano M. Iacus, Gary King, and Giuseppe Porro. 2012. “Causal Inference Without Balance Checking: Coarsened Exact Matching.” Political Analysis, 20, 1, Pp. 1--24. WebsiteAbstract

We discuss a method for improving causal inferences called "Coarsened Exact Matching'' (CEM), and the new "Monotonic Imbalance Bounding'' (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of desirable statistical properties not available in most other matching methods, but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R and Stata which implement all our suggestions.

Gary King, Christopher Lucas, and Richard Nielsen. 2017. “The Balance-Sample Size Frontier in Matching Methods for Causal Inference.” American Journal of Political Science, 61, 2, Pp. 473-489.Abstract

We propose a simplified approach to matching for causal inference that simultaneously optimizes balance (similarity between the treated and control groups) and matched sample size. Existing approaches either fix the matched sample size and maximize balance or fix balance and maximize sample size, leaving analysts to settle for suboptimal solutions or attempt manual optimization by iteratively tweaking their matching method and rechecking balance. To jointly maximize balance and sample size, we introduce the matching frontier, the set of matching solutions with maximum possible balance for each sample size. Rather than iterating, researchers can choose matching solutions from the frontier for analysis in one step. We derive fast algorithms that calculate the matching frontier for several commonly used balance metrics. We demonstrate with analyses of the effect of sex on judging and job training programs that show how the methods we introduce can extract new knowledge from existing data sets.

Easy to use, open source, software is available here to implement all methods in the paper.

Daniel E. Ho, Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2011. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” Journal of Statistical Software, 42, 8, Pp. 1--28. Publisher's VersionAbstract
MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by preprocessing data with nonparametric matching methods. MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also easily fits into existing research practices since, after preprocessing data with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, but produce inferences with substantially more robustness and less sensitivity to modeling assumptions. MatchIt is an R program, and also works seamlessly with Zelig.
Gary King, Richard Nielsen, Carter Coberley, James E. Pope, and Aaron Wells. 2011. “Comparative Effectiveness of Matching Methods for Causal Inference”.Abstract

Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable Mahalanobis Distance Matching'' (MDM), the commonly used Propensity Score Matching'' (PSM), and a newer approach called Coarsened Exact Matching'' (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.

Please see our follow up paper on this topic: Why Propensity Scores Should Not Be Used for Matching.

Gary King and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis, 27, 4, Pp. 435-454. Publisher's VersionAbstract

We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal --- thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.

A method to estimate base probabilities or any quantity of interest from case-control data, even with no (or partial) auxiliary information. Discusses problems with odds-ratios. Gary King and Langche Zeng. 2002. “Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies.” Statistics in Medicine, 21, Pp. 1409–1427.Abstract
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.
Gary King. 1991. “'Truth' is Stranger than Prediction, More Questionable Than Causal Inference.” American Journal of Political Science, 35, Pp. 1047–1053.Abstract
Robert Luskin’s article in this issue provides a useful service by appropriately qualifying several points I made in my 1986 American Journal of Political Science article. Whereas I focused on how to avoid common mistakes in quantitative political sciences, Luskin clarifies ways to extract some useful information from usually problematic statistics: correlation coefficients, standardized coefficients, and especially R2. Since these three statistics are very closely related (and indeed deterministic functions of one another in some cases), I focus in this discussion primarily on R2, the most widely used and abused. Luskin also widens the discussion to various kinds of specification tests, a general issue I also address. In fact, as Beck (1991) reports, a large number of formal specification tests are just functions of R2, with differences among them primarily due to how much each statistic penalizes one for including extra parameters and fewer observations. Quantitative political scientists often worry about model selection and specification, asking questions about parameter identification, autocorrelated or heteroscedastic disturbances, parameter constancy, variable choice, measurement error, endogeneity, functional forms, stochastic assumptions, and selection bias, among numerous others. These model specification questions are all important, but we may have forgotten why we pose them. Political scientists commonly give three reasons: (1) finding the "true" model, or the "full" explanation and (2) prediction and and (3) estimating specific causal effects. I argue here that (1) is used the most but useful the least and (2) is very useful but not usually in political science where forecasting is not often a central concern and and (3) correctly represents the goals of political scientists and should form the basis of most of our quantitative empirical work.

## Experimental Design

Gary King, Benjamin Schneer, and Ariel White. 11/10/2017. “How the news media activate public expression and influence national agendas.” Science, 358, Pp. 776-780. Publisher's VersionAbstract

We demonstrate that exposure to the news media causes Americans to take public stands on specific issues, join national policy conversations, and express themselves publicly—all key components of democratic politics—more often than they would otherwise. After recruiting 48 mostly small media outlets, we chose groups of these outlets to write and publish articles on subjects we approved, on dates we randomly assigned. We estimated the causal effect on proximal measures, such as website pageviews and Twitter discussion of the articles’ specific subjects, and distal ones, such as national Twitter conversation in broad policy areas. Our intervention increased discussion in each broad policy area by approximately 62.7% (relative to a day’s volume), accounting for 13,166 additional posts over the treatment week, with similar effects across population subgroups.

On the Science website: AbstractReprintFull text, and a comment (by Matthew Gentzkow) "Small media, big impact".

Kosuke Imai, Gary King, and Carlos Velasco Rivera. 1/31/2020. “Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments.” Journal of Politics, 81, 2, Pp. 714-730. Publisher's VersionAbstract

A vast literature demonstrates that voters around the world who benefit from their governments' discretionary spending cast more ballots for the incumbent party than those who do not benefit. But contrary to most theories of political accountability, some suggest that voters also reward incumbent parties for implementing "programmatic" spending legislation, over which incumbents have no discretion, and even when passed with support from all major parties. Why voters would attribute responsibility when none exists is unclear, as is why minority party legislators would approve of legislation that would cost them votes. We study the electoral effects of two large prominent programmatic policies that fit the ideal type especially well, with unusually large scale experiments that bring more evidence to bear on this question than has previously been possible. For the first policy, we design and implement ourselves one of the largest randomized social experiments ever. For the second policy, we reanalyze studies that used a large scale randomized experiment and a natural experiment to study the same question but came to opposite conclusions. Using corrected data and improved statistical methods, we show that the evidence from all analyses of both policies is consistent: programmatic policies have no effect on voter support for incumbents. We conclude by discussing how the many other studies in the literature may be interpreted in light of our results.

Gary King, Benjamin Schneer, and Ariel White. 2014. “Methods for Extremely Large Scale Media Experiments and Observational Studies (Poster).” In Society for Political Methodology. Athens, GA.Abstract

This is a poster presentation describing (1) the largest ever experimental study of media effects, with more than 50 cooperating traditional media sites, normally unavailable web site analytics, the text of hundreds of thousands of news articles, and tens of millions of social media posts, and (2) a design we used in preparation that attempts to anticipate experimental outcomes

An evaluation of the Mexican Seguro Popular program (designed to extend health insurance and regular and preventive medical care, pharmaceuticals, and health facilities to 50 million uninsured Mexicans), one of the world's largest health policy reforms of the last two decades. The evaluation features the largest randomized health policy experiment in history, a new design for field experiments that is more robust to the political interventions that have ruined many similar previous efforts, and new statistical methods that produce more reliable and efficient results using substantially fewer resources, assumptions, and data. Gary King, Richard Nielsen, Carter Coberley, James E Pope, and Aaron Wells. 2011. “Avoiding Randomization Failure in Program Evaluation.” Population Health Management, 14, 1, Pp. S11-S22.Abstract

We highlight common problems in the application of random treatment assignment in large scale program evaluation. Random assignment is the defining feature of modern experimental design. Yet, errors in design, implementation, and analysis often result in real world applications not benefiting from the advantages of randomization. The errors we highlight cover the control of variability, levels of randomization, size of treatment arms, and power to detect causal effects, as well as the many problems that commonly lead to post-treatment bias. We illustrate with an application to the Medicare Health Support evaluation, including recommendations for improving the design and analysis of this and other large scale randomized experiments.

(Articles on the Seguro Popular Evaluation: Website)
Clarifying serious misunderstandings in the advantages and uses of the most common research designs for making causal inferences. Kosuke Imai, Gary King, and Elizabeth Stuart. 2008. “Misunderstandings Among Experimentalists and Observationalists about Causal Inference.” Journal of the Royal Statistical Society, Series A, 171, part 2, Pp. 481–502.Abstract

We attempt to clarify, and suggest how to avoid, several serious misunderstandings about and fallacies of causal inference in experimental and observational research. These issues concern some of the most basic advantages and disadvantages of each basic research design. Problems include improper use of hypothesis tests for covariate balance between the treated and control groups, and the consequences of using randomization, blocking before randomization, and matching after treatment assignment to achieve covariate balance. Applied researchers in a wide range of scientific disciplines seem to fall prey to one or more of these fallacies, and as a result make suboptimal design or analysis choices. To clarify these points, we derive a new four-part decomposition of the key estimation errors in making causal inferences. We then show how this decomposition can help scholars from different experimental and observational research traditions better understand each other’s inferential problems and attempted solutions.

Kosuke Imai, Gary King, and Clayton Nall. 2009. “The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation.” Statistical Science, 24, Pp. 29–53.Abstract
A basic feature of many field experiments is that investigators are only able to randomize clusters of individuals–-such as households, communities, firms, medical practices, schools, or classrooms–-even when the individual is the unit of interest. To recoup the resulting efficiency loss, some studies pair similar clusters and randomize treatment within pairs. However, many other studies avoid pairing, in part because of claims in the literature, echoed by clinical trials standards organizations, that this matched-pair, cluster-randomization design has serious problems. We argue that all such claims are unfounded. We also prove that the estimator recommended for this design in the literature is unbiased only in situations when matching is unnecessary and and its standard error is also invalid. To overcome this problem without modeling assumptions, we develop a simple design-based estimator with much improved statistical properties. We also propose a model-based approach that includes some of the benefits of our design-based estimator as well as the estimator in the literature. Our methods also address individual-level noncompliance, which is common in applications but not allowed for in most existing methods. We show that from the perspective of bias, efficiency, power, robustness, or research costs, and in large or small samples, pairing should be used in cluster-randomized experiments whenever feasible and failing to do so is equivalent to discarding a considerable fraction of one’s data. We develop these techniques in the context of a randomized evaluation we are conducting of the Mexican Universal Health Insurance Program.

## Software

Gary King, Christopher Lucas, and Richard Nielsen. 2014. “MatchingFrontier: R Package for Calculating the Balance-Sample Size Frontier”.Abstract

MatchingFrontier is an easy-to-use R Package for making optimal causal inferences from observational data.  Despite their popularity, existing matching approaches leave researchers with two fundamental tensions. First, they are designed to maximize one metric (such as propensity score or Mahalanobis distance) but are judged against another for which they were not designed (such as L1 or differences in means). Second, they lack a principled solution to revealing the implicit bias-variance trade off: matching methods need to optimize with respect to both imbalance (between the treated and control groups) and the number of observations pruned, but existing approaches optimize with respect to only one; users then either ignore the other, or tweak it, usually suboptimally, by hand.

MatchingFrontier resolves both tensions by consolidating previous techniques into a single, optimal, and flexible approach. It calculates the matching solution with maximum balance for each possible sample size (N, N-1, N-2,...). It thus directly calculates the entire balance-sample size frontier, from which the user can easily choose one, several, or all subsamples from which to conduct their final analysis, given their own choice of imbalance metric and quantity of interest. MatchingFrontier solves the joint optimization problem in one run, automatically, without manual tweaking, and without iteration.  Although for each subset size k, there exist a huge (N choose k) number of unique subsets, MatchingFrontier includes specially designed fast algorithms that give the optimal answer, usually in a few minutes.

MatchingFrontier implements the methods in this paper:

King, Gary, Christopher Lucas, and Richard Nielsen. 2014. The Balance-Sample Size Frontier in Matching Methods for Causal Inference, copy at http://j.mp/1dRDMrE

Matthew Blackwell, Stefano Iacus, Gary King, and Giuseppe Porro. 2009. “CEM: Coarsened Exact Matching in Stata.” The Stata Journal, 9, Pp. 524–546.Abstract
In this article, we introduce a Stata implementation of coarsened exact matching, a new method for improving the estimation of causal effects by reducing imbalance in covariates between treated and control groups. Coarsened exact matching is faster, is easier to use and understand, requires fewer assumptions, is more easily automated, and possesses more attractive statistical properties for many applications than do existing matching methods. In coarsened exact matching, users temporarily coarsen their data, exact match on these coarsened data, and then run their analysis on the uncoarsened, matched data. Coarsened exact matching bounds the degree of model dependence and causal effect estimation error by ex ante user choice, is monotonic imbalance bounding (so that reducing the maximum imbalance on one variable has no effect on others), does not require a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, balances all nonlinearities and interactions in sample (i.e., not merely in expectation), and works with multiply imputed datasets. Other matching methods inherit many of the coarsened exact matching method’s properties when applied to further match data preprocessed by coarsened exact matching. The cem command implements the coarsened exact matching algorithm in Stata.
Michael Tomz, Jason Wittenberg, and Gary King. 2003. “CLARIFY: Software for Interpreting and Presenting Statistical Results.” Journal of Statistical Software.Abstract

This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from https://gking.harvard.edu/clarify (https://gking.harvard.edu/clarify)" at the Stata command line.

Winner of the Okidata Best Research Software Award. Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables.

Heather Stoll, Gary King, and Langchee Zeng. 2005. “WhatIf: Software for Evaluating Counterfactuals.” Journal of Statistical Software, 15, 4, Pp. 1--18. Publisher's versionAbstract

This article describes WhatIf: Software for Evaluating Counterfactuals, an R package that implements the methods for evaluating counterfactuals introduced in King and Zeng (2006a) and King and Zeng (2006b). It offers easy-to-use techniques for assessing a counterfactual’s model dependence without having to conduct sensitivity testing over specified classes of models. These same methods can be used to approximate the common support of the treatment and control groups in causal inference.

## Applications

A brief summary of the above article for an undergraduate audience: Lee Epstein, Daniel E Ho, Gary King, and Jeffrey A Segal. 2005. “The Supreme Court During Crisis: How War Affects only Non-War Cases.” New York University Law Review, 80, Pp. 1–116.Abstract
Does the U.S. Supreme Court curtail rights and liberties when the nation’s security is under threat? In hundreds of articles and books, and with renewed fervor since September 11, 2001, members of the legal community have warred over this question. Yet, not a single large-scale, quantitative study exists on the subject. Using the best data available on the causes and outcomes of every civil rights and liberties case decided by the Supreme Court over the past six decades and employing methods chosen and tuned especially for this problem, our analyses demonstrate that when crises threaten the nation’s security, the justices are substantially more likely to curtail rights and liberties than when peace prevails. Yet paradoxically, and in contradiction to virtually every theory of crisis jurisprudence, war appears to affect only cases that are unrelated to the war. For these cases, the effect of war and other international crises is so substantial, persistent, and consistent that it may surprise even those commentators who long have argued that the Court rallies around the flag in times of crisis. On the other hand, we find no evidence that cases most directly related to the war are affected. We attempt to explain this seemingly paradoxical evidence with one unifying conjecture: Instead of balancing rights and security in high stakes cases directly related to the war, the Justices retreat to ensuring the institutional checks of the democratic branches. Since rights-oriented and process-oriented dimensions seem to operate in different domains and at different times, and often suggest different outcomes, the predictive factors that work for cases unrelated to the war fail for cases related to the war. If this conjecture is correct, federal judges should consider giving less weight to legal principles outside of wartime but established during wartime, and attorneys should see it as their responsibility to distinguish cases along these lines.
Lee Epstein, Daniel E. Ho, Gary King, and Jeffrey A. Segal. 2006. “The Effect of War on the Supreme Court.” In Principles and Practice in American Politics: Classic and Contemporary Readings, edited by Samuel Kernell and Steven S. Smith, 3rd ed. Washington, D.C. Congressional Quarterly Press.Abstract

Does the U.S. Supreme Court curtail rights and liberties when the nation’s security is under threat? In hundreds of articles and books, and with renewed fervor since September 11, 2001, members of the legal community have warred over this question. Yet, not a single large-scale, quantitative study exists on the subject. Using the best data available on the causes and outcomes of every civil rights and liberties case decided by the Supreme Court over the past six decades and employing methods chosen and tuned especially for this problem, our analyses demonstrate that when crises threaten the nation’s security, the justices are substantially more likely to curtail rights and liberties than when peace prevails. Yet paradoxically, and in contradiction to virtually every theory of crisis jurisprudence, war appears to affect only cases that are unrelated to the war. For these cases, the effect of war and other international crises is so substantial, persistent, and consistent that it may surprise even those commentators who long have argued that the Court rallies around the flag in times of crisis. On the other hand, we find no evidence that cases most directly related to the war are affected. We attempt to explain this seemingly paradoxical evidence with one unifying conjecture: Instead of balancing rights and security in high stakes cases directly related to the war, the Justices retreat to ensuring the institutional checks of the democratic branches. Since rights-oriented and process-oriented dimensions seem to operate in different domains and at different times, and often suggest different outcomes, the predictive factors that work for cases unrelated to the war fail for cases related to the war. If this conjecture is correct, federal judges should consider giving less weight to legal principles outside of wartime but established during wartime, and attorneys should see it as their responsibility to distinguish cases along these lines.

## Event Counts and Durations

Statistical models to explain or predict how many events occur for each fixed time period, or the time between events. An application to cabinet dissolution in parliamentary democracies which united two previously warring scholarly literature. Other applications to international relations and U.S. Supreme Court appointments.

## Event Counts

A series of methods that introduced existing, and developed new, statistical models for event counts for political science research.
Gary King and Curtis S Signorino. 1996. “The Generalization in the Generalized Event Count Model, With Comments on Achen, Amato, and Londregan.” Political Analysis, 6, Pp. 225–252.Abstract
We use an analogy with the normal distribution and linear regression to demonstrate the need for the Generalize Event Count (GEC) model. We then show how the GEC provides a unified framework within which to understand a diversity of distributions used to model event counts, and how to express the model in one simple equation. Finally, we address the points made by Christopher Achen, Timothy Amato, and John Londregan. Amato's and Londregan's arguments are consistent with ours and provide additional interesting information and explanations. Unfortunately, the foundation on which Achen built his paper turns out to be incorrect, rendering all his novel claims about the GEC false (or in some cases irrelevant).
Gary King. 1987. “Presidential Appointments to the Supreme Court: Adding Systematic Explanation to Probabilistic Description.” American Politics Quarterly, 15, Pp. 373–386.Abstract
Three articles, published in the leading journals of three disciplines over the last five decades, have each used the Poisson probability distribution to help describe the frequency with which presidents were able to appoint United States Supreme Court Justices. This work challenges these previous findings with a new model of Court appointments. The analysis demonstrates that the number of appointments a president can expect to make in a given year is a function of existing measurable variables.
Gary King. 1988. “Statistical Models for Political Science Event Counts: Bias in Conventional Procedures and Evidence for The Exponential Poisson Regression Model.” American Journal of Political Science, 32, Pp. 838-863.Abstract
This paper presents analytical, Monte Carlo, and empirical evidence on models for event count data. Event counts are dependent variables that measure the number of times some event occurs. Counts of international events are probably the most common, but numerous examples exist in every empirical field of the discipline. The results of the analysis below strongly suggest that the way event counts have been analyzed in hundreds of important political science studies have produced statistically and substantively unreliable results. Misspecification, inefficiency, bias, inconsistency, insufficiency, and other problems result from the unknowing application of two common methods that are without theoretical justification or empirical unity in this type of data. I show that the exponential Poisson regression (EPR) model provides analytically, in large samples, and empirically, in small, finite samples, a far superior model and optimal estimator. I also demonstrate the advantage of this methodology in an application to nineteenth-century party switching in the U.S. Congress. Its use by political scientists is strongly encouraged.
Gary King. 1989. “Event Count Models for International Relations: Generalizations and Applications.” International Studies Quarterly, 33, Pp. 123–147.Abstract
International relations theorists tend to think in terms of continuous processes. Yet we observe only discrete events, such as wars or alliances, and summarize them in terms of the frequency of occurrence. As such, most empirical analyses in international relations are based on event count variables. Unfortunately, analysts have generally relied on statistical techniques that were designed for continuous data. This mismatch between theory and method has caused bias, inefficiency, and numerous inconsistencies in both theoretical arguments and empirical findings throughout the literature. This article develops a much more powerful approach to modeling and statistical analysis based explicity on estimating continuous processes from observed event counts. To demonstrate this class of models, I present several new statistical techniques developed for and applied to different areas of international relations. These include the influence of international alliances on the outbreak of war, the contagious process of multilateral economic sanctions, and reciprocity in superpower conflict. I also show how one can extract considerably more information from existing data and relate substantive theory to empirical analyses more explicitly with this approach.
Gary King. 1989. “Variance Specification in Event Count Models: From Restrictive Assumptions to a Generalized Estimator.” American Journal of Political Science, 33, Pp. 762–784.Abstract
This paper discusses the problem of variance specification in models for event count data. Event counts are dependent variables that can take on only nonnegative integer values, such as the number of wars or coups d’etat in a year. I discuss several generalizations of the Poisson regression model, presented in King (1988), to allow for substantively interesting stochastic processes that do not fit into the Poisson framework. Individual models that cope with, and help analyze, heterogeneity, contagion, and negative contagion are each shown to lead to specific statistical models for event count data. In addition, I derive a new generalized event count (GEC) model that enables researchers to extract significant amounts of new information from existing data by estimating features of these unobserved substantive processes. Applications of this model to congressional challenges of presidential vetoes and superpower conflict demonstrate the dramatic advantages of this approach.
Gary King. 1989. “A Seemingly Unrelated Poisson Regression Model.” Sociological Methods and Research, 17, Pp. 235–255.Abstract
This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Rainer Winkelmann, Curtis Signorino, and Gary King. 1995. “A Correction for an Underdispersed Event Count Probability Distribution.” Political Analysis, Pp. 215–228.Abstract
We demonstrate that the expected value and variance commonly given for a well-known probability distribution are incorrect. We also provide corrected versions and report changes in a computer program to account for the known practical uses of this distribution.
Federico Girosi and Gary King. 2008. Demographic Forecasting. Princeton: Princeton University Press.Abstract - see sections on dealing with small death counts

We introduce a new framework for forecasting age-sex-country-cause-specific mortality rates that incorporates considerably more information, and thus has the potential to forecast much better, than any existing approach. Mortality forecasts are used in a wide variety of academic fields, and for global and national health policy making, medical and pharmaceutical research, and social security and retirement planning.

As it turns out, the tools we developed in pursuit of this goal also have broader statistical implications, in addition to their use for forecasting mortality or other variables with similar statistical properties. First, our methods make it possible to include different explanatory variables in a time series regression for each cross-section, while still borrowing strength from one regression to improve the estimation of all. Second, we show that many existing Bayesian (hierarchical and spatial) models with explanatory variables use prior densities that incorrectly formalize prior knowledge. Many demographers and public health researchers have fortuitously avoided this problem so prevalent in other fields by using prior knowledge only as an ex post check on empirical results, but this approach excludes considerable information from their models. We show how to incorporate this demographic knowledge into a model in a statistically appropriate way. Finally, we develop a set of tools useful for developing models with Bayesian priors in the presence of partial prior ignorance. This approach also provides many of the attractive features claimed by the empirical Bayes approach, but fully within the standard Bayesian theory of inference.

## Duration of Parliamentary Governments

A statistical model, and related work, that united two warring scholarly literatures.
Gary King, James Alt, Nancy Burns, and Michael Laver. 1990. “A Unified Model of Cabinet Dissolution in Parliamentary Democracies.” American Journal of Political Science, 34, Pp. 846–871.Abstract
The literature on cabinet duration is split between two apparently irreconcilable positions. The attributes theorists seek to explain cabinet duration as a fixed function of measured explanatory variables, while the events process theorists model cabinet durations as a product of purely stochastic processes. In this paper we build a unified statistical model that combines the insights of these previously distinct approaches. We also generalize this unified model, and all previous models, by including (1) a stochastic component that takes into account the censoring that occurs as a result of governments lasting to the vicinity of the maximum constitutional interelection period, (2) a systematic component that precludes the possibility of negative duration predictions, and (3) a much more objective and parsimonious list of explanatory variables, the explanatory power of which would not be improved by including a list of indicator variables for individual countries.
James E Alt and Gary King. 1994. “Transfers of Governmental Power: The Meaning of Time Dependence.” Comparative Political Studies, 27, Pp. 190–210.Abstract
King, Alt, Burns, and Laver (1990) proposed and estimated a unified model in which cabinet durations depended on seven explanatory variables reflecting features of the cabinets and the bargaining environments in which they formed, along with a stochastic component in which the risk of a cabinet falling was treated as a constant across its tenure. Two recent research reports take issue with one aspect of this model. Warwick and Easton replicate the earlier findings for explanatory variables but claim that the stochastic risk should be seen as rising, and at a rate which varies, across the life of the cabinet. Bienen and van de Walle, using data on the duration of leaders, allege that random risk is falling. We continue in our goal of unifying this literature by providing further estimates with both cabinet and leader duration data that confirm the original explanatory variables’ effects, showing that leaders’ durations are affected by many of the same factors that affect the durability of the cabinets they lead, demonstrating that cabinets have stochastic risk of ending that is indeed constant across the theoretically most interesting range of durations, and suggesting that stochastic risk for leaders in countries with cabinet government is, if not constant, more likely to rise than fall.
James E Alt, Gary King, and Curtis Signorino. 2001. “Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data.” Political Analysis, 9, Pp. 21–44.Abstract
Binary, count and duration data all code discrete events occurring at points in time. Although a single data generation process can produce all of these three data types, the statistical literature is not very helpful in providing methods to estimate parameters of the same process from each. In fact, only single theoretical process exists for which know statistical methods can estimate the same parameters - and it is generally used only for count and duration data. The result is that seemingly trivial decisions abut which level of data to use can have important consequences for substantive interpretations. We describe the theoretical event process for which results exist, based on time independence. We also derive a set of models for a time-dependent process and compare their predictions to those of a commonly used model. Any hope of understanding and avoiding the more serious problems of aggregation bias in events data is contingent on first deriving a much wider arsenal of statistical models and theoretical processes that are not constrained by the particular forms of data that happen to be available. We discuss these issues and suggest an agenda for political methodologists interested in this very large class of aggregation problems.

## Software

Gary King. 2002. “COUNT: A Program for Estimating Event Count and Duration Regressions”.Abstract
A stand-alone, easy-to-use program for running event count and duration regression models, developed by and/or discussed in a series of journal articles by me. (Event count models have a dependent variable measured as the number of times something happens, such as the number of uncontested seats per state or the number of wars per year. Duration models explain dependent variables measured as the time until some event, such as the number of months a parliamentary cabinet endures.) Winner of the APSA Research Software Award.

## Ecological Inference

Inferring individual behavior from group-level data: The first approach to incorporate both unit-level deterministic bounds and cross-unit statistical information, methods for 2x2 and larger tables, Bayesian model averaging, applications to elections, software.

## Methods

Summarizes the explosion of research in ecological inference that has occurred in the previous eight years, all following the key insight of models that include both deterministic and statistical information. Gary King, Ori Rosen, Martin Tanner, Gary King, Ori Rosen, and Martin A Tanner. 2004. Ecological Inference: New Methodological Strategies. New York: Cambridge University Press.Abstract
Ecological Inference: New Methodological Strategies brings together a diverse group of scholars to survey the latest strategies for solving ecological inference problems in various fields. The last half decade has witnessed an explosion of research in ecological inference – the attempt to infer individual behavior from aggregate data. The uncertainties and the information lost in aggregation make ecological inference one of the most difficult areas of statistical inference, but such inferences are required in many academic fields, as well as by legislatures and the courts in redistricting, by businesses in marketing research, and by governments in policy analysis.
Related research on aggregation, revealing the logical inconsistency of some popularly used models. James E Alt, Gary King, and Curtis Signorino. 2001. “Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data.” Political Analysis, 9, Pp. 21–44.Abstract
Binary, count and duration data all code discrete events occurring at points in time. Although a single data generation process can produce all of these three data types, the statistical literature is not very helpful in providing methods to estimate parameters of the same process from each. In fact, only single theoretical process exists for which know statistical methods can estimate the same parameters - and it is generally used only for count and duration data. The result is that seemingly trivial decisions abut which level of data to use can have important consequences for substantive interpretations. We describe the theoretical event process for which results exist, based on time independence. We also derive a set of models for a time-dependent process and compare their predictions to those of a commonly used model. Any hope of understanding and avoiding the more serious problems of aggregation bias in events data is contingent on first deriving a much wider arsenal of statistical models and theoretical processes that are not constrained by the particular forms of data that happen to be available. We discuss these issues and suggest an agenda for political methodologists interested in this very large class of aggregation problems.
An extension of the work in the above book to use MCMC technology, making models for larger tables possible. Gary King, Ori Rosen, and Martin A Tanner. 1999. “Binomial-Beta Hierarchical Models for Ecological Inference.” Sociological Methods and Research, 28, Pp. 61–90.Abstract
The authors develop binomial-beta hierarchical models for ecological inference using insights from the literature on hierarchical models based on Markov chain Monte Carlo algorithms and King’s ecological inference model. The new approach reveals some features of the data that King’s approach does not, can easily be generalized to more complicated problems such as general R x C tables, allows the data analyst to adjust for covariates, and provides a formal evaluation of the significance of the covariates. It may also be better suited to cases in which the observed aggregate cells are estimated from very few observations or have some forms of measurement error. This article also provides an example of a hierarchical model in which the statistical idea of "borrowing strength" is used not merely to increase the efficiency of the estimates but to enable the data analyst to obtain estimates.
Outlines some of the history of ecological inference research, and introduces this new book. Gary King, Ori Rosen, and Martin Tanner. 2004. “Information in Ecological Inference: An Introduction.” In Ecological Inference: New Methodological Strategies, edited by Gary King, Ori Rosen, and Martin Tanner. New York: Cambridge University Press.
This article uses MCMC technology, and a quicker approximation, to make ecological inferences using deterministic and statistical information in larger tables. Ori Rosen, Wenxin Jiang, Gary King, and Martin A Tanner. 2001. “Bayesian and Frequentist Inference for Ecological Inference: The RxC Case.” Statistica Neerlandica, 55, Pp. 134–156.Abstract
In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R x C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2 x 2 case to the R x C case, the inferential procedure employs Markov chain Monte Carlo (MCMC) methods. As such the resulting MCMC analysis is rich but computationally intensive. The frequentist approach, based on first moments rather than on the entire likelihood, provides quick inference via nonlinear least-squares, while retaining good frequentist properties. The two approaches are illustrated with simulated data, as well as with real data on voting patterns in Weimar Germany. In the final section of the paper we provide an overview of a range of alternative inferential approaches which trade-off computational intensity for statistical efficiency.
Details of an application conducted for the New York Times, including extensions of ecological inference to Bayesian model averaging. Kosuke Imai and Gary King. 2004. “Did Illegal Overseas Absentee Ballots Decide the 2000 U.S. Presidential Election?” Perspectives on Politics, 2, Pp. 537–549.Abstract

Although not widely known until much later, Al Gore received 202 more votes than George W. Bush on election day in Florida. George W. Bush is president because he overcame his election day deficit with overseas absentee ballots that arrived and were counted after election day. In the final official tally, Bush received 537 more votes than Gore. These numbers are taken from the official results released by the Florida Secretary of State's office and so do not reflect overvotes, undervotes, unsuccessful litigation, butterfly ballot problems, recounts that might have been allowed but were not, or any other hypothetical divergence between voter preferences and counted votes. After the election, the New York Times conducted a six month long investigation and found that 680 of the overseas absentee ballots were illegally counted, and no partisan, pundit, or academic has publicly disagreed with their assessment. In this paper, we describe the statistical procedures we developed and implemented for the Times to ascertain whether disqualifying these 680 ballots would have changed the outcome of the election. The methods involve adding formal Bayesian model averaging procedures to King's (1997) ecological inference model. Formal Bayesian model averaging has not been used in political science but is especially useful when substantive conclusions depend heavily on apparently minor but indefensible model choices, when model generalization is not feasible, and when potential critics are more partisan than academic. We show how we derived the results for the Times so that other scholars can use these methods to make ecological inferences for other purposes. We also present a variety of new empirical results that delineate the precise conditions under which Al Gore would have been elected president, and offer new evidence of the striking effectiveness of the Republican effort to convince local election officials to count invalid ballots in Bush counties and not count them in Gore counties.

Gary King, Ori Rosen, and Martin Tanner. 2006. “Ecological Inference.” In The New Palgrave Dictionary of Economics, edited by Larry Blume and Steven N. Durlauf, 2nd ed. London: Palgrave.Abstract
Dictionary entry on the definition of "ecological inference," and a brief summary of the history of ecological inference research.
Wenxin Jiang, Gary King, Allen Schmaltz, and Martin A. Tanner. 2019. “Ecological Regression with Partial Identification.” Political Analysis, 28, 1, Pp. 1--22.Abstract

Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We relax assumptions by allowing for linear contextual effects,'' which previous works have regarded as plausible but avoided due to non-identification, a problem we sidestep by deriving bounds instead of point estimates. In this way, we offer a conceptual framework to improve on the Duncan-Davis bound, derived more than sixty-five years ago. To study the effectiveness of our approach, we collect and analyze 8,430  2x2 EI datasets with known ground truth from several sources --- thus bringing considerably more data to bear on the problem than the existing dozen or so datasets available in the literature for evaluating EI estimators. For the 88% of real data sets in our collection that fit a proposed rule, our approach reduces the width of the Duncan-Davis bound, on average, by about 44%, while still capturing the true district level parameter about 99% of the time. The remaining 12% revert to the Duncan-Davis bound.

Easy-to-use software is available that implements all the methods described in the paper.

Gary King, Ori Rosen, Martin Tanner, and Alexander Wagner. 2008. “Ordinary Economic Voting Behavior in the Extraordinary Election of Adolf Hitler.” Journal of Economic History, 68, 4, Pp. 996.Abstract

The enormous Nazi voting literature rarely builds on modern statistical or economic research. By adding these approaches, we find that the most widely accepted existing theories of this era cannot distinguish the Weimar elections from almost any others in any country. Via a retrospective voting account, we show that voters most hurt by the depression, and most likely to oppose the government, fall into separate groups with divergent interests. This explains why some turned to the Nazis and others turned away. The consequences of Hitler's election were extraordinary, but the voting behavior that led to it was not.

## Software

Gary King and Kenneth Benoit. 2003. “EzI: A(n Easy) Program for Ecological Inference”. Published as part of the Gauss Package by Aptech Systems, Kent, Washington, and as a stand-alone program called EzI: A(n Easy) Program for Ecological Inference, by Kenneth Benoit and me.

## Discussions and Extensions

Gary King. 1999. “The Future of Ecological Inference Research: A Reply to Freedman et al.” Journal of the American Statistical Association, 94, Pp. 352-355.Abstract
I appreciate the editor’s invitation to reply to Freedman et al.’s (1998) review of "A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data" (Princeton University Press.) I welcome this scholarly critique and JASA’s decision to publish in this field. Ecological inference is a large and very important area for applications that is especially rich with open statistical questions. I hope this discussion stimulates much new scholarship. Freedman et al. raise several interesting issues, but also misrepresent or misunderstand the prior literature, my approach, and their own empirical analyses, and compound the problem, by refusing requests from me and the editor to make their data and software available for this note. Some clarification is thus in order.
Christopher Adolph, Gary King, Kenneth W Shotts, and Michael C Herron. 2003. “A Consensus on Second Stage Analyses in Ecological Inference Models.” Political Analysis, 11, Pp. 86–94.Abstract
Since Herron and Shotts (2003a and hereinafter HS), Adolph and King (2003 andhereinafter AK), and Herron and Shotts (2003b and hereinafter HS2), the four of us have iterated many more times, learned a great deal, and arrived at a consensus on this issue. This paper describes our joint recommendations for how to run second-stage ecological regressions, and provides detailed analyses to back up our claims.
Gary King. 2002. “Isolating Spatial Autocorrelation, Aggregation Bias, and Distributional Violations in Ecological Inference.” Political Analysis, 10, Pp. 298–300.Abstract
This is an invited response to an article by Anselin and Cho. I make two main points: The numerical results in this article violate no conclusions from prior literature, and the absence of the deterministic information from the bounds in the article’s analyses invalidates its theoretical discussion of spatial autocorrelation and all of its actual simulation results. An appendix shows how to draw simulations correctly.
Gary King. 2000. “Geography, Statistics, and Ecological Inference.” Annals of the Association of American Geographers, 90, Pp. 601–606.Abstract
I am grateful for such thoughtful review from these three distinguished geographers. Fotheringham provides an excellent summary of the approach offered, including how it combines the two methods that have dominated applications (and methodological analysis) for nearly half a century– the method of bounds (Duncan and Davis, 1953) and Goodman’s (1953) least squares regression. Since Goodman’s regression is the only method of ecological inference "widely used in Geography" (O’Loughlin), adding information that is known to be true from the method of bounds (for each observation) would seem to have the chance to improve a lot of research in this field. The other addition that EI provides is estimates at the lowest level of geography available, making it possible to map results, instead of giving only single summary numbers for the entire geographic region. Whether one considers the combined method offered "the" solution (as some reviewers and commentators have portrayed it), "a" solution (as I tried to describe it), or, perhaps better and more simply, as an improved method of ecological inference, is not importatnt. The point is that more data are better, and this method incorporates more. I am gratified that all three reviewers seem to support these basic points. In this response, I clarify a few points, correct some misunderstandings, and present additional evidence. I conclude with some possible directions for future research.

## Missing Data & Measurement Error

Statistical methods to accommodate missing information in data sets due to scattered unit nonresponse, missing variables, or values or variables measured with error. Easy-to-use algorithms and software for multiple imputation and multiple overimputation for surveys, time series, and time series cross-sectional data. Applications to electoral, and other compositional, data.
Georgina Evans, Gary King, Adam D. Smith, and Abhradeep Thakurta. Working Paper. “Differentially Private Survey Research”.Abstract
Survey researchers have long sought to protect the privacy of their respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures can help, recent research demonstrates that they fail to protect respondents from intentional re-identification attacks, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem in political science because political beliefs are not merely the subject of our scholarship; they represent some of the most important information respondents want to keep private. We confirm the problem in practice by re-identifying individuals from a survey about a controversial referendum declaring life beginning at conception. We build on the concept of "differential privacy" to offer new data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data.  The cost of these new procedures is larger standard errors, which can be overcome with somewhat larger sample sizes.

## Methods

The methods developed in this paper greatly expands the size and types of data sets that can be imputed without difficulty, for cross-sectional, time series, and time series cross-sectional data. Gary King, James Honaker, Anne Joseph, and Kenneth Scheve. 2001. “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation.” American Political Science Review, 95, Pp. 49–69.Abstract

We propose a remedy for the discrepancy between the way political scientists analyze data with missing values and the recommendations of the statistics community. Methodologists and statisticians agree that "multiple imputation" is a superior approach to the problem of missing data scattered through one’s explanatory and dependent variables than the methods currently used in applied data analysis. The discrepancy occurs because the computational algorithms used to apply the best multiple imputation models have been slow, difficult to implement, impossible to run with existing commercial statistical packages, and have demanded considerable expertise. We adapt an algorithm and use it to implement a general-purpose, multiple imputation model for missing data. This algorithm is considerably easier to use than the leading method recommended in statistics literature. We also quantify the risks of current missing data practices, illustrate how to use the new procedure, and evaluate this alternative through simulated data as well as actual empirical examples. Finally, we offer easy-to-use that implements our suggested methods. (Software: AMELIA)

Develops multiple imputation methods for when entire survey questions are missing from some of a series of cross-sectional samples. Andrew Gelman, Gary King, and Chuanhai Liu. 1999. “Not Asked and Not Answered: Multiple Imputation for Multiple Surveys.” Journal of the American Statistical Association, 93, Pp. 846–857.Abstract
We present a method of analyzing a series of independent cross-sectional surveys in which some questions are not answered in some surveys and some respondents do not answer some of the questions posed. The method is also applicable to a single survey in which different questions are asked or different sampling methods are used in different strata or clusters. Our method involves multiply imputing the missing items and questions by adding to existing methods of imputation designed for single surveys a hierarchical regression model that allows covariates at the individual and survey levels. Information from survey weights is exploited by including in the analysis the variables on which the weights are based, and then reweighting individual responses (observed and imputed) to estimate population quantities. We also develop diagnostics for checking the fit of the imputation model based on comparing imputed data to nonimputed data. We illustrate with the example that motivated this project: a study of pre-election public opinion polls in which not all the questions of interest are asked in all the surveys, so that it is infeasible to impute within each survey separately.
A general purpose method for analyzing multiparty electoral data. Jonathan Katz and Gary King. 1999. “A Statistical Model for Multiparty Electoral Data.” American Political Science Review, 93, Pp. 15–32.Abstract
We propose a comprehensive statistical model for analyzing multiparty, district-level elections. This model, which provides a tool for comparative politics research analagous to that which regression analysis provides in the American two-party context, can be used to explain or predict how geographic distributions of electoral results depend upon economic conditions, neighborhood ethnic compositions, campaign spending, and other features of the election campaign or aggregate areas. We also provide new graphical representations for data exploration, model evaluation, and substantive interpretation. We illustrate the use of this model by attempting to resolve a controversy over the size of and trend in electoral advantage of incumbency in Britain. Contrary to previous analyses, all based on measures now known to be biased, we demonstrate that the advantage is small but meaningful, varies substantially across the parties, and is not growing. Finally, we show how to estimate the party from which each party’s advantage is predominantly drawn.
Multiple imputation for missing data had long been recognized as theoretical appropriate, but algorithms to use it were difficult, and applications were rare. This article introduced an easy-to-apply algorithm, making multiple imputation within reach of practicing social scientists. It, and the related software, has been widely used. James Honaker and Gary King. 2010. “What to do About Missing Values in Time Series Cross-Section Data.” American Journal of Political Science, 54, 3, Pp. 561-581. Publisher's VersionAbstract

Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in these fields have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we build nonignorable missingness models by enabling analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, since these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also made it possible to implement the methods introduced here in freely available open source software that is considerably more reliable than existing strategies.

Matthew Blackwell, James Honaker, and Gary King. 2017. “A Unified Approach to Measurement Error and Missing Data: Details and Extensions.” Sociological Methods and Research, 46, 3, Pp. 342-369. Publisher's VersionAbstract

We extend a unified and easy-to-use approach to measurement error and missing data. In our companion article, Blackwell, Honaker, and King give an intuitive overview of the new technique, along with practical suggestions and empirical applications. Here, we offer more precise technical details, more sophisticated measurement error model specifications and estimation procedures, and analyses to assess the approach’s robustness to correlated measurement errors and to errors in categorical variables. These results support using the technique to reduce bias and increase efficiency in a wide variety of empirical research.

Uses the insights from the above two articles to greatly increase the number of parties that can be analyzed. James Honaker, Gary King, and Jonathan N. Katz. 2002. “A Fast, Easy, and Efficient Estimator for Multiparty Electoral Data.” Political Analysis, 10, Pp. 84–100.Abstract
Katz and King (1999) develop a model for predicting or explaining aggregate electoral results in multiparty democracies. This model is, in principle, analogous to what least squares regression provides American politics researchers in that two-party system. Katz and King applied this model to three-party elections in England and revealed a variety of new features of incumbency advantage and where each party pulls support from. Although the mathematics of their statistical model covers any number of political parties, it is computationally very demanding, and hence slow and numerically imprecise, with more than three. The original goal of our work was to produce an approximate method that works quicker in practice with many parties without making too many theoretical compromises. As it turns out, the method we offer here improves on Katz and King’s (in bias, variance, numerical stability, and computational speed) even when the latter is computationally feasible. We also offer easy-to-use software that implements our suggestions.
We extend the algorithm in the previous paper to encompass classic missing data as an extreme version of measurement error, and to correct for both. Matthew Blackwell, James Honaker, and Gary King. 2017. “A Unified Approach to Measurement Error and Missing Data: Overview and Applications.” Sociological Methods and Research, 46, 3, Pp. 303-341. Publisher's VersionAbstract

Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2017b).

## Software

- James Honaker, Gary King, and Matthew Blackwell. 2011. “Amelia II: A Program for Missing Data.” Journal of Statistical Software, 45, 7, Pp. 1-47.Abstract

Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby including a great deal of potentially valuable and extensive information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for diﬀerent cross-sections. A full set of graphical diagnostics are also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and extensive graphical user interface are included.

Amelia II software web site

Michael Tomz, Jason Wittenberg, and Gary King. 2003. “CLARIFY: Software for Interpreting and Presenting Statistical Results.” Journal of Statistical Software.Abstract , which easily combines multiply imputed data in Stata.

This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from https://gking.harvard.edu/clarify (https://gking.harvard.edu/clarify)" at the Stata command line.

Winner of the Okidata Best Research Software Award. Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables.

James Honaker, Gary King, and Matthew Blackwell. 2009. “AMELIA II: A Program for Missing Data”.Abstract
This program multiply imputes missing data in cross-sectional, time series, and time series cross-sectional data sets. It includes a Windows version (no knowledge of R required), and a version that works with R either from the command line or via a GUI.

## How Surveys Work

D. Steven Voss, Andrew Gelman, and Gary King. 1995. “Pre-Election Survey Methodology: Details From Nine Polling Organizations, 1988 and 1992.” Public Opinion Quarterly, 59, Pp. 98–132.Abstract

Before every presidential election, journalists, pollsters, and politicians commission dozens of public opinion polls. Although the primary function of these surveys is to forecast the election winners, they also generate a wealth of political data valuable even after the election. These preelection polls are useful because they are conducted with such frequency that they allow researchers to study change in estimates of voter opinion within very narrow time increments (Gelman and King 1993). Additionally, so many are conducted that the cumulative sample size of these polls is large enough to construct aggregate measures of public opinion within small demographic or geographical groupings (Wright, Erikson, and McIver 1985).

These advantages, however, are mitigated by the decentralized origin of the many preelection polls. The surveys are conducted by diverse private enterprises with procedures that differ significantly. Moreover, important methodological detail does not appear in the public record. Codebooks provided by the survey organizations are all incomplete; many are outdated and most are at least partly inaccurate. The most recent treatment in the academic literature, by Brady and Orren (1992), discusses the approach used by three companies but conceals their identities and omits most of the detail. ...

## Qualitative Research

How the same unified theory of inference underlies quantitative and qualitative research alike; scientific inference when quantification is difficult or impossible; research design; empirical research in legal scholarship.

## Scientific Inference in Qualitative Research

Gary King, Robert O Keohane, and Sidney Verba. 1995. “The Importance of Research Design in Political Science.” American Political Science Review, 89, Pp. 454–481.Abstract
Receiving five serious reviews in this symposium is gratifying and confirms our belief that research design should be a priority for our discipline. We are pleased that our five distinguished reviewers appear to agree with our unified approach to the logic of inference in the social sciences, and with our fundamental point: that good quantitative and good qualitative research designs are based fundamentally on the same logic of inference. The reviewers also raised virtually no objections to the main practical contribution of our book– our many specific procedures for avoiding bias, getting the most out of qualitative data, and making reliable inferences. However, the reviews make clear that although our book may be the latest word on research design in political science, it is surely not the last. We are taxed for failing to include important issues in our analysis and for dealing inadequately with some of what we included. Before responding to the reviewers’ more direct criticisms, let us explain what we emphasize in Designing Social Inquiry and how it relates to some of the points raised by the reviewers.
Gary King and Eleanor Neff Powell. 2008. “How Not to Lie Without Statistics”.Abstract
We highlight, and suggest ways to avoid, a large number of common misunderstandings in the literature about best practices in qualitative research. We discuss these issues in four areas: theory and data, qualitative and quantitative strategies, causation and explanation, and selection bias. Some of the misunderstandings involve incendiary debates within our discipline that are readily resolved either directly or with results known in research areas that happen to be unknown to political scientists. Many of these misunderstandings can also be found in quantitative research, often with different names, and some of which can be fixed with reference to ideas better understood in the qualitative methods literature. Our goal is to improve the ability of quantitatively and qualitatively oriented scholars to enjoy the advantages of insights from both areas. Thus, throughout, we attempt to construct specific practical guidelines that can be used to improve actual qualitative research designs, not only the qualitative methods literatures that talk about them.

## In Legal Research

Lee Epstein and Gary King. 2002. “The Rules of Inference.” University of Chicago Law Review, 69, Pp. 1–209.Abstract

Although the term "empirical research" has become commonplace in legal scholarship over the past two decades, law professors have, in fact, been conducting research that is empirical – that is, learning about the world using quantitative data or qualitative information – for almost as long as they have been conducting research. For just as long, however, they have been proceeding with little awareness of, much less compliance with, the rules of inference, and without paying heed to the key lessons of the revolution in empirical analysis that has been taking place over the last century in other disciplines. The tradition of including some articles devoted to exclusively to the methododology of empirical analysis – so well represented in journals in traditional academic fields – is virtually nonexistent in the nation’s law reviews. As a result, readers learn considerably less accurate information about the empirical world than the studies’ stridently stated, but overconfident, conclusions suggest. To remedy this situation both for the producers and consumers of empirical work, this Article adapts the rules of inference used in the natural and social sciences to the special needs, theories, and data in legal scholarship, and explicate them with extensive illustrations from existing research. The Article also offers suggestions for how the infrastructure of teaching and research at law schools might be reorganized so that it can better support the creation of first-rate empirical research without compromising other important objectives.

Lee Epstein and Gary King. 2003. “Building An Infrastructure for Empirical Research in the Law.” Journal of Legal Education, 53, Pp. 311–320.Abstract
In every discipline in which "empirical research" has become commonplace, scholars have formed a subfield devoted to solving the methodological problems unique to that discipline’s data and theoretical questions. Although students of economics, political science, psychology, sociology, business, education, medicine, public health, and so on primarily focus on specific substantive questions, they cannot wait for those in other fields to solve their methoodological problems or to teach them "new" methods, wherever they were initially developed. In "The Rules of Inference," we argued for the creation of an analogous methodological subfield devoted to legal scholarship. We also had two other objectives: (1) to adapt the rules of inference used in the natural and social sciences, which apply equally to quantitative and qualitative research, to the special needs, theories, and data in legal scholarship, and (2) to offer recommendations on how the infrastructure of teaching and research at law schools might be reorganized so that it could better support the creation of first-rate quantitative and qualitative empirical research without compromising other important objectives. Published commentaries on our paper, along with citations to it, have focused largely on the first-our application of the rules of inference to legal scholarship. Until now, discussions of our second goal-suggestions for the improvement of legal scholarship, as well as our argument for the creation of a group that would focus on methodological problems unique to law-have been relegated to less public forums, even though, judging from the volume of correspondence we have received, they seem to be no less extensive.

## Rare Events

How to save 99% of your data collection costs; bias corrections for logistic regression in estimating probabilities and causal effects in rare events data; estimating base probabilities or any quantity from case-control data; automated coding of events.

## Bias Correction

Develops corrections for the biases in logistic regression that occur when predicting or explaining rare outcomes (such as when you have many more zeros than ones). Corrections developed for standard prospective studies, as well as case-control designs. How to use "case-control designs" to save 99% of your data collection costs. These articles overlap:
For general mathematical proofs and other technical material: Gary King and Langche Zeng. 2001. “Logistic Regression in Rare Events Data.” Political Analysis, 9, Pp. 137–163.Abstract
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all variable events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
An applied companion paper to the previous article that has more examples and pedagogical material but none of the mathematical proofs. Gary King and Langche Zeng. 2001. “Explaining Rare Events in International Relations.” International Organization, 55, Pp. 693–715.Abstract
Some of the most important phenomena in international conflict are coded s "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of non-events (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly used statistical procedures, can underestimate the probability of rare events. We introduce some corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. We also provide easy-to-use methods and software that link these two results, enabling both types of corrections to work simultaneously.

## Hidden Region 1

Example of an analysis of case-control data. Also an independent evaluation of the U.S. State Failure Task Force, including improved methods of forecasting state failure and assessing its causes. Gary King and Langche Zeng. 2001. “Improving Forecasts of State Failure.” World Politics, 53, Pp. 623–658.Abstract

## Estimating Base Probabilities

A method to estimate base probabilities or any quantity of interest from case-control data, even with no (or partial) auxiliary information. Discusses problems with odds-ratios.
The original article: Gary King and Langche Zeng. 2002. “Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies.” Statistics in Medicine, 21, Pp. 1409–1427.Abstract
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.
A revised and extended version of the previous article. Gary King and Langche Zeng. 2004. “Inference in Case-Control Studies.” In Encyclopedia of Biopharmaceutical Statistics, edited by Shein-Chung Chow, 2nd ed. New York: Marcel Dekker.Abstract

Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.

## Hidden Region 2

The first extensive empirical study of the probability of your vote changing the outcome of a U.S. presidential election? Most previous studies of the probability of a tied vote have involved theoretical calculation without data. Andrew Gelman, Gary King, and John Boscardin. 1998. “Estimating the Probability of Events that Have Never Occurred: When Is Your Vote Decisive?” Journal of the American Statistical Association, 93, Pp. 1–9.Abstract
Researchers sometimes argue that statisticians have little to contribute when few realizations of the process being estimated are observed. We show that this argument is incorrect even in the extreme situation of estimating the probabilities of events so rare that they have never occurred. We show how statistical forecasting models allow us to use empirical data to improve inferences about the probabilities of these events. Our application is estimating the probability that your vote will be decisive in a U.S. presidential election, a problem that has been studied by political scientists for more than two decades. The exact value of this probability is of only minor interest, but the number has important implications for understanding the optimal allocation of campaign resources, whether states and voter groups receive their fair share of attention from prospective presidents, and how formal "rational choice" models of voter behavior might be able to explain why people vote at all. We show how the probability of a decisive vote can be estimated empirically from state-level forecasts of the presidential election and illustrate with the example of 1992. Based on generalizations of standard political science forecasting models, we estimate the (prospective) probability of a single vote being decisive as about 1 in 10 million for close national elections such as 1992, varying by about a factor of 10 among states. Our results support the argument that subjective probabilities of many types are best obtained through empirically based statistical prediction models rather than solely through mathematical reasoning. We discuss the implications of our findings for the types of decision analyses used in public choice studies.
James E Alt, Gary King, and Curtis Signorino. 2001. “Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data.” Political Analysis, 9, Pp. 21–44.Abstract
Binary, count and duration data all code discrete events occurring at points in time. Although a single data generation process can produce all of these three data types, the statistical literature is not very helpful in providing methods to estimate parameters of the same process from each. In fact, only single theoretical process exists for which know statistical methods can estimate the same parameters - and it is generally used only for count and duration data. The result is that seemingly trivial decisions abut which level of data to use can have important consequences for substantive interpretations. We describe the theoretical event process for which results exist, based on time independence. We also derive a set of models for a time-dependent process and compare their predictions to those of a commonly used model. Any hope of understanding and avoiding the more serious problems of aggregation bias in events data is contingent on first deriving a much wider arsenal of statistical models and theoretical processes that are not constrained by the particular forms of data that happen to be available. We discuss these issues and suggest an agenda for political methodologists interested in this very large class of aggregation problems.

## Survey Research

How surveys work and a variety of methods to use with surveys. Surveys for estimating death rates, why election polls are so variable when the vote is so predictable, and health inequality.
Aaron Kaufman, Gary King, and Mayya Komisarchik. Forthcoming. “How to Measure Legislative District Compactness If You Only Know it When You See It.” American Journal of Political Science.Abstract

To deter gerrymandering, many state constitutions require legislative districts to be "compact." Yet, the law offers few precise definitions other than "you know it when you see it," which effectively implies a common understanding of the concept. In contrast, academics have shown that compactness has multiple dimensions and have generated many conflicting measures. We hypothesize that both are correct -- that compactness is complex and multidimensional, but a common understanding exists across people. We develop a survey to elicit this understanding, with high reliability (in data where the standard paired comparisons approach fails). We create a statistical model that predicts, with high accuracy, solely from the geometric features of the district, compactness evaluations by judges and public officials responsible for redistricting, among others. We also offer compactness data from our validated measure for 20,160 state legislative and congressional districts, as well as open source software to compute this measure from any district.

Winner of the 2018 Robert H Durr Award from the MPSA.

Georgina Evans, Gary King, Adam D. Smith, and Abhradeep Thakurta. Working Paper. “Differentially Private Survey Research”.Abstract
Survey researchers have long sought to protect the privacy of their respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures can help, recent research demonstrates that they fail to protect respondents from intentional re-identification attacks, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem in political science because political beliefs are not merely the subject of our scholarship; they represent some of the most important information respondents want to keep private. We confirm the problem in practice by re-identifying individuals from a survey about a controversial referendum declaring life beginning at conception. We build on the concept of "differential privacy" to offer new data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data.  The cost of these new procedures is larger standard errors, which can be overcome with somewhat larger sample sizes.
Soubhik Barari, Stefano Caria, Antonio Davola, Paolo Falco, Thiemo Fetzer, Stefano Fiorin, Lukas Hensel, Andriy Ivchenko, Jon Jachimowicz, Gary King, Gordon Kraft-Todd, Alice Ledda, Mary MacLennan, Lucian Mutoi, Claudio Pagani, Elena Reutskaja, Christopher Roth, and Federico Raimondi Slepoi. 2020. “Evaluating COVID-19 Public Health Messaging in Italy: Self-Reported Compliance and Growing Mental Health Concerns”. Publisher's VersionAbstract

Purpose: The COVID-19 death-rate in Italy continues to climb, surpassing that in every other country. We implement one of the first nationally representative surveys about this unprecedented public health crisis and use it to evaluate the Italian government’ public health efforts and citizen responses.
Findings: (1) Public health messaging is being heard. Except for slightly lower compliance among young adults, all subgroups we studied understand how to keep themselves and others safe from the SARS-Cov-2 virus. Remarkably, even those who do not trust the government, or think the government has been untruthful about the crisis believe the messaging and claim to be acting in accordance. (2) The quarantine is beginning to have serious negative effects on the population’s mental health.
Policy Recommendations: Communications focus should move from explaining to citizens that they should stay at home to what they can do there. We need interventions that make staying at home and following public health protocols more desirable. These interventions could include virtual social interactions, such as online social reading activities, classes, exercise routines, etc. — all designed to reduce the boredom of long term social isolation and to increase the attractiveness of following public health recommendations. Interventions like these will grow in importance as the crisis wears on around the world, and staying inside wears on people.

Replication data for this study in dataverse

D. Steven Voss, Andrew Gelman, and Gary King. 1995. “Pre-Election Survey Methodology: Details From Nine Polling Organizations, 1988 and 1992.” Public Opinion Quarterly, 59, Pp. 98–132.Abstract

Before every presidential election, journalists, pollsters, and politicians commission dozens of public opinion polls. Although the primary function of these surveys is to forecast the election winners, they also generate a wealth of political data valuable even after the election. These preelection polls are useful because they are conducted with such frequency that they allow researchers to study change in estimates of voter opinion within very narrow time increments (Gelman and King 1993). Additionally, so many are conducted that the cumulative sample size of these polls is large enough to construct aggregate measures of public opinion within small demographic or geographical groupings (Wright, Erikson, and McIver 1985).

These advantages, however, are mitigated by the decentralized origin of the many preelection polls. The surveys are conducted by diverse private enterprises with procedures that differ significantly. Moreover, important methodological detail does not appear in the public record. Codebooks provided by the survey organizations are all incomplete; many are outdated and most are at least partly inaccurate. The most recent treatment in the academic literature, by Brady and Orren (1992), discusses the approach used by three companies but conceals their identities and omits most of the detail. ...

Andrew Gelman and Gary King. 1993. “Why are American Presidential Election Campaign Polls so Variable when Votes are so Predictable?” British Journal of Political Science, 23, Pp. 409–451.Abstract

As most political scientists know, the outcome of the U.S. Presidential election can be predicted within a few percentage points (in the popular vote), based on information available months before the election. Thus, the general election campaign for president seems irrelevant to the outcome (except in very close elections), despite all the media coverage of campaign strategy. However, it is also well known that the pre-election opinion polls can vary wildly over the campaign, and this variation is generally attributed to events in the campaign. How can campaign events affect people’s opinions on whom they plan to vote for, and yet not affect the outcome of the election? For that matter, why do voters consistently increase their support for a candidate during his nominating convention, even though the conventions are almost entirely predictable events whose effects can be rationally forecast? In this exploratory study, we consider several intuitively appealing, but ultimately wrong, resolutions to this puzzle, and discuss our current understanding of what causes opinion polls to fluctuate and yet reach a predictable outcome. Our evidence is based on graphical presentation and analysis of over 67,000 individual-level responses from forty-nine commercial polls during the 1988 campaign and many other aggregate poll results from the 1952–1992 campaigns. We show that responses to pollsters during the campaign are not generally informed or even, in a sense we describe, "rational." In contrast, voters decide which candidate to eventually support based on their enlightened preferences, as formed by the information they have learned during the campaign, as well as basic political cues such as ideology and party identification. We cannot prove this conclusion, but we do show that it is consistent with the aggregate forecasts and individual-level opinion poll responses. Based on the enlightened preferences hypothesis, we conclude that the news media have an important effect on the outcome of Presidential elections–-not due to misleading advertisements, sound bites, or spin doctors, but rather by conveying candidates’ positions on important issues.

Anchoring Vignettes (for interpersonal incomparability) Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").: Website

Imputing Missing Data due to survey nonresponse: Website

Analyzing Rare Events, including rare survey outcomes and alternative methods of sampling for rare events: Website

Gary King and Ying Lu. 2008. “Verbal Autopsy Methods with Multiple Causes of Death.” Statistical Science, 23, Pp. 78–91.Abstract
Verbal autopsy procedures are widely used for estimating cause-specific mortality in areas without medical death certification. Data on symptoms reported by caregivers along with the cause of death are collected from a medical facility, and the cause-of-death distribution is estimated in the population where only symptom data are available. Current approaches analyze only one cause at a time, involve assumptions judged difficult or impossible to satisfy, and require expensive, time consuming, or unreliable physician reviews, expert algorithms, or parametric statistical models. By generalizing current approaches to analyze multiple causes, we show how most of the difficult assumptions underlying existing methods can be dropped. These generalizations also make physician review, expert algorithms, and parametric statistical assumptions unnecessary. With theoretical results, and empirical analyses in data from China and Tanzania, we illustrate the accuracy of this approach. While no method of analyzing verbal autopsy data, including the more computationally intensive approach offered here, can give accurate estimates in all circumstances, the procedure offered is conceptually simpler, less expensive, more general, as or more replicable, and easier to use in practice than existing approaches. We also show how our focus on estimating aggregate proportions, which are the quantities of primary interest in verbal autopsy studies, may also greatly reduce the assumptions necessary, and thus improve the performance of, many individual classifiers in this and other areas. As a companion to this paper, we also offer easy-to-use software that implements the methods discussed herein.
Gary King, Ying Lu, and Kenji Shibuya. 2010. “Designing Verbal Autopsy Studies.” Population Health Metrics, 8, 19.Abstract
Background: Verbal autopsy analyses are widely used for estimating cause-specific mortality rates (CSMR) in the vast majority of the world without high quality medical death registration. Verbal autopsies -- survey interviews with the caretakers of imminent decedents -- stand in for medical examinations or physical autopsies, which are infeasible or culturally prohibited. Methods and Findings: We introduce methods, simulations, and interpretations that can improve the design of automated, data-derived estimates of CSMRs, building on a new approach by King and Lu (2008). Our results generate advice for choosing symptom questions and sample sizes that is easier to satisfy than existing practices. For example, most prior effort has been devoted to searching for symptoms with high sensitivity and specificity, which has rarely if ever succeeded with multiple causes of death. In contrast, our approach makes this search irrelevant because it can produce unbiased estimates even with symptoms that have very low sensitivity and specificity. In addition, the new method is optimized for survey questions caretakers can easily answer rather than questions physicians would ask themselves. We also offer an automated method of weeding out biased symptom questions and advice on how to choose the number of causes of death, symptom questions to ask, and observations to collect, among others. Conclusions: With the advice offered here, researchers should be able to design verbal autopsy surveys and conduct analyses with greatly reduced statistical biases and research costs.
Emmanuela Gakidou and Gary King. 2006. “Death by Survey: Estimating Adult Mortality without Selection Bias from Sibling Survival Data.” Demography, 43, Pp. 569–585.Abstract
The widely used methods for estimating adult mortality rates from sample survey responses about the survival of siblings, parents, spouses, and others depend crucially on an assumption that we demonstrate does not hold in real data. We show that when this assumption is violated – so that the mortality rate varies with sibship size – mortality estimates can be massively biased. By using insights from work on the statistical analysis of selection bias, survey weighting, and extrapolation problems, we propose a new and relatively simple method of recovering the mortality rate with both greatly reduced potential for bias and increased clarity about the source of necessary assumptions.
Emmanuela Gakidou and Gary King. 2002. “Measuring Total Health Inequality: Adding Individual Variation to Group-Level Differences.” BioMed Central: International Journal for Equity in Health, 1.Abstract
Background: Studies have revealed large variations in average health status across social, economic, and other groups. No study exists on the distribution of the risk of ill-health across individuals, either within groups or across all people in a society, and as such a crucial piece of total health inequality has been overlooked. Some of the reason for this neglect has been that the risk of death, which forms the basis for most measures, is impossible to observe directly and difficult to estimate. Methods: We develop a measure of total health inequality – encompassing all inequalities among people in a society, including variation between and within groups – by adapting a beta-binomial regression model. We apply it to children under age two in 50 low- and middle-income countries. Our method has been adopted by the World Health Organization and is being implemented in surveys around the world and preliminary estimates have appeared in the World Health Report (2000). Results: Countries with similar average child mortality differ considerably in total health inequality. Liberia and Mozambique have the largest inequalities in child survival, while Colombia, the Philippines and Kazakhstan have the lowest levels among the countries measured. Conclusions: Total health inequality estimates should be routinely reported alongside average levels of health in populations and groups, as they reveal important policy-related information not otherwise knowable. This approach enables meaningful comparisons of inequality across countries and future analyses of the determinants of inequality.

## Unifying Statistical Analysis

Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.
Beau Coker, Cynthia Rudin, and Gary King. 2021. “A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results.” Management Science, Pp. 1-24. Publisher's VersionAbstract
Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We derive "hacking intervals," which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals.

## Unifying Approaches to Statistical Analysis

Gary King and Margaret E. Roberts. 2015. “How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It.” Political Analysis, 23, 2, Pp. 159–179. Publisher's VersionAbstract

"Robust standard errors" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help researchers realize these gains via a more productive way to understand and use robust standard errors; a new general and easier-to-use "generalized information matrix test" statistic that can formally assess misspecification (based on differences between robust and classical variance estimates); and practical illustrations via simulations and real examples from published research. How robust standard errors are used needs to change, but instead of jettisoning this popular tool we show how to use it to provide effective clues about model misspecification, likely biases, and a guide to considerably more reliable, and defensible, inferences. Accompanying this article is open source software that implements the methods we describe.

Gary King. 1991. “Calculating Standard Errors of Predicted Values based on Nonlinear Functional Forms.” The Political Methodologist, 4.Abstract

Whenever we report predicted values, we should also report some measure of the uncertainty of these estimates. In the linear case, this is relatively simple, and the answer well-known, but with nonlinear models the answer may not be apparent. This short article shows how to make these calculations. I first present this for the familiar linear case, also reviewing the two forms of uncertainty in these estimates, and then show how to calculate these for any arbitrary function. An example appears last.

Generalizes the unification in the book (replacing its Section 5.2 with simulation to compute quantities of interest). This paper, which was originally titled "Enough with the Logit Coefficients, Already!", explains how to compute any quantity of interest from almost any statistical model; and shows, with replications of several published works, how to extract considerably more information than standard practices, without changing any data or statistical assumptions. Gary King, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science, 44, Pp. 341–355. Publisher's VersionAbstract
Social Scientists rarely take full advantage of the information available in their statistical results. As a consequence, they miss opportunities to present quantities that are of greatest substantive interest for their research and express the appropriate degree of certainty about these quantities. In this article, we offer an approach, built on the technique of statistical simulation, to extract the currently overlooked information from any statistical method and to interpret and present it in a reader-friendly manner. Using this technique requires some expertise, which we try to provide herein, but its application should make the results of quantitative articles more informative and transparent. To illustrate our recommendations, we replicate the results of several published works, showing in each case how the authors’ own conclusions can be expressed more sharply and informatively, and, without changing any data or statistical assumptions, how our approach reveals important new information about the research questions at hand. We also offer very easy-to-use Clarify software that implements our suggestions.
Software that accompanies the above article and implements its key ideas in easy-to-use Stata macros. Michael Tomz, Jason Wittenberg, and Gary King. 2003. “CLARIFY: Software for Interpreting and Presenting Statistical Results.” Journal of Statistical Software.Abstract

This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from https://gking.harvard.edu/clarify (https://gking.harvard.edu/clarify)" at the Stata command line.

Winner of the Okidata Best Research Software Award. Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables.

A generalization of Clarify, and much other software, implemented in R. The extensive manual encompasses most of the above works and can be read independently as an introduction to wide range of models. Under active development. Kosuke Imai, Gary King, and Olivia Lau. 2006. “Zelig: Everyone's Statistical Software”.
A paper that describes the advances underlying Zelig software: Kosuke Imai, Gary King, and Olivia Lau. 2008. “Toward A Common Framework for Statistical Analysis and Development.” Journal of Computational Graphics and Statistics, 17, Pp. 1–22.Abstract
We describe some progress toward a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. The framework we have developed offers a simple unified structure and syntax that can encompass a large fraction of statistical procedures already implemented in R, without requiring any changes in existing approaches. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.

## Related Materials

Gary King. 2009. “The Changing Evidence Base of Social Science Research.” In The Future of Political Science: 100 Perspectives, edited by Gary King, Kay Schlozman, and Norman Nie. New York: Routledge Press.Abstract

This (two-page) article argues that the evidence base of political science and the related social sciences are beginning an underappreciated but historic change.

Gary King. 1986. “How Not to Lie With Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science, 30, Pp. 666–687.Abstract
This article identifies a set of serious theoretical mistakes appearing with troublingly high frequency throughout the quantitative political science literature. These mistakes are all based on faulty statistical theory or on erroneous statistical analysis. Through algebraic and interpretive proofs, some of the most commonly made mistakes are explicated and illustrated. The theoretical problem underlying each is highlighted, and suggested solutions are provided throughout. It is argued that closer attention to these problems and solutions will result in more reliable quantitative analyses and more useful theoretical contributions.
Gary King. 1991. “On Political Methodology.” Political Analysis, 2, Pp. 1–30.Abstract
"Politimetrics" (Gurr 1972), "polimetrics" (Alker 1975), "politometrics" (Hilton 1976), "political arithmetic" (Petty [1672] 1971), "quantitative Political Science (QPS)," "governmetrics," "posopolitics" (Papayanopoulos 1973), "political science statistics (Rai and Blydenburgh 1973), "political statistics" (Rice 1926). These are some of the names that scholars have used to describe the field we now call "political methodology." The history of political methodology has been quite fragmented until recently, as reflected by this patchwork of names. The field has begun to coalesce during the past decade and we are developing persistent organizations, a growing body of scholarly literature, and an emerging consensus about important problems that need to be solved. I make one main point in this article: If political methodology is to play an important role in the future of political science, scholars will need to find ways of representing more interesting political contexts in quantitative analyses. This does not mean that scholars should just build more and more complicated statistical models. Instead, we need to represent more of the essence of political phenomena in our models. The advantage of formal and quantitative approaches is that they are abstract representations of the political world and are, thus, much clearer. We need methods that enable us to abstract the right parts of the phenomenon we are studying and exclude everything superfluous. Despite the fragmented history of quantitative political analysis, a version of this goal has been voiced frequently by both quantitative researchers and their critics (Sec. 2). However, while recognizing this shortcoming, earlier scholars were not in the position to rectify it, lacking the mathematical and statistical tools and, early on, the data. Since political methodologists have made great progress in these and other areas in recent years, I argue that we are now capable of realizing this goal. In section 3, I suggest specific approaches to this problem. Finally, in section 4, I provide two modern examples, ecological inference and models of spatial autocorrelation, to illustrate these points.
Jeff Gill and Gary King. 2004. “What to do When Your Hessian is Not Invertible: Alternatives to Model Respecification in Nonlinear Estimation.” Sociological Methods and Research, 32, Pp. 54-87.Abstract
What should a researcher do when statistical analysis software terminates before completion with a message that the Hessian is not invertable? The standard textbook advice is to respecify the model, but this is another way of saying that the researcher should change the question being asked. Obviously, however, computer programs should not be in the business of deciding what questions are worthy of study. Although noninvertable Hessians are sometimes signals of poorly posed questions, nonsensical models, or inappropriate estimators, they also frequently occur when information about the quantities of interest exists in the data, through the likelihood function. We explain the problem in some detail and lay out two preliminary proposals for ways of dealing with noninvertable Hessians without changing the question asked.