Gary King is the Weatherhead University Professor at Harvard University. He also serves as Director of the Institute for Quantitative Social Science. He and his research group develop and apply empirical methods in many areas of social science research.

Research Areas

    • Evaluating Social Security Forecasts
      The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals, put forward by both political parties. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA's forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA's actuaries hunkered down trying hard to insulate themselves from the intense political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led them to also miss important changes in the input data such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations. 
      We explain that solving this problem involves (a) removing human judgment where possible, by using formal statistical methods -- via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required -- via the revolution in social psychological research; and (c) requiring transparency and data sharing to catch errors that slip through -- via the revolution in data sharing & replication. (See the article at Barron's about our work.)
    • Incumbency Advantage
      Proof that previously used estimators of electoral incumbency advantage were biased, and a new unbiased estimator. Also, the first systematic demonstration that constituency service by legislators increases the incumbency advantage.
    • Information Control by Authoritarian Governments
      Reverse engineering Chinese information controls -- the most extensive effort to selectively control human expression in the history of the world. We show that this massive effort to slow the flow of information paradoxically also conveys a great deal about the intentions, goals, and actions of the leaders. We downloaded all Chinese social media posts before the government could read and censor them; wrote and posted comments randomly assigned to our categories on hundreds of websites across the country to see what would be censored; set up our own social media website in China; and discovered that the Chinese government fabricates and posts 450 million social media comments a year in the names of ordinary people and convinced those posting (and inadvertently even the government) to admit to their activities. We found that the government does not engage on controversial issues (they do not censor criticism or fabricate posts that argue with those who disagree with the government), but they respond on an emergency basis to stop collective action (with censorship, fabricating posts with giant bursts of cheerleading-type distractions, responding to citizen grievances, etc.). They don't care what you think of them or say about them; they only care what you can do.
    • Mexican Health Care Evaluation
      An evaluation of the Mexican Seguro Popular program (designed to extend health insurance and regular and preventive medical care, pharmaceuticals, and health facilities to 50 million uninsured Mexicans), one of the world's largest health policy reforms of the last two decades. Our evaluation features a new design for field experiments that is more robust to the political interventions and implementation errors that have ruined many similar previous efforts; new statistical methods that produce more reliable and efficient results using fewer resources, assumptions, and data, as well as standard errors that are as much as 600% smaller; and an implementation of these methods in the largest randomized health policy experiment to date. (See the Harvard Gazette story on this project.)
    • Presidency Research; Voting Behavior
      Resolution of the paradox of why polls are so variable over time during presidential campaigns even though the vote outcome is easily predictable before the campaign starts. Also, a resolution of a key controversy over absentee ballots during the 2000 presidential election; and the methodology of small-n research on executives.
    • Informatics and Data Sharing
      Replication Standards: New standards, protocols, and software for citing, sharing, analyzing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating scholarly research data and analyses. Also includes proposals to improve the norms of data sharing and replication in science.
    • International Conflict
      Methods for coding, analyzing, and forecasting international conflict and state failure. Evidence that the causes of conflict, theorized to be important but often found to be small or ephemeral, are indeed tiny for the vast majority of dyads, but are large, stable, and replicable wherever the ex ante probability of conflict is large.
    • Legislative Redistricting
      The definition of partisan symmetry as a standard for fairness in redistricting; methods and software for measuring partisan bias and electoral responsiveness; discussion of U.S. Supreme Court rulings about this work. Evidence that U.S. redistricting reduces bias and increases responsiveness, and that the electoral college is fair; applications to legislatures, primaries, and multiparty systems.
    • Mortality Studies
      Methods for forecasting mortality rates (overall or for time series data cross-classified by age, sex, country, and cause); estimating mortality rates in areas without vital registration; measuring inequality in risk of death; applications to US mortality, the future of Social Security, armed conflict, heart failure, and human security.
    • Teaching and Administration
      Publications and other projects designed to improve teaching, learning, and university administration, as well as broader writings on the future of the social sciences.
    • Automated Text Analysis
      Automated and computer-assisted methods of extracting, organizing, understanding, conceptualizing, and consuming knowledge from massive quantities of unstructured text.
    • Anchoring Vignettes (for interpersonal incomparability)
      Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").
    • Causal Inference
      Methods for detecting and reducing model dependence (i.e., when minor model changes produce substantively different inferences) in inferring causal effects and other counterfactuals. Matching methods; "politically robust" and cluster-randomized experimental designs; causal bias decompositions.
    • Event Counts and Durations
      Statistical models to explain or predict how many events occur for each fixed time period, or the time between events. An application to cabinet dissolution in parliamentary democracies which united two previously warring scholarly literatures. Other applications to international relations and U.S. Supreme Court appointments.
    • Ecological Inference
      Inferring individual behavior from group-level data: The first approach to incorporate both unit-level deterministic bounds and cross-unit statistical information, methods for 2x2 and larger tables, Bayesian model averaging, applications to elections, software.
    • Missing Data & Measurement Error
      Statistical methods to accommodate missing information in data sets due to scattered unit nonresponse, missing variables, or values or variables measured with error. Easy-to-use algorithms and software for multiple imputation and multiple overimputation for surveys, time series, and time series cross-sectional data. Applications to electoral, and other compositional, data.
    • Qualitative Research
      How the same unified theory of inference underlies quantitative and qualitative research alike; scientific inference when quantification is difficult or impossible; research design; empirical research in legal scholarship.
    • Rare Events
      How to save 99% of your data collection costs; bias corrections for logistic regression in estimating probabilities and causal effects in rare events data; estimating base probabilities or any quantity from case-control data; automated coding of events.
    • Survey Research
      How surveys work and a variety of methods to use with surveys. Surveys for estimating death rates, why election polls are so variable when the vote is so predictable, and health inequality.
    • Unifying Statistical Analysis
      Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.

Recent Papers

Racial Fairness in Legislative Redistricting

Gary King, John Bruce, and Andrew Gelman. 1996. “Racial Fairness in Legislative Redistricting.” In Classifying by Race, edited by Paul E Peterson, Pp. 85-110. Princeton: Princeton University Press.
In this chapter, we study standards of racial fairness in legislative redistricting -- a field that has been the subject of considerable legislation, jurisprudence, and advocacy, but very little serious academic scholarship. We attempt to elucidate how basic concepts about "color-blind" societies, and similar normative preferences, can generate specific practical standards for racial fairness in representation and redistricting. We also provide the normative and theoretical foundations on which concepts such as proportional representation rest, in order to give existing preferences of many in the literature a firmer analytical foundation.

Advantages of Conflictual Redistricting

Andrew Gelman and Gary King. 1996. “Advantages of Conflictual Redistricting.” In Fixing the Boundary: Defining and Redefining Single-Member Electoral Districts, edited by Iain McLean and David Butler, Pp. 207–218. Aldershot, England: Dartmouth Publishing Company.
This article describes the results of an analysis we did of state legislative elections in the United States, where each state is required to redraw the boundaries of its state legislative districts every ten years. In the United States, redistrictings are sometimes controlled by the Democrats, sometimes by the Republicans, and sometimes by bipartisan committees, but never by neutral boundary commissions. Our goal was to study the consequences of redistricting, and at the conclusion of this article, we discuss how our findings might be relevant to British elections.

Why Context Should Not Count

Gary King. 1996. “Why Context Should Not Count.” Political Geography, 15, Pp. 159–164.

This paper is an invited comment on a paper by John Agnew. I largely agree with Agnew’s comments and thus focus on remaining areas where an alternative perspective might be useful. My argument is that political geographers should not be so concerned with demonstrating that context matters. My reasoning is based on three arguments. First, in fact context rarely counts (Section 1) and, second, the most productive practical goal for political researchers should be to show that it does not count (Section 2). Finally, a disproportionate focus on ‘context counting’ can lead, and has led, to some serious problems in practical research situations, such as attempting to give theoretical answers to empirical questions (Section 3) and empirical answers to theoretical questions (Section 4).


A Preview of EI and EzI: Programs for Ecological Inference

Kenneth Benoit and Gary King. 1996. “A Preview of EI and EzI: Programs for Ecological Inference.” Social Science Computer Review, 14, Pp. 433–438.
Ecological inference, as traditionally defined, is the process of using aggregate (i.e., "ecological") data to infer discrete individual-level relationships of interest when individual-level data are not available. Existing methods of ecological inference generate very inaccurate conclusions about the empirical world -- which thus gives rise to the ecological inference problem. Most scholars who analyze aggregate data routinely encounter some form of this problem. EI (by Gary King) and EzI (by Kenneth Benoit and Gary King) are freely available software that implement the statistical and graphical methods detailed in Gary King’s book A Solution to the Ecological Inference Problem. These methods make it possible to infer the attributes of individual behavior from aggregate data. EI works within the statistics program Gauss and will run on any computer hardware and operating system that runs Gauss (the Gauss module, CML, or constrained maximum likelihood -- by Ronald J. Schoenberg -- is also required). EzI is a menu-oriented stand-alone version of the program that runs under MS-DOS (and soon Windows 95, OS/2, and HP-UNIX). EI allows users to make ecological inferences as part of the powerful and open Gauss statistical environment. In contrast, EzI requires no additional software, and provides an attractive menu-based user interface for non-Gauss users, although it lacks the flexibility afforded by the Gauss version. Both programs presume that the user has read or is familiar with A Solution to the Ecological Inference Problem.

Estimating the Probability of Events that Have Never Occurred: When Is Your Vote Decisive?

Andrew Gelman, Gary King, and John Boscardin. 1998. “Estimating the Probability of Events that Have Never Occurred: When Is Your Vote Decisive?” Journal of the American Statistical Association, 93, Pp. 1–9.
Researchers sometimes argue that statisticians have little to contribute when few realizations of the process being estimated are observed. We show that this argument is incorrect even in the extreme situation of estimating the probabilities of events so rare that they have never occurred. We show how statistical forecasting models allow us to use empirical data to improve inferences about the probabilities of these events. Our application is estimating the probability that your vote will be decisive in a U.S. presidential election, a problem that has been studied by political scientists for more than two decades. The exact value of this probability is of only minor interest, but the number has important implications for understanding the optimal allocation of campaign resources, whether states and voter groups receive their fair share of attention from prospective presidents, and how formal "rational choice" models of voter behavior might be able to explain why people vote at all. We show how the probability of a decisive vote can be estimated empirically from state-level forecasts of the presidential election and illustrate with the example of 1992. Based on generalizations of standard political science forecasting models, we estimate the (prospective) probability of a single vote being decisive as about 1 in 10 million for close national elections such as 1992, varying by about a factor of 10 among states. Our results support the argument that subjective probabilities of many types are best obtained through empirically based statistical prediction models rather than solely through mathematical reasoning. We discuss the implications of our findings for the types of decision analyses used in public choice studies.

Presentations

Reverse-Engineering Censorship in China, at IARPA seminar on "Science, Intelligence, and Security," Virginia Tech Research Center, Monday, November 16, 2015:

Chinese government censorship of social media constitutes the largest selective suppression of human communication in recorded history. In three ways, we show, paradoxically, that this large system also leaves large footprints that reveal a great deal about itself and the intentions of the government. First is an observational study where we download all social media posts before the Chinese government can read and censor those they deem objectionable, and then detect from a network of computers all over the world which are censored. Second, we conduct...

Reverse-Engineering Censorship in China, at Ohio State University, Mershon Center for International Security Studies, Thursday, October 22, 2015 (same abstract as the talk above).