Gary King is the Weatherhead University Professor at Harvard University. He also serves as Director of the Institute for Quantitative Social Science. He and his research group develop and apply empirical methods in many areas of social science research. Full bio and CV

Research Areas

    • Evaluating Social Security Forecasts
      The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals, put forward by both political parties. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA's forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA's actuaries hunkered down trying hard to insulate themselves from the intense political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led them to also miss important changes in the input data such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations. We explain that solving this problem involves using (a) removing human judgment where possible, by using formal statistical methods -- via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required -- via the revolution in social psychological research; and (c) requiring transparency and data sharing to catch errors that slip through -- via the revolution in data sharing & replication.An article at Barron's about our work.
    • Incumbency Advantage
      Proof that previously used estimators of electoral incumbency advantage were biased, and a new unbiased estimator. Also, the first systematic demonstration that constituency service by legislators increases the incumbency advantage.
    • Information Control by Authoritarian Governments
      Reverse engineering Chinese information controls -- the most extensive effort to selectively control human expression in the history of the world. We show that this massive effort to slow the flow of information paradoxically also conveys a great deal about the intentions, goals, and actions of the leaders. We downloaded all Chinese social media posts before the government could read and censor them; wrote and posted comments randomly assigned to our categories on hundreds of websites across the country to see what would be censored; set up our own social media website in China; and discovered that the Chinese government fabricates and posts 450 million social media comments a year in the names of ordinary people and convinced those posting (and inadvertently even the government) to admit to their activities. We found that the goverment does not engage on controversial issues (they do not censor criticism or fabricate posts that argue with those who disagree with the government), but they respond on an emergency basis to stop collective action (with censorship, fabricating posts with giant bursts of cheerleading-type distractions, responding to citizen greviances, etc.). They don't care what you think of them or say about them; they only care what you can do.
    • Mexican Health Care Evaluation
      An evaluation of the Mexican Seguro Popular program (designed to extend health insurance and regular and preventive medical care, pharmaceuticals, and health facilities to 50 million uninsured Mexicans), one of the world's largest health policy reforms of the last two decades. Our evaluation features a new design for field experiments that is more robust to the political interventions and implementation errors that have ruined many similar previous efforts; new statistical methods that produce more reliable and efficient results using fewer resources, assumptions, and data, as well as standard errors that are as much as 600% smaller; and an implementation of these methods in the largest randomized health policy experiment to date. (See the Harvard Gazette story on this project.)
    • Presidency Research; Voting Behavior
      Resolution of the paradox of why polls are so variable over time during presidential campaigns even though the vote outcome is easily predictable before it starts. Also, a resolution of a key controversy over absentee ballots during the 2000 presidential election; and the methodology of small-n research on executives.
    • Informatics and Data Sharing
      Replication Standards New standards, protocols, and software for citing, sharing, analyzing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating scholarly research data and analyses. Also includes proposals to improve the norms of data sharing and replication in science.
    • International Conflict
      Methods for coding, analyzing, and forecasting international conflict and state failure. Evidence that the causes of conflict, theorized to be important but often found to be small or ephemeral, are indeed tiny for the vast majority of dyads, but are large, stable, and replicable wherever the ex ante probability of conflict is large.
    • Legislative Redistricting
      The definition of partisan symmetry as a standard for fairness in redistricting; methods and software for measuring partisan bias and electoral responsiveness; discussion of U.S. Supreme Court rulings about this work. Evidence that U.S. redistricting reduces bias and increases responsiveness, and that the electoral college is fair; applications to legislatures, primaries, and multiparty systems.
    • Mortality Studies
      Methods for forecasting mortality rates (overall or for time series data cross-classified by age, sex, country, and cause); estimating mortality rates in areas without vital registration; measuring inequality in risk of death; applications to US mortality, the future of the Social Security, armed conflict, heart failure, and human security.
    • Teaching and Administration
      Publications and other projects designed to improve teaching, learning, and university administration, as well as broader writings on the future of the social sciences.
    • Automated Text Analysis
      Automated and computer-assisted methods of extracting, organizing, understanding, conceptualizing, and consuming knowledge from massive quantities of unstructured text.
    • Anchoring Vignettes (for interpersonal incomparability)
      Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").
    • Causal Inference
      Methods for detecting and reducing model dependence (i.e., when minor model changes produce substantively different inferences) in inferring causal effects and other counterfactuals. Matching methods; "politically robust" and cluster-randomized experimental designs; causal bias decompositions.
    • Event Counts and Durations
      Statistical models to explain or predict how many events occur for each fixed time period, or the time between events. An application to cabinet dissolution in parliamentary democracies which united two previously warring scholarly literature. Other applications to international relations and U.S. Supreme Court appointments.
    • Ecological Inference
      Inferring individual behavior from group-level data: The first approach to incorporate both unit-level deterministic bounds and cross-unit statistical information, methods for 2x2 and larger tables, Bayesian model averaging, applications to elections, software.
    • Missing Data & Measurement Error
      Statistical methods to accommodate missing information in data sets due to scattered unit nonresponse, missing variables, or values or variables measured with error. Easy-to-use algorithms and software for multiple imputation and multiple overimputation for surveys, time series, and time series cross-sectional data. Applications to electoral, and other compositional, data.
    • Qualitative Research
      How the same unified theory of inference underlies quantitative and qualitative research alike; scientific inference when quantification is difficult or impossible; research design; empirical research in legal scholarship.
    • Rare Events
      How to save 99% of your data collection costs; bias corrections for logistic regression in estimating probabilities and causal effects in rare events data; estimating base probabilities or any quantity from case-control data; automated coding of events.
    • Survey Research
      How surveys work and a variety of methods to use with surveys. Surveys for estimating death rates, why election polls are so variable when the vote is so predictable, and health inequality.
    • Unifying Statistical Analysis
      Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.

Recent Papers

An Overview of the Virtual Data Center Project and Software

Micah Altman, Leonid Andreev, Mark Diggory, Gary King, Daniel L Kiskis, Elizabeth Kolster, Michael Krot, and Sidney Verba. 2001. “An Overview of the Virtual Data Center Project and Software.” JCDL ’01: First Joint Conference on Digital Libraries, Pp. 203-204.Abstract
In this paper, we present an overview of the Virtual Data Center (VDC) software, an open-source digital library system for the management and dissemination of distributed collections of quantitative data. (see http://TheData.org). The VDC functionality provides everything necessary to maintain and disseminate an individual collection of research studies, including facilities for the storage, archiving, cataloging, translation, and on-line analysis of a particular collection. Moreover, the system provides extensive support for distributed and federated collections including: location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed "virtual" collections of remote objects.
Read more

Bayesian and Frequentist Inference for Ecological Inference: The RxC Case

Ori Rosen, Wenxin Jiang, Gary King, and Martin A Tanner. 2001. “Bayesian and Frequentist Inference for Ecological Inference: The RxC Case.” Statistica Neerlandica, 55, Pp. 134–156.Abstract
In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R x C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2 x 2 case to the R x C case, the inferential procedure employs Markov chain Monte Carlo (MCMC) methods. As such the resulting MCMC analysis is rich but computationally intensive. The frequentist approach, based on first moments rather than on the entire likelihood, provides quick inference via nonlinear least-squares, while retaining good frequentist properties. The two approaches are illustrated with simulated data, as well as with real data on voting patterns in Weimar Germany. In the final section of the paper we provide an overview of a range of alternative inferential approaches which trade-off computational intensity for statistical efficiency.
Read more

Proper Nouns and Methodological Propriety: Pooling Dyads in International Relations Data

Proper Nouns and Methodological Propriety: Pooling Dyads in International Relations Data
Gary King. 2001. “Proper Nouns and Methodological Propriety: Pooling Dyads in International Relations Data.” International Organization, 55, Pp. 497–507.Abstract
The intellectual stakes at issue in this symposium are very high: Green, Kim, and Yoon (2000 and hereinafter GKY) apply their proposed methodological prescriptions and conclude that they key findings in the field is wrong and democracy "has no effect on militarized disputes." GKY are mainly interested in convincing scholars about their methodological points and see themselves as having no stake in the resulting substantive conclusions. However, their methodological points are also high stakes claims: if correct, the vast majority of statistical analyses of military conflict ever conducted would be invalidated. GKY say they "make no attempt to break new ground statistically," but, as we will see, this both understates their methodological contribution to the field and misses some unique features of their application and data in international relations. On the ltter, GKY’s critics are united: Oneal and Russett (2000) conclude that GKY’s method "produces distorted results," and show even in GKY’s framework how democracy’s effect can be reinstated. Beck and Katz (2000) are even more unambiguous: "GKY’s conclusion, in table 3, that variables such as democracy have no pacific impact, is simply nonsense...GKY’s (methodological) proposal...is NEVER a good idea." My given task is to sort out and clarify these conflicting claims and counterclaims. The procedure I followed was to engage in extensive discussions with the participants that included joint reanalyses provoked by our discussions and passing computer program code (mostly with Monte Carlo simulations) back and forth to ensure we were all talking about the same methods and agreed with the factual results. I learned a great deal from this process and believe that the positions of the participants are now a lot closer than it may seem from their written statements. Indeed, I believe that all the participants now agree with what I have written here, even though they would each have different emphases (and although my believing there is agreement is not the same as there actually being agreement!).
Read more

Measuring Total Health Inequality: Adding Individual Variation to Group-Level Differences

Emmanuela Gakidou and Gary King. 2002. “Measuring Total Health Inequality: Adding Individual Variation to Group-Level Differences.” BioMed Central: International Journal for Equity in Health, 1.Abstract
Background: Studies have revealed large variations in average health status across social, economic, and other groups. No study exists on the distribution of the risk of ill-health across individuals, either within groups or across all people in a society, and as such a crucial piece of total health inequality has been overlooked. Some of the reason for this neglect has been that the risk of death, which forms the basis for most measures, is impossible to observe directly and difficult to estimate. Methods: We develop a measure of total health inequality – encompassing all inequalities among people in a society, including variation between and within groups – by adapting a beta-binomial regression model. We apply it to children under age two in 50 low- and middle-income countries. Our method has been adopted by the World Health Organization and is being implemented in surveys around the world and preliminary estimates have appeared in the World Health Report (2000). Results: Countries with similar average child mortality differ considerably in total health inequality. Liberia and Mozambique have the largest inequalities in child survival, while Colombia, the Philippines and Kazakhstan have the lowest levels among the countries measured. Conclusions: Total health inequality estimates should be routinely reported alongside average levels of health in populations and groups, as they reveal important policy-related information not otherwise knowable. This approach enables meaningful comparisons of inequality across countries and future analyses of the determinants of inequality.
Read more

Explaining Rare Events in International Relations

Explaining Rare Events in International Relations
Gary King and Langche Zeng. 2001. “Explaining Rare Events in International Relations.” International Organization, 55, Pp. 693–715.Abstract
Some of the most important phenomena in international conflict are coded s "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of non-events (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly used statistical procedures, can underestimate the probability of rare events. We introduce some corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. We also provide easy-to-use methods and software that link these two results, enabling both types of corrections to work simultaneously.
Read more

A Digital Library for the Dissemination and Replication of Quantitative Social Science Research

A Digital Library for the Dissemination and Replication of Quantitative Social Science Research
Micah Altman, Leonid Andreev, Mark Diggory, Gary King, Daniel L Kiskis, Elizabeth Kolster, Michael Krot, and Sidney Verba. 2001. “A Digital Library for the Dissemination and Replication of Quantitative Social Science Research.” Social Science Computer Review, 19, Pp. 458–470.Abstract
The Virtual Data Center (VDC) software is an open-source, digital library system for quantitative data. We discuss what the software does, and how it provides an infrastructure for the management and dissemination of disturbed collections of quantitative data, and the replication of results derived from this data.
Read more

Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data

Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data
James E Alt, Gary King, and Curtis Signorino. 2001. “Aggregation Among Binary, Count, and Duration Models: Estimating the Same Quantities from Different Levels of Data.” Political Analysis, 9, Pp. 21–44.Abstract
Binary, count and duration data all code discrete events occurring at points in time. Although a single data generation process can produce all of these three data types, the statistical literature is not very helpful in providing methods to estimate parameters of the same process from each. In fact, only single theoretical process exists for which know statistical methods can estimate the same parameters - and it is generally used only for count and duration data. The result is that seemingly trivial decisions abut which level of data to use can have important consequences for substantive interpretations. We describe the theoretical event process for which results exist, based on time independence. We also derive a set of models for a time-dependent process and compare their predictions to those of a commonly used model. Any hope of understanding and avoiding the more serious problems of aggregation bias in events data is contingent on first deriving a much wider arsenal of statistical models and theoretical processes that are not constrained by the particular forms of data that happen to be available. We discuss these issues and suggest an agenda for political methodologists interested in this very large class of aggregation problems.
Read more
All writings

Presentations

Why Propensity Scores Should Not Be Used For Matching, at Department of Epidemiology, Harvard T.H. Chan School of Public Health, Thursday, October 15, 2015:

This talk summarizes a paper -- Gary King and Richard Nielsen. 2015. “Why Propensity Scores Should Not Be Used for Matching” -- with this abstract:  Researchers use propensity score matching (PSM) as a data preprocessing step to selectively prune units prior to applying a model to estimate a causal effect. The goal of PSM is to reduce imbalance in the chosen pre-treatment covariates between the treated and control groups, thereby reducing the...

Read more about Why Propensity Scores Should Not Be Used For Matching
Explaining Systematic Bias and Nontransparency in US Social Security Administration Forecasts, at Inaugural Distinguished Lecture, Institute for Social Science, UC-Davis, Wednesday, October 7, 2015:


The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals put forward by both political parties or anyone else. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods...

Read more about Explaining Systematic Bias and Nontransparency in US Social Security Administration Forecasts
Why Propensity Scores Should Not Be Used for Matching, at Harvard Medical School, Brigham and Women's Hosptial, Division of Pharmacoepidemiology and Pharmacoeconomics, Wednesday, September 23, 2015:

This talk summarizes a paper -- Gary King and Richard Nielsen. 2015. “Why Propensity Scores Should Not Be Used for Matching” -- with this abstract:  Researchers use propensity score matching (PSM) as a data preprocessing step to selectively prune units prior to applying a model to estimate a causal effect. The goal of PSM is to reduce imbalance in the chosen pre-treatment covariates between the treated and control groups, thereby reducing the...

Read more about Why Propensity Scores Should Not Be Used for Matching
All presentations

Gary King on Twitter