Gary King is the Weatherhead University Professor at Harvard University. He also serves as Director of the Institute for Quantitative Social Science. He and his research group develop and apply empirical methods in many areas of social science research. Full bio and CV

Research Areas

    • Evaluating Social Security Forecasts
      The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals, put forward by both political parties. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA's forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA's actuaries hunkered down trying hard to insulate themselves from the intense political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led them to also miss important changes in the input data such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations. 
      We explain that solving this problem involves (a) removing human judgment where possible, by using formal statistical methods -- via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required -- via the revolution in social psychological research; and (c) requiring transparency and data sharing to catch errors that slip through -- via the revolution in data sharing & replication. (See the article at Barron's about our work.)
    • Incumbency Advantage
      Proof that previously used estimators of electoral incumbency advantage were biased, and a new unbiased estimator. Also, the first systematic demonstration that constituency service by legislators increases the incumbency advantage.
    • Information Control by Authoritarian Governments
      Reverse engineering Chinese information controls -- the most extensive effort to selectively control human expression in the history of the world. We show that this massive effort to slow the flow of information paradoxically also conveys a great deal about the intentions, goals, and actions of the leaders. We downloaded all Chinese social media posts before the government could read and censor them; wrote and posted comments randomly assigned to our categories on hundreds of websites across the country to see what would be censored; set up our own social media website in China; and discovered that the Chinese government fabricates and posts 450 million social media comments a year in the names of ordinary people and convinced those posting (and inadvertently even the government) to admit to their activities. We found that the government does not engage on controversial issues (they do not censor criticism or fabricate posts that argue with those who disagree with the government), but they respond on an emergency basis to stop collective action (with censorship, fabricating posts with giant bursts of cheerleading-type distractions, responding to citizen grievances, etc.). They don't care what you think of them or say about them; they only care what you can do.
    • Mexican Health Care Evaluation
      An evaluation of the Mexican Seguro Popular program (designed to extend health insurance and regular and preventive medical care, pharmaceuticals, and health facilities to 50 million uninsured Mexicans), one of the world's largest health policy reforms of the last two decades. Our evaluation features a new design for field experiments that is more robust to the political interventions and implementation errors that have ruined many similar previous efforts; new statistical methods that produce more reliable and efficient results using fewer resources, assumptions, and data, as well as standard errors that are as much as 600% smaller; and an implementation of these methods in the largest randomized health policy experiment to date. (See the Harvard Gazette story on this project.)
    • Presidency Research; Voting Behavior
      Resolution of the paradox of why polls are so variable over time during presidential campaigns even though the vote outcome is easily predictable before it starts. Also, a resolution of a key controversy over absentee ballots during the 2000 presidential election; and the methodology of small-n research on executives.
    • Informatics and Data Sharing
      Replication Standards: New standards, protocols, and software for citing, sharing, analyzing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating scholarly research data and analyses. Also includes proposals to improve the norms of data sharing and replication in science.
    • International Conflict
      Methods for coding, analyzing, and forecasting international conflict and state failure. Evidence that the causes of conflict, theorized to be important but often found to be small or ephemeral, are indeed tiny for the vast majority of dyads, but are large, stable, and replicable wherever the ex ante probability of conflict is large.
    • Legislative Redistricting
      The definition of partisan symmetry as a standard for fairness in redistricting; methods and software for measuring partisan bias and electoral responsiveness; discussion of U.S. Supreme Court rulings about this work. Evidence that U.S. redistricting reduces bias and increases responsiveness, and that the electoral college is fair; applications to legislatures, primaries, and multiparty systems.
    • Mortality Studies
      Methods for forecasting mortality rates (overall or for time series data cross-classified by age, sex, country, and cause); estimating mortality rates in areas without vital registration; measuring inequality in risk of death; applications to US mortality, the future of Social Security, armed conflict, heart failure, and human security.
    • Teaching and Administration
      Publications and other projects designed to improve teaching, learning, and university administration, as well as broader writings on the future of the social sciences.
    • Automated Text Analysis
      Automated and computer-assisted methods of extracting, organizing, understanding, conceptualizing, and consuming knowledge from massive quantities of unstructured text.
    • Anchoring Vignettes (for interpersonal incomparability)
      Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").
    • Causal Inference
      Methods for detecting and reducing model dependence (i.e., when minor model changes produce substantively different inferences) in inferring causal effects and other counterfactuals. Matching methods; "politically robust" and cluster-randomized experimental designs; causal bias decompositions.
    • Event Counts and Durations
      Statistical models to explain or predict how many events occur for each fixed time period, or the time between events. An application to cabinet dissolution in parliamentary democracies which united two previously warring scholarly literatures. Other applications to international relations and U.S. Supreme Court appointments.
    • Ecological Inference
      Inferring individual behavior from group-level data: The first approach to incorporate both unit-level deterministic bounds and cross-unit statistical information, methods for 2x2 and larger tables, Bayesian model averaging, applications to elections, software.
    • Missing Data & Measurement Error
      Statistical methods to accommodate missing information in data sets due to scattered unit nonresponse, missing variables, or values or variables measured with error. Easy-to-use algorithms and software for multiple imputation and multiple overimputation for surveys, time series, and time series cross-sectional data. Applications to electoral, and other compositional, data.
    • Qualitative Research
      How the same unified theory of inference underlies quantitative and qualitative research alike; scientific inference when quantification is difficult or impossible; research design; empirical research in legal scholarship.
    • Rare Events
      How to save 99% of your data collection costs; bias corrections for logistic regression in estimating probabilities and causal effects in rare events data; estimating base probabilities or any quantity from case-control data; automated coding of events.
    • Survey Research
      How surveys work and a variety of methods to use with surveys. Surveys for estimating death rates, why election polls are so variable when the vote is so predictable, and health inequality.
    • Unifying Statistical Analysis
      Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.

Recent Papers

Estimating Incumbency Advantage Without Bias

Andrew Gelman and Gary King. 1990. “Estimating Incumbency Advantage Without Bias.” American Journal of Political Science, 34, Pp. 1142–1164.
In this paper we prove theoretically and demonstrate empirically that all existing measures of incumbency advantage in the congressional elections literature are biased or inconsistent. We then provide an unbiased estimator based on a very simple linear regression model. We apply this new method to congressional elections since 1900, providing the first evidence of a positive incumbency advantage in the first half of the century.

Stochastic Variation: A Comment on Lewis-Beck and Skalaban’s ‘The R-Square’

Gary King. 1991. “Stochastic Variation: A Comment on Lewis-Beck and Skalaban’s ‘The R-Square’.” Political Analysis, 2, Pp. 185–200.
In an interesting and provocative article, Michael Lewis-Beck and Andrew Skalaban make an important contribution by emphasizing several philosophical issues in political methodology that have received too little attention from methodologists and quantitative researchers. These issues involve the role of systematic, and especially stochastic, variation in statistical models. After briefly discussing a few points of disagreement, hoping to reduce them to points of clarification, I turn to the philosophical issues. Examples with real data follow.

Calculating Standard Errors of Predicted Values based on Nonlinear Functional Forms

Gary King. 1991. “Calculating Standard Errors of Predicted Values based on Nonlinear Functional Forms.” The Political Methodologist, 4.
Whenever we report predicted values, we should also report some measure of the uncertainty of these estimates. In the linear case, this is relatively simple, and the answer well-known, but with nonlinear models the answer may not be apparent. This short article shows how to make these calculations. I first present this for the familiar linear case, also reviewing the two forms of uncertainty in these estimates, and then show how to calculate these for any arbitrary function. An example appears last.

Systemic Consequences of Incumbency Advantage in the U.S. House

Gary King and Andrew Gelman. 1991. “Systemic Consequences of Incumbency Advantage in the U.S. House.” American Journal of Political Science, 35, Pp. 110–138.
The dramatic increase in the electoral advantage of incumbency has sparked widespread interest among congressional researchers over the last 15 years. Although many scholars have studied the advantages of incumbency for incumbents, few have analyzed its effects on the underlying electoral system. We examine the influence of the incumbency advantage on two features of the electoral system in the U.S. House elections: electoral responsiveness and partisan bias. Using a district-level seats-votes model of House elections, we are able to distinguish systematic changes from unique, election-specific variations. Our results confirm the significant drop in responsiveness, and even steeper decline outside the South, over the past 40 years. Contrary to expectations, we find that increased incumbency advantage explains less than a third of this trend, indicating that some other unknown factor is responsible. Moreover, our analysis also reveals another dramatic pattern, largely overlooked in the congressional literature: in the 1940’s and 1950’s the electoral system was severely biased in favor of the Republican party. The system shifted incrementally from this severe Republican bias over the next several decades to a moderate Democratic bias by the mid-1980’s. Interestingly, changes in incumbency advantage explain virtually all of this trend in partisan bias since the 1940’s. By removing incumbency advantage and the existing configuration of incumbents and challengers analytically, our analysis reveals an underlying electoral system that remains consistently biased in favor of the Republican party. Thus, our results indicate that incumbency advantage affects the underlying electoral system, but contrary to conventional wisdom, this changes the trend in partisan bias more than electoral responsiveness.

Constituency Service and Incumbency Advantage

Gary King. 1991. “Constituency Service and Incumbency Advantage.” British Journal of Political Science, 21, Pp. 119–128.
This Note addresses the long-standing discrepancy between scholarly support for the effect of constituency service on incumbency advantage and a large body of contradictory empirical evidence. I show first that many of the methodological problems noticed in past research reduce to a single methodological problem that is readily resolved. The core of this Note then provides some of the first systematic empirical evidence for the constituency service hypothesis. Specifically, an extra $10,000 added to the budget of the average state legislator gives this incumbent an additional 1.54 percentage points in the next election (with a 95% confidence interval of 1.14 to 1.94 percentage points).

'Truth' is Stranger than Prediction, More Questionable Than Causal Inference

Gary King. 1991. “'Truth' is Stranger than Prediction, More Questionable Than Causal Inference.” American Journal of Political Science, 35, Pp. 1047–1053.
Robert Luskin’s article in this issue provides a useful service by appropriately qualifying several points I made in my 1986 American Journal of Political Science article. Whereas I focused on how to avoid common mistakes in quantitative political science, Luskin clarifies ways to extract some useful information from usually problematic statistics: correlation coefficients, standardized coefficients, and especially R². Since these three statistics are very closely related (and indeed deterministic functions of one another in some cases), I focus in this discussion primarily on R², the most widely used and abused. Luskin also widens the discussion to various kinds of specification tests, a general issue I also address. In fact, as Beck (1991) reports, a large number of formal specification tests are just functions of R², with differences among them primarily due to how much each statistic penalizes one for including extra parameters and fewer observations. Quantitative political scientists often worry about model selection and specification, asking questions about parameter identification, autocorrelated or heteroscedastic disturbances, parameter constancy, variable choice, measurement error, endogeneity, functional forms, stochastic assumptions, and selection bias, among numerous others. These model specification questions are all important, but we may have forgotten why we pose them. Political scientists commonly give three reasons: (1) finding the "true" model, or the "full" explanation; (2) prediction; and (3) estimating specific causal effects. I argue here that (1) is used the most but useful the least; (2) is very useful but not usually in political science, where forecasting is not often a central concern; and (3) correctly represents the goals of political scientists and should form the basis of most of our quantitative empirical work.

On Political Methodology

Gary King. 1991. “On Political Methodology.” Political Analysis, 2, Pp. 1–30.
"Politimetrics" (Gurr 1972), "polimetrics" (Alker 1975), "politometrics" (Hilton 1976), "political arithmetic" (Petty [1672] 1971), "quantitative Political Science (QPS)," "governmetrics," "posopolitics" (Papayanopoulos 1973), "political science statistics" (Rai and Blydenburgh 1973), "political statistics" (Rice 1926). These are some of the names that scholars have used to describe the field we now call "political methodology." The history of political methodology has been quite fragmented until recently, as reflected by this patchwork of names. The field has begun to coalesce during the past decade and we are developing persistent organizations, a growing body of scholarly literature, and an emerging consensus about important problems that need to be solved. I make one main point in this article: If political methodology is to play an important role in the future of political science, scholars will need to find ways of representing more interesting political contexts in quantitative analyses. This does not mean that scholars should just build more and more complicated statistical models. Instead, we need to represent more of the essence of political phenomena in our models. The advantage of formal and quantitative approaches is that they are abstract representations of the political world and are, thus, much clearer. We need methods that enable us to abstract the right parts of the phenomenon we are studying and exclude everything superfluous. Despite the fragmented history of quantitative political analysis, a version of this goal has been voiced frequently by both quantitative researchers and their critics (Sec. 2). However, while recognizing this shortcoming, earlier scholars were not in the position to rectify it, lacking the mathematical and statistical tools and, early on, the data. Since political methodologists have made great progress in these and other areas in recent years, I argue that we are now capable of realizing this goal.
In section 3, I suggest specific approaches to this problem. Finally, in section 4, I provide two modern examples, ecological inference and models of spatial autocorrelation, to illustrate these points.

Presentations

Discovering and Explaining Systematic Bias and Nontransparency in US Social Security Administration Forecasts, at University of Florida, Department of Political Science, Friday, March 18, 2016:

The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals put forward by both political parties or anyone else. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has...

Big Data is Not About the Data!, at University of Florida, Informatics Symposium, Thursday, March 17, 2016:

In this talk, Gary King explains that the spectacular progress the media describes as "big data" has little to do with the data.  Data, after all, is becoming commoditized, less expensive, and an automatic byproduct of other changes in organizations and society. More data alone doesn't generate insights; it often just makes data analysis harder. The real revolution isn't about the data, it is about the stunning progress in the statistical methods of extracting insights from the data. He will illustrate these points...

Why Propensity Scores Should Not Be Used For Matching, at Yale University, MacMillan-CSAP Workshop on Quantitative Research Methods, Thursday, March 10, 2016:

This talk summarizes a paper -- Gary King and Richard Nielsen. 2016. “Why Propensity Scores Should Not Be Used for Matching” -- with this abstract:  Researchers use propensity score matching (PSM) as a data preprocessing step to selectively prune units prior to applying a model to estimate a causal effect. The goal of PSM is to reduce imbalance in the chosen pre-treatment covariates between the treated and control groups, thereby reducing the...

The Next Big [Social Science] Thing, at National Academy of Sciences, Friday, March 4, 2016:

"Dr. Gary King, NAS member and Professor at Harvard University, will talk about progress in and the future of the Social Sciences, illustrated with a wide range of examples from his research. These examples include forecasting the solvency of Social Security; reverse engineering Chinese censorship; estimating causes of death in developing countries; automated text analysis of billions of social media posts; dataverse, software and protocols his team developed to run the largest archive of...

