Gary King is the Weatherhead University Professor at Harvard University. He also serves as Director of the Institute for Quantitative Social Science. He and his research group develop and apply empirical methods in many areas of social science research. Full bio and CV

Research Areas

    • Evaluating Social Security Forecasts
      The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals, put forward by both political parties. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA's forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA's actuaries hunkered down trying hard to insulate themselves from the intense political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led them to also miss important changes in the input data such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations. We explain that solving this problem involves using (a) removing human judgment where possible, by using formal statistical methods -- via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required -- via the revolution in social psychological research; and (c) requiring transparency and data sharing to catch errors that slip through -- via the revolution in data sharing & replication.An article at Barron's about our work.
    • Incumbency Advantage
      Proof that previously used estimators of electoral incumbency advantage were biased, and a new unbiased estimator. Also, the first systematic demonstration that constituency service by legislators increases the incumbency advantage.
    • Information Control by Authoritarian Governments
      Reverse engineering Chinese information controls -- the most extensive effort to selectively control human expression in the history of the world. We show that this massive effort to slow the flow of information paradoxically also conveys a great deal about the intentions, goals, and actions of the leaders. We downloaded all Chinese social media posts before the government could read and censor them; wrote and posted comments randomly assigned to our categories on hundreds of websites across the country to see what would be censored; set up our own social media website in China; and discovered that the Chinese government fabricates and posts 450 million social media comments a year in the names of ordinary people and convinced those posting (and inadvertently even the government) to admit to their activities. We found that the goverment does not engage on controversial issues (they do not censor criticism or fabricate posts that argue with those who disagree with the government), but they respond on an emergency basis to stop collective action (with censorship, fabricating posts with giant bursts of cheerleading-type distractions, responding to citizen greviances, etc.). They don't care what you think of them or say about them; they only care what you can do.
    • Mexican Health Care Evaluation
      An evaluation of the Mexican Seguro Popular program (designed to extend health insurance and regular and preventive medical care, pharmaceuticals, and health facilities to 50 million uninsured Mexicans), one of the world's largest health policy reforms of the last two decades. Our evaluation features a new design for field experiments that is more robust to the political interventions and implementation errors that have ruined many similar previous efforts; new statistical methods that produce more reliable and efficient results using fewer resources, assumptions, and data, as well as standard errors that are as much as 600% smaller; and an implementation of these methods in the largest randomized health policy experiment to date. (See the Harvard Gazette story on this project.)
    • Presidency Research; Voting Behavior
      Resolution of the paradox of why polls are so variable over time during presidential campaigns even though the vote outcome is easily predictable before it starts. Also, a resolution of a key controversy over absentee ballots during the 2000 presidential election; and the methodology of small-n research on executives.
    • Informatics and Data Sharing
      Replication Standards New standards, protocols, and software for citing, sharing, analyzing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating scholarly research data and analyses. Also includes proposals to improve the norms of data sharing and replication in science.
    • International Conflict
      Methods for coding, analyzing, and forecasting international conflict and state failure. Evidence that the causes of conflict, theorized to be important but often found to be small or ephemeral, are indeed tiny for the vast majority of dyads, but are large, stable, and replicable wherever the ex ante probability of conflict is large.
    • Legislative Redistricting
      The definition of partisan symmetry as a standard for fairness in redistricting; methods and software for measuring partisan bias and electoral responsiveness; discussion of U.S. Supreme Court rulings about this work. Evidence that U.S. redistricting reduces bias and increases responsiveness, and that the electoral college is fair; applications to legislatures, primaries, and multiparty systems.
    • Mortality Studies
      Methods for forecasting mortality rates (overall or for time series data cross-classified by age, sex, country, and cause); estimating mortality rates in areas without vital registration; measuring inequality in risk of death; applications to US mortality, the future of the Social Security, armed conflict, heart failure, and human security.
    • Teaching and Administration
      Publications and other projects designed to improve teaching, learning, and university administration, as well as broader writings on the future of the social sciences.
    • Automated Text Analysis
      Automated and computer-assisted methods of extracting, organizing, understanding, conceptualizing, and consuming knowledge from massive quantities of unstructured text.
    • Anchoring Vignettes (for interpersonal incomparability)
      Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").
    • Causal Inference
      Methods for detecting and reducing model dependence (i.e., when minor model changes produce substantively different inferences) in inferring causal effects and other counterfactuals. Matching methods; "politically robust" and cluster-randomized experimental designs; causal bias decompositions.
    • Event Counts and Durations
      Statistical models to explain or predict how many events occur for each fixed time period, or the time between events. An application to cabinet dissolution in parliamentary democracies which united two previously warring scholarly literature. Other applications to international relations and U.S. Supreme Court appointments.
    • Ecological Inference
      Inferring individual behavior from group-level data: The first approach to incorporate both unit-level deterministic bounds and cross-unit statistical information, methods for 2x2 and larger tables, Bayesian model averaging, applications to elections, software.
    • Missing Data & Measurement Error
      Statistical methods to accommodate missing information in data sets due to scattered unit nonresponse, missing variables, or values or variables measured with error. Easy-to-use algorithms and software for multiple imputation and multiple overimputation for surveys, time series, and time series cross-sectional data. Applications to electoral, and other compositional, data.
    • Qualitative Research
      How the same unified theory of inference underlies quantitative and qualitative research alike; scientific inference when quantification is difficult or impossible; research design; empirical research in legal scholarship.
    • Rare Events
      How to save 99% of your data collection costs; bias corrections for logistic regression in estimating probabilities and causal effects in rare events data; estimating base probabilities or any quantity from case-control data; automated coding of events.
    • Survey Research
      How surveys work and a variety of methods to use with surveys. Surveys for estimating death rates, why election polls are so variable when the vote is so predictable, and health inequality.
    • Unifying Statistical Analysis
      Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.

Recent Papers

Causal Inference Without Balance Checking: Coarsened Exact Matching

Causal Inference Without Balance Checking: Coarsened Exact Matching
Stefano M. Iacus, Gary King, and Giuseppe Porro. 2012. “Causal Inference Without Balance Checking: Coarsened Exact Matching.” Political Analysis, 20, 1, Pp. 1--24. WebsiteAbstract

We discuss a method for improving causal inferences called "Coarsened Exact Matching'' (CEM), and the new "Monotonic Imbalance Bounding'' (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of desirable statistical properties not available in most other matching methods, but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R and Stata which implement all our suggestions.

An Explanation of CEM Weights

Read more

Publication, Publication

Publication, Publication
Gary King. 2006. “Publication, Publication.” PS: Political Science and Politics, 39, Pp. 119–125. Continuing updates to this paperAbstract

I show herein how to write a publishable paper by beginning with the replication of a published article. This strategy seems to work well for class projects in producing papers that ultimately get published, helping to professionalize students into the discipline, and teaching them the scientific norms of the free exchange of academic information. I begin by briefly revisiting the prominent debate on replication our discipline had a decade ago and some of the progress made in data sharing since.

Read more

Demographic Forecasting

Demographic Forecasting
Federico Girosi and Gary King. 2008. Demographic Forecasting. Princeton: Princeton University Press.Abstract

We introduce a new framework for forecasting age-sex-country-cause-specific mortality rates that incorporates considerably more information, and thus has the potential to forecast much better, than any existing approach. Mortality forecasts are used in a wide variety of academic fields, and for global and national health policy making, medical and pharmaceutical research, and social security and retirement planning.

As it turns out, the tools we developed in pursuit of this goal also have broader statistical implications, in addition to their use for forecasting mortality or other variables with similar statistical properties. First, our methods make it possible to include different explanatory variables in a time series regression for each cross-section, while still borrowing strength from one regression to improve the estimation of all. Second, we show that many existing Bayesian (hierarchical and spatial) models with explanatory variables use prior densities that incorrectly formalize prior knowledge. Many demographers and public health researchers have fortuitously avoided this problem so prevalent in other fields by using prior knowledge only as an ex post check on empirical results, but this approach excludes considerable information from their models. We show how to incorporate this demographic knowledge into a model in a statistically appropriate way. Finally, we develop a set of tools useful for developing models with Bayesian priors in the presence of partial prior ignorance. This approach also provides many of the attractive features claimed by the empirical Bayes approach, but fully within the standard Bayesian theory of inference.

Read more

Detecting Model Dependence in Statistical Inference: A Response

Gary King and Langche Zeng. 2007. “Detecting Model Dependence in Statistical Inference: A Response.” International Studies Quarterly, 51, Pp. 231-241.Abstract

Inferences about counterfactuals are essential for prediction, answering "what if" questions, and estimating causal effects. However, when the counterfactuals posed are too far from the data at hand, conclusions drawn from well-specified statistical analyses become based on speculation and convenient but indefensible model assumptions rather than empirical evidence. Unfortunately, standard statistical approaches assume the veracity of the model rather than revealing the degree of model-dependence, and so this problem can be hard to detect. We develop easy-to-apply methods to evaluate counterfactuals that do not require sensitivity testing over specified classes of models. If an analysis fails the tests we offer, then we know that substantive results are sensitive to at least some modeling choices that are not based on empirical evidence. We use these methods to evaluate the extensive scholarly literatures on the effects of changes in the degree of democracy in a country (on any dependent variable) and separate analyses of the effects of UN peacebuilding efforts. We find evidence that many scholars are inadvertently drawing conclusions based more on modeling hypotheses than on their data. For some research questions, history contains insufficient information to be our guide.

Read more

Empirically Evaluating the Electoral College

Empirically Evaluating the Electoral College
Andrew Gelman, Jonathan Katz, and Gary King. 2004. “Empirically Evaluating the Electoral College.” In Rethinking the Vote: The Politics and Prospects of American Electoral Reform, edited by Ann N Crigler, Marion R Just, and Edward J McCaffery, Pp. 75-88. New York: Oxford University Press.Abstract

The 2000 U.S. presidential election rekindled interest in possible electoral reform. While most of the popular and academic accounts focused on balloting irregularities in Florida, such as the now infamous "butterfly" ballot and mishandled absentee ballots, some also noted that this election marked only the fourth time in history that the candidate with a plurality of the popular vote did not also win the Electoral College. This "anti-democratic" outcome has fueled desire for reform or even outright elimination of the electoral college. We show that after appropriate statistical analysis of the available historical electoral data, there is little basis to argue for reforming the Electoral College. We first show that while the Electoral College may once have been biased against the Democrats, the current distribution of voters advantages neither party. Further, the electoral vote will differ from the popular vote only when the average vote shares of the two major candidates are extremely close to 50 percent. As for individual voting power, we show that while there has been much temporal variation in relative voting power over the last several decades, the voting power of individual citizens would not likely increase under a popular vote system of electing the president.

Read more

From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data

From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data
Myron P Gutmann, Mark Abrahamson, Margaret O Adams, Micah Altman, Caroline Arms, Kenneth Bollen, Michael Carlson, Jonathan Crabtree, Darrell Donakowski, Gary King, Jaret Lyle, Marc Maynard, Amy Pienta, Richard Rockwell, Lois Rocms-Ferrara, and Copeland H Young. 2009. “From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data.” Library Trends, 57, Pp. 315–337.Abstract

Social science data are an unusual part of the past, present, and future of digital preservation. They are both an unqualified success, due to long-lived and sustainable archival organizations, and in need of further development because not all digital content is being preserved. This article is about the Data Preservation Alliance for Social Sciences (Data-PASS), a project supported by the National Digital Information Infrastructure and Preservation Program (NDIIPP), which is a partnership of five major U.S. social science data archives. Broadly speaking, Data-PASS has the goal of ensuring that at-risk social science data are identified, acquired, and preserved, and that we have a future-oriented organization that could collaborate on those preservation tasks for the future. Throughout the life of the Data-PASS project we have worked to identify digital materials that have never been systematically archived, and to appraise and acquire them. As the project has progressed, however, it has increasingly turned its attention from identifying and acquiring legacy and at-risk social science data to identifying on going and future research projects that will produce data. This article is about the project's history, with an emphasis on the issues that underlay the transition from looking backward to looking forward.

Read more

A Proposed Standard for the Scholarly Citation of Quantitative Data

A Proposed Standard for the Scholarly Citation of Quantitative Data
Micah Altman and Gary King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine, 13. Publisher's VersionAbstract

An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals. A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books. We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.

Read more
All writings

Presentations

Big Data is Not About the Data!, at Abt Associates, Thursday, September 28, 2017:
The spectacular progress the media describes as "big data" has little to do with the growth of data.  Data, after all, is becoming commoditized, less expensive, and an automatic byproduct of other changes in organizations and society. More data alone doesn't generate insights; it often merely makes data analysis harder. The real revolution isn't about the data, it is about the stunning progress in the statistical and other methods of extracting insights from the data. I illustrate these points with a wide range of examples from research I've participated in, including ... Read more about Big Data is Not About the Data!
Fabricating News In Chinese Social Media, at Congress of the Mexican Political Science Association, University of Quintana Roo, Cancun, Mexico, Friday, September 15, 2017:

This talk is based on this paper (in the current issue of the American Political Science Review), by Jen Pan, Molly Roberts, and me, along with a brief summary of our previous work (2014 in Science here, and 2013 in the American Poltiical Science Review ...

Read more about Fabricating News In Chinese Social Media
Big Data Reveals Made Up Data: How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument, at University of Texas at Austin, Thursday, September 7, 2017
This talk is based on this paper (forthcoming in the American Political Science Review), by Jen Pan, Molly Roberts, and me, along with a brief summary of our previous work (2014 in Science here, and 2013 in the APSR here).... Read more about Big Data Reveals Made Up Data: How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument
Detecting and Reducing Model Dependence in Causal Inference, at Peking University, Wednesday, July 26, 2017:

This presentation discusses methods of detecting counterfactuals (predictions, what if questions, and casual inferences) far enough from the data that any inferences based on it will yield highly model dependent inferences -- where small, indefensible changes in a model specification have large impacts on our conclusions. The talk also shows how to ameliorate many situations like this via  matching for causal inference. We introduce matching methods that are simpler, more powerful, and easier to understand. We also show that the most commonly used...

Read more about Detecting and Reducing Model Dependence in Causal Inference
All presentations

Gary King on Twitter