Gary King is the Weatherhead University Professor at Harvard University. He also serves as Director of the Institute for Quantitative Social Science. He and his research group develop and apply empirical methods in many areas of social science research. Full bio and CV

## Research Areas

• Anchoring Vignettes (for interpersonal incomparability)
Methods for interpersonal incomparability, when respondents (from different cultures, genders, countries, or ethnic groups) understand survey questions in different ways; for developing theoretical definitions of complicated concepts apparently definable only by example (i.e., "you know it when you see it").
• Automated Text Analysis
Automated and computer-assisted methods of extracting, organizing, understanding, conceptualizing, and consuming knowledge from massive quantities of unstructured text.
• Causal Inference
Methods for detecting and reducing model dependence (i.e., when minor model changes produce substantively different inferences) in inferring causal effects and other counterfactuals. Matching methods; "politically robust" and cluster-randomized experimental designs; causal bias decompositions.
• Event Counts and Durations
Statistical models to explain or predict how many events occur for each fixed time period, or the time between events. An application to cabinet dissolution in parliamentary democracies which united two previously warring scholarly literature. Other applications to international relations and U.S. Supreme Court appointments.
• Ecological Inference
Inferring individual behavior from group-level data: The first approach to incorporate both unit-level deterministic bounds and cross-unit statistical information, methods for 2x2 and larger tables, Bayesian model averaging, applications to elections, software.
• Missing Data & Measurement Error
Statistical methods to accommodate missing information in data sets due to scattered unit nonresponse, missing variables, or values or variables measured with error. Easy-to-use algorithms and software for multiple imputation and multiple overimputation for surveys, time series, and time series cross-sectional data. Applications to electoral, and other compositional, data.
• Qualitative Research
How the same unified theory of inference underlies quantitative and qualitative research alike; scientific inference when quantification is difficult or impossible; research design; empirical research in legal scholarship.
• Rare Events
How to save 99% of your data collection costs; bias corrections for logistic regression in estimating probabilities and causal effects in rare events data; estimating base probabilities or any quantity from case-control data; automated coding of events.
• Survey Research
How surveys work and a variety of methods to use with surveys. Surveys for estimating death rates, why election polls are so variable when the vote is so predictable, and health inequality.
• Unifying Statistical Analysis
Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.
• Evaluating Social Security Forecasts
The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals, put forward by both political parties. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA's forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA's actuaries hunkered down trying hard to insulate themselves from the intense political pressures. Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led them to also miss important changes in the input data such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations. We explain that solving this problem involves using (a) removing human judgment where possible, by using formal statistical methods -- via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required -- via the revolution in social psychological research; and (c) requiring transparency and data sharing to catch errors that slip through -- via the revolution in data sharing & replication.An article at Barron's about our work.
Proof that previously used estimators of electoral incumbency advantage were biased, and a new unbiased estimator. Also, the first systematic demonstration that constituency service by legislators increases the incumbency advantage.
• Chinese Censorship
• Mexican Health Care Evaluation
An evaluation of the Mexican Seguro Popular program (designed to extend health insurance and regular and preventive medical care, pharmaceuticals, and health facilities to 50 million uninsured Mexicans), one of the world's largest health policy reforms of the last two decades. Our evaluation features a new design for field experiments that is more robust to the political interventions and implementation errors that have ruined many similar previous efforts; new statistical methods that produce more reliable and efficient results using fewer resources, assumptions, and data, as well as standard errors that are as much as 600% smaller; and an implementation of these methods in the largest randomized health policy experiment to date. (See the Harvard Gazette story on this project.)
• Presidency Research; Voting Behavior
Resolution of the paradox of why polls are so variable over time during presidential campaigns even though the vote outcome is easily predictable before it starts. Also, a resolution of a key controversy over absentee ballots during the 2000 presidential election; and the methodology of small-n research on executives.
• Informatics and Data Sharing
Replication Standards New standards, protocols, and software for citing, sharing, analyzing, archiving, preserving, distributing, cataloging, translating, disseminating, naming, verifying, and replicating scholarly research data and analyses. Also includes proposals to improve the norms of data sharing and replication in science.
• International Conflict
Methods for coding, analyzing, and forecasting international conflict and state failure. Evidence that the causes of conflict, theorized to be important but often found to be small or ephemeral, are indeed tiny for the vast majority of dyads, but are large, stable, and replicable wherever the ex ante probability of conflict is large.
• Legislative Redistricting
The definition of partisan symmetry as a standard for fairness in redistricting; methods and software for measuring partisan bias and electoral responsiveness; discussion of U.S. Supreme Court rulings about this work. Evidence that U.S. redistricting reduces bias and increases responsiveness, and that the electoral college is fair; applications to legislatures, primaries, and multiparty systems.
• Mortality Studies
Methods for forecasting mortality rates (overall or for time series data cross-classified by age, sex, country, and cause); estimating mortality rates in areas without vital registration; measuring inequality in risk of death; applications to US mortality, the future of the Social Security, armed conflict, heart failure, and human security.
Publications and other projects designed to improve teaching, learning, and university administration, as well as broader writings on the future of the social sciences.

# Building an International Consortium for Tracking Coronavirus Health Status

Eran Segal, Feng Zhang, Xihong Lin, Gary King, Ophir Shalem, Smadar Shilo, William E. Allen, Yonatan H. Grad, Casey S. Greene, Faisal Alquaddoomi, Simon Anders, Ran Balicer, Tal Bauman, Ximena Bonilla, Gisel Booman, Andrew T. Chan, Ori Cohen, Silvano Coletti, Natalie Davidson, Yuval Dor, David A. Drew, Olivier Elemento, Georgina Evans, Phil Ewels, Joshua Gale, Amir Gavrieli, Benjamin Geiger, Iman Hajirasouliha, Roman Jerala, Andre Kahles, Olli Kallioniemi, Ayya Keshet, Gregory Landua, Tomer Meir, Aline Muller, Long H. Nguyen, Matej Oresic, Svetlana Ovchinnikova, Hedi Peterson, Jay Rajagopal, Gunnar Rätsch, Hagai Rossman, Johan Rung, Andrea Sboner, Alexandros Sigaras, Tim Spector, Ron Steinherz, Irene Stevens, Jaak Vilo, Paul Wilmes, and CCC (Coronavirus Census Collective). 8/2020. “Building an International Consortium for Tracking Coronavirus Health Status.” Nature Medicine, 26, Pp. 1161-1165. Publisher's VersionAbstract
Information is the most potent protective weapon we have to combat a pandemic, at both the individual and global level. For individuals, information can help us make personal decisions and provide a sense of security. For the global community, information can inform policy decisions and offer critical insights into the epidemic of COVID-19 disease. Fully leveraging the power of information, however, requires large amounts of data and access to it. To achieve this, we are making steps to form an international consortium, Coronavirus Census Collective (CCC, coronaviruscensuscollective.org), that will serve as a hub for integrating information from multiple data sources that can be utilized to understand, monitor, predict, and combat global pandemics. These sources may include self-reported health status through surveys (including mobile apps), results of diagnostic laboratory tests, and other static and real-time geospatial data. This collective effort to track and share information will be invaluable in predicting hotspots of disease outbreak, identifying which factors control the rate of spreading, informing immediate policy decisions, evaluating the effectiveness of measures taken by health organizations on pandemic control, and providing critical insight on the etiology of COVID-19. It will also help individuals stay informed on this rapidly evolving situation and contribute to other global efforts to slow the spread of disease. In the past few weeks, several initiatives across the globe have surfaced to use daily self-reported symptoms as a means to track disease spread, predict outbreak locations, guide population measures and help in the allocation of healthcare resources. The aim of this paper is to put out a call to standardize these efforts and spark a collaborative effort to maximize the global gain while protecting participant privacy.

# Survey Data and Human Computation for Improved Flu Tracking

Stefan Wojcik, Avleen Bijral, Richard Johnston, Juan Miguel Lavista, Gary King, Ryan Kennedy, Alessandro Vespignani, and David Lazer. 2021. “Survey Data and Human Computation for Improved Flu Tracking.” Nature Communications, 12, 194, Pp. 1-8. Publisher's VersionAbstract
While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users’ online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.

# Evaluating COVID-19 Public Health Messaging in Italy: Self-Reported Compliance and Growing Mental Health Concerns

Soubhik Barari, Stefano Caria, Antonio Davola, Paolo Falco, Thiemo Fetzer, Stefano Fiorin, Lukas Hensel, Andriy Ivchenko, Jon Jachimowicz, Gary King, Gordon Kraft-Todd, Alice Ledda, Mary MacLennan, Lucian Mutoi, Claudio Pagani, Elena Reutskaja, Christopher Roth, and Federico Raimondi Slepoi. 2020. “Evaluating COVID-19 Public Health Messaging in Italy: Self-Reported Compliance and Growing Mental Health Concerns”. Publisher's VersionAbstract

Purpose: The COVID-19 death-rate in Italy continues to climb, surpassing that in every other country. We implement one of the first nationally representative surveys about this unprecedented public health crisis and use it to evaluate the Italian government’ public health efforts and citizen responses.
Findings: (1) Public health messaging is being heard. Except for slightly lower compliance among young adults, all subgroups we studied understand how to keep themselves and others safe from the SARS-Cov-2 virus. Remarkably, even those who do not trust the government, or think the government has been untruthful about the crisis believe the messaging and claim to be acting in accordance. (2) The quarantine is beginning to have serious negative effects on the population’s mental health.
Policy Recommendations: Communications focus should move from explaining to citizens that they should stay at home to what they can do there. We need interventions that make staying at home and following public health protocols more desirable. These interventions could include virtual social interactions, such as online social reading activities, classes, exercise routines, etc. — all designed to reduce the boredom of long term social isolation and to increase the attractiveness of following public health recommendations. Interventions like these will grow in importance as the crisis wears on around the world, and staying inside wears on people.

Replication data for this study in dataverse

# Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset

Georgina Evans and Gary King. Forthcoming. “Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset.” Political Analysis.Abstract

We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 10 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias -- including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically consistent and approximately unbiased linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.

We have implemented these methods in open source software for R called PrivacyUnbiased.  Facebook has ported PrivacyUnbiased to open source Python code called svinfer.  We have extended these results in Evans and King (2021).

# Statistically Valid Inferences from Privacy Protected Data

Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. Working Paper. “Statistically Valid Inferences from Privacy Protected Data”.Abstract
Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of privacy concerns. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for research subjects, and statistical validity guarantees for researchers seeking social science insights. We build on the standard of differential privacy,'' correct for biases induced by the privacy-preserving procedures, provide a proper accounting of uncertainty, and impose minimal constraints on the choice of statistical methods and quantities estimated. We also replicate two recent published articles and show how we can obtain approximately the same substantive results while simultaneously protecting the privacy. Our approach is simple to use and computationally efficient; we also offer open source software that implements all our methods.

# The “Math Prefresher” and The Collective Future of Political Science Graduate Training

Gary King, Shiro Kuriwaki, and Yon Soo Park. 2020. “The “Math Prefresher” and The Collective Future of Political Science Graduate Training.” PS: Political Science and Politics, 53, 3, Pp. 537-541. Publisher's VersionAbstract

The political science math prefresher arose a quarter century ago and has now spread to many of our discipline’s Ph.D. programs. Incoming students arrive for graduate school a few weeks early for ungraded instruction in math, statistics, and computer science as they are useful for political science. The prefresher’s benefits, however, go beyond the technical material taught: it develops lasting camaraderie with their entering class, facilitates connections with senior graduate students, opens pathways to mastering methods necessary for research, and eases the transition to the increasingly collaborative nature of graduate work. The prefresher also shows how faculty across a highly diverse discipline can work together to train the next generation. We review this program, highlight its collaborative aspects, and try to take the idea to the next level by building infrastructure to share teaching materials across universities so separate programs can build on each other’s work and improve all our programs.

# So You're a Grad Student Now? Maybe You Should Do This

Gary King. 2020. “So You're a Grad Student Now? Maybe You Should Do This.” In The SAGE Handbook of Research Methods in Political Science and International Relations, edited by Jr. Robert J. Franzese and Luigi Curini, Pp. 1--4. London: Sage Publications.Abstract
Congratulations! You’ve made it to graduate school. This means you’re in a select group, about to embark on a great adventure to learn about the world and teach us all some new things. This also means you obviously know how to follow rules. So I have five for you -- not counting the obvious one that to learn new things you’ll need to break some rules. After all, to be a successful academic, you’ll need to cut a new path, and so if you do exactly what your advisors and I did, you won’t get anywhere near as far since we already did it. So here are some rules, but break some of them, perhaps including this one
All writings

## Presentations

All presentations

• «
• 2 of 2
•
All writings