Writings

Working Paper
Building an International Consortium for Tracking Coronavirus Health Status
Eran Segal, Feng Zhang, Xihong Lin, Gary King, Ophir Shalem, Smadar Shilo, William E. Allen, Yonatan H. Grad, Casey S. Greene, Faisal Alquaddoomi, Simon Anders, Ran Balicer, Tal Bauman, Ximena Bonilla, Gisel Booman, Andrew T. Chan, Ori Cohen, Silvano Coletti, Natalie Davidson, Yuval Dor, David A. Drew, Olivier Elemento, Georgina Evans, Phil Ewels, Joshua Gale, Amir Gavrieli, Benjamin Geiger, Iman Hajirasouliha, Roman Jerala, Andre Kahles, Olli Kallioniemi, Ayya Keshet, Gregory Landua, Tomer Meir, Aline Muller, Long H. Nguyen, Matej Oresic, Svetlana Ovchinnikova, Hedi Peterson, Jay Rajagopal, Gunnar Rätsch, Hagai Rossman, Johan Rung, Andrea Sboner, Alexandros Sigaras, Tim Spector, Ron Steinherz, Irene Stevens, Jaak Vilo, Paul Wilmes, and CCC (Coronavirus Census Collective). Working Paper. “Building an International Consortium for Tracking Coronavirus Health Status”.
Information is the most potent protective weapon we have to combat a pandemic, at both the individual and global level. For individuals, information can help us make personal decisions and provide a sense of security. For the global community, information can inform policy decisions and offer critical insights into the COVID-19 epidemic. Fully leveraging the power of information, however, requires large amounts of data and access to it. To achieve this, we are taking steps to form an international consortium, Coronavirus Census Collective (CCC, coronaviruscensuscollective.org), that will serve as a hub for integrating information from multiple data sources that can be used to understand, monitor, predict, and combat global pandemics. These sources may include self-reported health status through surveys (including mobile apps), results of diagnostic laboratory tests, and other static and real-time geospatial data. This collective effort to track and share information will be invaluable in predicting hotspots of disease outbreak, identifying which factors control the rate of spread, informing immediate policy decisions, evaluating the effectiveness of measures taken by health organizations on pandemic control, and providing critical insight into the etiology of COVID-19. It will also help individuals stay informed on this rapidly evolving situation and contribute to other global efforts to slow the spread of disease. In the past few weeks, several initiatives across the globe have surfaced to use daily self-reported symptoms as a means to track disease spread, predict outbreak locations, guide population-level measures, and help in the allocation of healthcare resources. The aim of this paper is to put out a call to standardize these efforts and spark a collaborative effort to maximize the global gain while protecting participant privacy.
Paper
Differentially Private Survey Research
Georgina Evans, Gary King, Adam D. Smith, and Abhradeep Thakurta. Working Paper. “Differentially Private Survey Research”.
Survey researchers have long sought to protect the privacy of their respondents via de-identification (removing names, addresses, and other directly identifying information) before analyzing or sharing data. Although these procedures obviously help in important circumstances, recent research demonstrates that they fail to protect survey respondents from intentional attempts at re-identification, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem for political science because political beliefs are not only the subject of our survey questions and scholarship; they are key information respondents seek to keep private and elected representatives use to write privacy legislation. In this paper, we build on the concept of "differential privacy" to offer new survey research data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data. The cost of these new procedures is larger standard errors or confidence intervals, which can be overcome with somewhat larger sample sizes.
Paper
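To make the cost described above concrete, here is a minimal sketch that assumes the textbook Laplace mechanism applied to a single yes/no survey item, with "replace one respondent" neighboring datasets so that the sample mean has sensitivity 1/n. It is not the paper's procedure; `dp_proportion`, `total_se`, and `required_n` are hypothetical helpers. The point is the arithmetic: the added noise variance shrinks like 1/n², so a modest increase in n restores the original precision.

```python
import math
import random


def dp_proportion(responses, epsilon, rng=random):
    """Release the share of 1s among 0/1 survey responses with epsilon-DP.

    Assumes "replace one respondent" neighbors, so the mean has sensitivity 1/n,
    and adds Laplace noise with scale 1 / (n * epsilon).
    """
    n = len(responses)
    scale = 1.0 / (n * epsilon)
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(responses) / n + noise


def total_se(p, n, epsilon):
    """Standard error of the released proportion: sampling plus Laplace noise variance."""
    sampling_var = p * (1 - p) / n
    noise_var = 2 * (1.0 / (n * epsilon)) ** 2  # Var[Laplace(0, b)] = 2 * b**2
    return math.sqrt(sampling_var + noise_var)


def required_n(p, target_se, epsilon):
    """Smallest n whose total standard error is at most target_se.

    Takes the positive root of target_var * n**2 - p(1-p) * n - 2 / epsilon**2 = 0.
    """
    pq = p * (1 - p)
    target_var = target_se ** 2
    root = (pq + math.sqrt(pq ** 2 + 8 * target_var / epsilon ** 2)) / (2 * target_var)
    return math.ceil(root)
```

Under these assumptions, with p = 0.5 and epsilon = 0.5, matching the standard error of a 1,000-respondent non-private survey requires a little over 1,030 respondents.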
Evaluating COVID-19 Public Health Messaging in Italy: Self-Reported Compliance and Growing Mental Health Concerns
Soubhik Barari, Stefano Caria, Antonio Davola, Paolo Falco, Thiemo Fetzer, Stefano Fiorin, Lukas Hensel, Andriy Ivchenko, Jon Jachimowicz, Gary King, Gordon Kraft-Todd, Alice Ledda, Mary MacLennan, Lucian Mutoi, Claudio Pagani, Elena Reutskaja, Christopher Roth, and Federico Raimondi Slepoi. Working Paper. “Evaluating COVID-19 Public Health Messaging in Italy: Self-Reported Compliance and Growing Mental Health Concerns”.
Purpose: The COVID-19 death rate in Italy continues to climb, surpassing that in every other country. We implement one of the first nationally representative surveys about this unprecedented public health crisis and use it to evaluate the Italian government’s public health efforts and citizen responses.
Findings: (1) Public health messaging is being heard. Except for slightly lower compliance among young adults, all subgroups we studied understand how to keep themselves and others safe from the SARS-CoV-2 virus. Remarkably, even those who do not trust the government, or who think the government has been untruthful about the crisis, believe the messaging and claim to be acting accordingly. (2) The quarantine is beginning to have serious negative effects on the population’s mental health.
Policy Recommendations: Communications should shift focus from telling citizens that they should stay at home to what they can do there. We need interventions that make staying at home and following public health protocols more desirable. These interventions could include virtual social interactions, such as online social reading activities, classes, and exercise routines, all designed to reduce the boredom of long-term social isolation and to increase the attractiveness of following public health recommendations. Interventions like these will grow in importance as the crisis wears on around the world, and staying inside wears on people.
Paper
How Human Subjects Research Rules Mislead You and Your University, and What to Do About it
Gary King and Melissa Sands. Working Paper. “How Human Subjects Research Rules Mislead You and Your University, and What to Do About it”.

Universities require faculty and students planning research involving human subjects to pass formal certification tests and then submit research plans for prior approval. Those who diligently take the tests may better understand certain important legal requirements but, at the same time, are often misled into thinking they can apply these rules to their own work, which in fact they are not permitted to do. They will also be missing many other legal requirements that are not mentioned in their training but that govern their behavior. Finally, the training leaves them likely to completely misunderstand the essentially political situation they find themselves in. The resulting risks to their universities, collaborators, and careers may be catastrophic, in addition to the more common, ordinary frustrations researchers have with the system. To avoid these problems, faculty and students conducting research about and for the public need to understand that they are public figures, to whom different rules apply, ones that political scientists have long studied. University administrators (and faculty in their part-time roles as administrators) need to reorient their perspectives as well. University research compliance bureaucracies have grown in well-meaning but sometimes unproductive ways that are not required by federal laws or guidelines. We offer advice to faculty and students on how to deal with the system as it exists now, and suggestions for changes in university research compliance bureaucracies that should benefit faculty, students, staff, university budgets, and our research subjects.

Paper
PSI (Ψ): a Private data Sharing Interface
Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. Working Paper. “PSI (Ψ): a Private data Sharing Interface”.

We provide an overview of PSI ("a Private data Sharing Interface"), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.

Paper
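PSI's own interface and budgeting rules are not described in this overview. As a hedged illustration of the kind of bookkeeping a differentially private sharing system needs, the sketch below assumes basic sequential composition (the epsilons of successive queries simply add) and Laplace-noised counting queries. The `PrivacyBudget` class and `noisy_count` function are hypothetical names, not PSI's API.

```python
import random


class PrivacyBudget:
    """Tracks total epsilon spent, assuming basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Refuse any query that would push cumulative spending past the total.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self):
        return self.total - self.spent


def noisy_count(rows, predicate, epsilon, budget, rng=random):
    """Laplace-noised count; a count has sensitivity 1 under add/remove-one neighbors."""
    budget.charge(epsilon)  # charge before touching the data
    true_count = sum(1 for row in rows if predicate(row))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

Once the budget is spent, further queries are refused. More refined composition theorems allow tighter accounting than simple addition.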
Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset
Georgina Evans and Gary King. Working Paper. “Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset”.

We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 10 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias -- including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically consistent and approximately unbiased linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.

We have implemented these methods in open-source software for R called PrivacyUnbiased. Facebook has ported PrivacyUnbiased to open-source Python code called svinfer.

Paper
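The attenuation problem, and the basic idea of correcting it, can be illustrated with the classical method-of-moments correction for covariate measurement error. This sketch assumes independent, mean-zero noise with a known variance added to a single covariate, and it computes point estimates only. It is not the algorithm in PrivacyUnbiased or svinfer, and `corrected_ols` is a hypothetical name.

```python
import numpy as np


def corrected_ols(X_obs, y, noise_cov):
    """Method-of-moments OLS that undoes attenuation from additive covariate noise.

    X_obs: (n, k) covariates observed with independent, mean-zero additive noise.
    y: (n,) outcome; independent noise added only to y inflates variance, not slopes.
    noise_cov: (k, k) known noise covariance (zero rows/columns for noise-free
    columns such as the intercept).
    """
    n = X_obs.shape[0]
    # E[X_obs' X_obs] = X'X + n * noise_cov, so subtract the noise contribution.
    xtx = X_obs.T @ X_obs - n * noise_cov
    xty = X_obs.T @ y
    return np.linalg.solve(xtx, xty)


# Simulated illustration: true slope 2, covariate noise variance equal to var(x).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
sigma_u = 1.0  # assumed to be published by the data holder
x_obs = x + rng.normal(scale=sigma_u, size=n)
X_obs = np.column_stack([np.ones(n), x_obs])

naive = np.linalg.lstsq(X_obs, y, rcond=None)[0]  # slope attenuated toward 1.0
fixed = corrected_ols(X_obs, y, np.diag([0.0, sigma_u ** 2]))  # slope near 2.0
```

With noise variance equal to the covariate's own variance, the naive slope is roughly halved, while the corrected slope lands near the true value of 2. Standard errors for the corrected estimator are larger and need their own derivation, which this sketch omits.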
Statistically Valid Inferences from Privacy Protected Data
Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. Working Paper. “Statistically Valid Inferences from Privacy Protected Data”.
Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of worries about privacy violations. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for individuals who may be represented in the data and statistical validity guarantees for researchers seeking population-level insights from it. We build on the standard of "differential privacy" but, unlike most such approaches, we also correct for the serious statistical biases induced by privacy-preserving procedures, provide a proper accounting for statistical uncertainty, and impose minimal constraints on the choice of data analytic methods and types of quantities estimated. Our algorithm is easy to implement, simple to use, and computationally efficient; we also offer open source software to illustrate all our methods.
Paper
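One standard way to give statistical estimators differential privacy is sample-and-aggregate: estimate on disjoint random subsets and privatize only the average of those estimates. The sketch below applies it to a mean, assuming "replace one row" neighbors and analyst-chosen clipping bounds. The abstract does not say whether the paper's algorithm takes exactly this form, and `sample_and_aggregate_mean` is a hypothetical helper.

```python
import numpy as np


def sample_and_aggregate_mean(x, n_partitions, lower, upper, epsilon, rng):
    """epsilon-DP point estimate of a mean via sample-and-aggregate.

    Randomly splits the rows into disjoint partitions, estimates the mean in each,
    clips each partition estimate to [lower, upper], averages, and adds Laplace
    noise. Changing one row moves one partition estimate by at most
    (upper - lower), so the average has sensitivity (upper - lower) / n_partitions.
    """
    parts = np.array_split(rng.permutation(x), n_partitions)
    estimates = np.clip([part.mean() for part in parts], lower, upper)
    scale = (upper - lower) / (n_partitions * epsilon)
    return float(estimates.mean() + rng.laplace(0.0, scale))


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=100_000)
estimate = sample_and_aggregate_mean(x, n_partitions=100, lower=0.0, upper=10.0,
                                     epsilon=1.0, rng=rng)
```

This sketch does not correct the bias that clipping introduces when partition estimates fall outside [lower, upper], and it releases no uncertainty estimate; a private standard error would spend additional privacy budget. Bias correction and uncertainty accounting are what the abstract says the authors' system adds.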
Survey Data and Human Computation for Improved Flu Tracking
Stefan Wojcik, Avleen Bijral, Richard Johnston, Juan Miguel Lavista, Gary King, Ryan Kennedy, Alessandro Vespignani, and David Lazer. Working Paper. “Survey Data and Human Computation for Improved Flu Tracking”.
While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users’ online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.
Paper Supporting Information
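Two ingredients named in the abstract, crowd labeling of search strings and a tracking model driven by search volume, can be sketched in a few lines. The version below rests on simplifying assumptions: majority voting over worker labels and an ordinary least-squares fit of weekly ILI prevalence on the share of searches judged flu-related. It omits the survey-based behavioral model, and all data and function names are hypothetical.

```python
from collections import Counter

import numpy as np


def majority_label(labels):
    """Aggregate crowd labels (True = flu-related) for one search string by majority vote."""
    counts = Counter(labels)
    return counts[True] > counts[False]


def weekly_flu_share(weekly_queries, flu_queries):
    """Fraction of each week's searches whose string was judged flu-related."""
    return np.array([sum(q in flu_queries for q in week) / len(week) for week in weekly_queries])


def fit_tracker(share, ili):
    """Least-squares fit of ILI prevalence on the flu-search share, with an intercept."""
    X = np.column_stack([np.ones_like(share), share])
    coef, *_ = np.linalg.lstsq(X, ili, rcond=None)
    return coef


# Toy data (hypothetical): three workers label each query string.
crowd_labels = {
    "flu symptoms": [True, True, False],
    "fever and chills": [True, True, True],
    "cheap flights": [False, False, False],
}
flu_queries = {q for q, labels in crowd_labels.items() if majority_label(labels)}
weeks = [
    ["cheap flights", "cheap flights", "flu symptoms", "cheap flights"],
    ["flu symptoms", "fever and chills", "cheap flights", "cheap flights"],
    ["flu symptoms", "fever and chills", "fever and chills", "cheap flights"],
]
share = weekly_flu_share(weeks, flu_queries)
ili = np.array([0.135, 0.26, 0.385])  # hypothetical weekly ILI prevalence
coef = fit_tracker(share, ili)  # intercept and slope of the tracking model
```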
Forthcoming
booc.io: An Education System with Hierarchical Concept Maps
Michail Schwab, Hendrik Strobelt, James Tompkin, Colin Fredericks, Connor Huff, Dana Higgins, Anton Strezhnev, Mayya Komisarchik, Gary King, and Hanspeter Pfister. Forthcoming. “booc.io: An Education System with Hierarchical Concept Maps.” IEEE Transactions on Visualization and Computer Graphics.

Information hierarchies are difficult to express when real-world space or time constraints force traversing the hierarchy in linear presentations, such as in educational books and classroom courses. We present booc.io, which allows linear and non-linear presentation and navigation of educational concepts and material. To support a breadth of material for each concept, booc.io is Web based, which allows adding material such as lecture slides, book chapters, videos, and LTIs. A visual interface assists the creation of the needed hierarchical structures. The goals of our system were formed in expert interviews, and we explain how our design meets these goals. We adapt a real-world course into booc.io, and perform an introductory qualitative evaluation with students.
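To illustrate the difference between linear and non-linear presentation of a hierarchy, the sketch below models concepts as a tree (a hypothetical data model, not booc.io's implementation). A depth-first pre-order traversal yields the single fixed order a book or lecture series imposes, while a path lookup supports jumping directly to any concept along with its ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    materials: list = field(default_factory=list)  # e.g. slides, chapters, videos, LTI links
    children: list = field(default_factory=list)   # sub-concepts


def linear_order(root):
    """Pre-order depth-first traversal: the one fixed order a book or lecture series uses."""
    order = [root.name]
    for child in root.children:
        order.extend(linear_order(child))
    return order


def path_to(root, target, path=()):
    """Non-linear navigation: the chain of concepts from the root down to `target`."""
    path = path + (root.name,)
    if root.name == target:
        return list(path)
    for child in root.children:
        found = path_to(child, target, path)
        if found:
            return found
    return None


course = Concept("Statistics", children=[
    Concept("Probability", children=[Concept("Random variables")]),
    Concept("Inference", children=[Concept("Estimation"), Concept("Testing")]),
])
```

Here the linear order walks every subtopic of "Probability" before reaching "Inference", whereas `path_to(course, "Testing")` returns only the chain needed to reach that one concept.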

Edited transcript of a talk on Partisan Symmetry at the 'Redistricting and Representation Forum'
Gary King. Forthcoming. “Edited transcript of a talk on Partisan Symmetry at the 'Redistricting and Representation Forum'.” Bulletin of the American Academy of Arts and Sciences, Winter, Pp. 55-58.

The origin, meaning, estimation, and application of the concept of partisan symmetry in legislative redistricting, and the justiciability of partisan gerrymandering. An edited transcript of a talk at the “Redistricting and Representation Forum,” American Academy of Arts & Sciences, Cambridge, MA 11/8/2017.

Here also is a video of the original talk.
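The partisan symmetry standard asks whether each party would receive the same seat share if it received the same vote share. The sketch below checks this from one election's district results under uniform partisan swing, a simplifying assumption; published analyses generally estimate the seats-votes curve with a statistical model instead. `seat_share` and `partisan_bias` are hypothetical helper names.

```python
def seat_share(district_votes, statewide_vote):
    """Party A's seat share after uniform partisan swing to a given average vote.

    district_votes: Party A's two-party vote share in each district. Every district
    is shifted by the same amount so their unweighted mean equals statewide_vote
    (this implicitly assumes equal turnout across districts).
    """
    shift = statewide_vote - sum(district_votes) / len(district_votes)
    wins = sum(1 for v in district_votes if v + shift > 0.5)
    return wins / len(district_votes)


def partisan_bias(district_votes, v):
    """Half the gap between the seat shares the two parties would get with vote share v.

    Party A with vote v wins S(v); Party B with vote v (A with 1 - v) wins 1 - S(1 - v).
    Symmetry means these are equal, so the result is 0 for a symmetric system.
    """
    s_a = seat_share(district_votes, v)
    s_b = 1 - seat_share(district_votes, 1 - v)
    return (s_a - s_b) / 2


packed = [0.8, 0.8, 0.45, 0.45, 0.45]  # Party A's votes packed into two districts
bias = partisan_bias(packed, 0.5)
```

In this hypothetical plan, Party A wins only 40% of the seats with half the statewide vote, so the bias at v = 0.5 is about -0.1 against it.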

Article
How to Measure Legislative District Compactness If You Only Know it When You See It
Aaron Kaufman, Gary King, and Mayya Komisarchik. Forthcoming. “How to Measure Legislative District Compactness If You Only Know it When You See It.” American Journal of Political Science.

To deter gerrymandering, many state constitutions require legislative districts to be "compact." Yet, the law offers few precise definitions other than "you know it when you see it," which effectively implies a common understanding of the concept. In contrast, academics have shown that compactness has multiple dimensions and have generated many conflicting measures. We hypothesize that both are correct -- that compactness is complex and multidimensional, but a common understanding exists across people. We develop a survey to elicit this understanding, with high reliability (in data where the standard paired comparisons approach fails). We create a statistical model that, using only the geometric features of a district, predicts with high accuracy the compactness evaluations of judges, public officials responsible for redistricting, and others. We also offer compactness data from our validated measure for 20,160 state legislative and congressional districts, as well as open source software to compute this measure from any district.

Winner of the 2018 Robert H. Durr Award from the MPSA.

Paper Supplementary Appendix
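Polsby-Popper, one of the classic compactness measures of the kind the abstract refers to, compares a district's area with that of a circle having the same perimeter. The sketch below computes it for a polygon in projected (planar) coordinates. It illustrates a single geometric feature, not the authors' validated measure, and the function names are hypothetical.

```python
import math


def polygon_area_perimeter(vertices):
    """Shoelace area and perimeter of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter


def polsby_popper(vertices):
    """4 * pi * area / perimeter**2: 1 for a circle, smaller for elongated or jagged shapes."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4 * math.pi * area / perimeter ** 2
```

A unit square scores π/4 ≈ 0.785, while a 10-by-1 rectangle scores about 0.26, showing how elongation lowers the score.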
An Improved Method of Automated Nonparametric Content Analysis for Social Science
Connor T. Jerzak, Gary King, and Anton Strezhnev. Forthcoming. “An Improved Method of Automated Nonparametric Content Analysis for Social Science.” Political Analysis.

Some scholars build models to classify documents into chosen categories. Others, especially social scientists who tend to focus on population characteristics, instead usually estimate the proportion of documents in each category -- using either parametric "classify-and-count" methods or "direct" nonparametric estimation of proportions without individual classification. Unfortunately, classify-and-count methods can be highly model dependent or generate more bias in the proportions even as the percent of documents correctly classified increases. Direct estimation avoids these problems, but can suffer when the meaning of language changes between training and test sets or is too similar across categories. We develop an improved direct estimation approach without these issues by including and optimizing continuous text features, along with a form of matching adapted from the causal inference literature. Our approach substantially improves performance in a diverse collection of 73 data sets. We also offer easy-to-use software that implements all ideas discussed herein.

Paper
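For readers new to "direct" estimation, the core identity is that feature frequencies in the unlabeled set equal a proportion-weighted mix of category-specific feature frequencies: P(S) = P(S|D) P(D). The sketch below solves that identity by least squares using individual binary features. It is a simplified illustration of the general approach the abstract starts from, not the improved method of this paper, and `direct_proportions` is a hypothetical name.

```python
import numpy as np


def direct_proportions(features_train, labels_train, features_test, n_categories):
    """Estimate category proportions in the test set without classifying documents.

    Uses P(S) = sum_d P(S | D = d) P(D = d) on binary features: P(S | D) comes
    from labeled training documents, P(S) from the unlabeled test set, and the
    proportions P(D) are solved for by least squares, then clipped and rescaled.
    """
    feats = np.asarray(features_train, dtype=float)
    labels = np.asarray(labels_train)
    # Column d holds feature frequencies among training documents in category d.
    p_s_given_d = np.column_stack(
        [feats[labels == d].mean(axis=0) for d in range(n_categories)]
    )
    p_s = np.asarray(features_test, dtype=float).mean(axis=0)
    pi, *_ = np.linalg.lstsq(p_s_given_d, p_s, rcond=None)
    pi = np.clip(pi, 0.0, None)  # proportions cannot be negative
    return pi / pi.sum()
```

Because no document is classified individually, the estimate does not depend on a classifier's accuracy. It does rely on P(S|D) being the same in the training and test sets, which is one of the failure modes the abstract mentions.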
A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results
Beau Coker, Cynthia Rudin, and Gary King. Forthcoming. “A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results.” Management Science.
Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We derive "hacking intervals," which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals. 
Paper
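A simple instance of a hacking interval takes the class of manipulations to be "choose any subset of the available control variables" and reports the range of the resulting treatment coefficient. The sketch below does exactly that with ordinary least squares on simulated data. The paper's classes of manipulations are more general, and `hacking_interval` is a hypothetical helper.

```python
from itertools import combinations

import numpy as np


def hacking_interval(y, treatment, controls):
    """Range of the OLS treatment coefficient over every subset of control variables."""
    n, k = controls.shape
    estimates = []
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            # Intercept, treatment, then the chosen controls (possibly none).
            X = np.column_stack([np.ones(n), treatment, controls[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            estimates.append(coef[1])
    return min(estimates), max(estimates)


# Simulated data: the first control confounds the treatment; the true effect is 1.
rng = np.random.default_rng(0)
n = 500
controls = rng.normal(size=(n, 3))
treatment = 0.5 * controls[:, 0] + rng.normal(size=n)
y = 1.0 * treatment + 2.0 * controls[:, 0] + rng.normal(size=n)
lo, hi = hacking_interval(y, treatment, controls)
```

In this simulation, omitting the confounder pushes the estimate toward roughly 1.8, while including it gives roughly 1.0, so the interval is wide: the result depends heavily on a specification choice.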
2020
Instructional Support Platform for Interactive Learning Platforms (2nd)
Gary King, Eric Mazur, Kelly Miller, and Brian Lukoff. 6/23/2020. “Instructional Support Platform for Interactive Learning Platforms (2nd).” United States of America US 10,692,391 B2 (U.S. Patent and Trademark Office).
In various embodiments, subject matter for improving discussions in connection with an educational resource is identified and summarized by analyzing annotations made by students assigned to a discussion group to identify high-quality annotations likely to generate responses and stimulate discussion threads, identifying clusters of high-quality annotations relating to the same portion or related portions of the educational resource, extracting and summarizing text from the annotations, and combining, in an electronically represented document, the extracted and summarized text and (i) at least some of the annotations and the portion or portions of the educational resource or (ii) clickable links thereto.
Patent
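The clustering step in the abstract, grouping high-quality annotations that refer to the same or related portions of a resource, can be sketched as interval merging over the annotated passages. The quality score, the threshold, and the `cluster_high_quality` function are hypothetical; the patent's own scoring, summarization, and document-assembly steps are not shown.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int      # character offset where the annotated passage begins
    end: int        # character offset where it ends (exclusive)
    text: str
    quality: float  # hypothetical score, e.g. predicted likelihood of drawing replies


def cluster_high_quality(annotations, threshold=0.5):
    """Keep annotations scoring at least `threshold`, then group those whose passages overlap."""
    good = sorted((a for a in annotations if a.quality >= threshold), key=lambda a: a.start)
    clusters = []
    for ann in good:
        if clusters and ann.start < clusters[-1]["end"]:
            # Overlaps the current cluster's passage: extend the cluster.
            clusters[-1]["end"] = max(clusters[-1]["end"], ann.end)
            clusters[-1]["members"].append(ann)
        else:
            clusters.append({"start": ann.start, "end": ann.end, "members": [ann]})
    return clusters
```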
2/2020. “The SilverLining Project: Finding Social Good in Clouds on the Dark Web”.
Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments
Kosuke Imai, Gary King, and Carlos Velasco Rivera. 1/31/2020. “Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments.” Journal of Politics, 81, 2, Pp. 714-730.

A vast literature demonstrates that voters around the world who benefit from their governments' discretionary spending cast more ballots for the incumbent party than those who do not benefit. But contrary to most theories of political accountability, some suggest that voters also reward incumbent parties for implementing "programmatic" spending legislation, over which incumbents have no discretion, and even when passed with support from all major parties. Why voters would attribute responsibility when none exists is unclear, as is why minority party legislators would approve of legislation that would cost them votes. We study the electoral effects of two large, prominent programmatic policies that fit the ideal type especially well, with unusually large-scale experiments that bring more evidence to bear on this question than has previously been possible. For the first policy, we ourselves design and implement one of the largest randomized social experiments ever. For the second policy, we reanalyze studies that used a large-scale randomized experiment and a natural experiment to study the same question but came to opposite conclusions. Using corrected data and improved statistical methods, we show that the evidence from all analyses of both policies is consistent: programmatic policies have no effect on voter support for incumbents. We conclude by discussing how the many other studies in the literature may be interpreted in light of our results.

Article Supplementary Appendix
The “Math Prefresher” and The Collective Future of Political Science Graduate Training
Gary King, Shiro Kuriwaki, and Yon Soo Park. 2020. “The ‘Math Prefresher’ and The Collective Future of Political Science Graduate Training.” PS: Political Science and Politics, 53, 3, Pp. 537-541.

The political science math prefresher arose a quarter century ago and has now spread to many of our discipline’s Ph.D. programs. Incoming students arrive for graduate school a few weeks early for ungraded instruction in math, statistics, and computer science as they are useful for political science. The prefresher’s benefits, however, go beyond the technical material taught: it helps students develop lasting camaraderie with their entering class, facilitates connections with senior graduate students, opens pathways to mastering methods necessary for research, and eases the transition to the increasingly collaborative nature of graduate work. The prefresher also shows how faculty across a highly diverse discipline can work together to train the next generation. We review this program, highlight its collaborative aspects, and try to take the idea to the next level by building infrastructure to share teaching materials across universities so separate programs can build on each other’s work and improve all our programs.

Article
So You're a Grad Student Now? Maybe You Should Do This
Gary King. 2020. “So You're a Grad Student Now? Maybe You Should Do This.” In The SAGE Handbook of Research Methods in Political Science and International Relations, edited by Robert J. Franzese Jr. and Luigi Curini, Pp. 1-4. London: Sage Publications.
Congratulations! You’ve made it to graduate school. This means you’re in a select group, about to embark on a great adventure to learn about the world and teach us all some new things. This also means you obviously know how to follow rules. So I have five for you -- not counting the obvious one that to learn new things you’ll need to break some rules. After all, to be a successful academic, you’ll need to cut a new path, and so if you do exactly what your advisors and I did, you won’t get anywhere near as far since we already did it. So here are some rules, but break some of them, perhaps including this one.
Chapter
Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies
Jonathan N. Katz, Gary King, and Elizabeth Rosenblatt. 2020. “Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies.” American Political Science Review, 114, 1, Pp. 164-178.
We clarify the theoretical foundations of partisan fairness standards for district-based democratic electoral systems, including essential assumptions and definitions that have not been recognized, formalized, or in some cases even discussed. We also offer extensive empirical evidence for assumptions with observable implications. Throughout, we follow a fundamental principle of statistical inference too often ignored in this literature -- defining the quantity of interest separately so its measures can be proven wrong, evaluated, or improved. This enables us to prove which of the many newly proposed fairness measures are statistically appropriate and which are biased, limited, or not measures of the theoretical quantity they seek to estimate at all. Because real-world redistricting and gerrymandering involve complicated politics with numerous participants and conflicting goals, measures biased for partisan fairness sometimes still provide useful descriptions of other aspects of electoral systems.
Article Online Appendices
2019
Instructional Support Platform for Interactive Learning Platforms
Gary King, Eric Mazur, Kelly Miller, and Brian Lukoff. 10/8/2019. “Instructional Support Platform for Interactive Learning Platforms.” United States of America US 10,438,498 B2 (U.S. Patent and Trademark Office).
In various embodiments, subject matter for improving discussions in connection with an educational resource is identified and summarized by analyzing annotations made by students assigned to a discussion group to identify high-quality annotations likely to generate responses and stimulate discussion threads, identifying clusters of high-quality annotations relating to the same portion or related portions of the educational resource, extracting and summarizing text from the annotations, and combining, in an electronically represented document, the extracted and summarized text and (i) at least some of the annotations and the portion or portions of the educational resource or (ii) clickable links thereto.
Patent
