Publications by Year: 2021

Letter to US Census Bureau: "Request for release of “noisy measurements file” by September 30 along with redistricting data products"
Cynthia Dwork, Ruth Greenwood, and Gary King. 8/12/2021. “Letter to US Census Bureau: "Request for release of “noisy measurements file” by September 30 along with redistricting data products".”
A letter, submitted on behalf of a large group of expert signatories, requesting the release of the “noisy measurements file” and other redistricting data by September 30, 2021. This includes the data created by the Bureau in preparing its differentially private data release, without its unnecessary (and, in many important situations, information-destroying) post-processing.
Letter
There’s a simple solution to the latest census fight
Cynthia Dwork, Ruth Greenwood, and Gary King. 7/26/2021. “There’s a simple solution to the latest census fight.” Boston Globe, Pp. A9.
We offer a solution to debates over the use of differential privacy in releasing US Census data.
Article
Cluster Analysis of Participant Responses for Test Generation or Teaching (2nd)
Gary King, Brian Lukoff, and Eric Mazur. 2/16/2021. “Cluster Analysis of Participant Responses for Test Generation or Teaching (2nd).” United States of America US 10,922,991 B2 (U.S. Patent and Trademark Office).
Textual responses to open-ended (i.e., free-response) items provided by participants (e.g., by means of mobile wireless devices) are automatically classified, enabling an instructor to assess the responses in a convenient, organized fashion and adjust instruction accordingly.
Patent
Participant Grouping for Enhanced Interactive Experience (4th)
Gary King, Brian Lukoff, and Eric Mazur. 1/26/2021. “Participant Grouping for Enhanced Interactive Experience (4th).” United States of America 10,902,031 B2 (U.S. Patent and Trademark Office).
Representative embodiments of a method for grouping participants in an activity include the steps of: (i) defining a grouping policy; (ii) storing, in a database, participant records that include a participant identifier, a characteristic associated with the participant, and/or an identifier for a participant's handheld device; (iii) defining groupings based on the policy and characteristics of the participants relating to the policy and to the activity; and (iv) communicating the groupings to the handheld devices to establish the groups.
Patent
UnbiasedPrivacy
Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. 2021. “UnbiasedPrivacy.”
Designing Social Inquiry: Scientific Inference in Qualitative Research, New Edition
Gary King, Robert O. Keohane, and Sidney Verba. 2021. Designing Social Inquiry: Scientific Inference in Qualitative Research, New Edition. 2nd ed. Princeton: Princeton University Press.
"The classic work on qualitative methods in political science"

Designing Social Inquiry presents a unified approach to qualitative and quantitative research in political science, showing how the same logic of inference underlies both. This stimulating book discusses issues related to framing research questions, measuring the accuracy of data and the uncertainty of empirical inferences, discovering causal effects, and getting the most out of qualitative research. It addresses topics such as interpretation and inference, comparative case studies, constructing causal theories, dependent and explanatory variables, the limits of random selection, selection bias, and errors in measurement. The book only uses mathematical notation to clarify concepts, and assumes no prior knowledge of mathematics or statistics.

Featuring a new preface by Robert O. Keohane and Gary King, this edition makes an influential work available to new generations of qualitative researchers in the social sciences.
Education and Scholarship by Video
Gary King. 2021. “Education and Scholarship by Video.”

When word processors were first introduced into the workplace, they turned scholars into typists. But they also improved our work: Turnaround time for new drafts dropped from days to seconds. Rewriting became easier and more common, and our papers, educational efforts, and research output improved. I discuss the advantages of and mechanisms for doing the same with do-it-yourself video recordings of research talks and class lectures, so that they may become a fully respected channel for scholarly output and education, alongside books and articles. I consider innovations in video design to optimize education and communication, along with technology to make this change possible.

Excerpts of this paper appeared in Political Science Today (Vol. 1, No. 3, August 2021, Pp. 5-6) and in APSAEducate. See also my recorded videos.

How to Measure Legislative District Compactness If You Only Know it When You See It
Aaron Kaufman, Gary King, and Mayya Komisarchik. 2021. “How to Measure Legislative District Compactness If You Only Know it When You See It.” American Journal of Political Science, 65, 3, Pp. 533-550.

To deter gerrymandering, many state constitutions require legislative districts to be "compact." Yet the law offers few precise definitions other than "you know it when you see it," which effectively implies a common understanding of the concept. In contrast, academics have shown that compactness has multiple dimensions and have generated many conflicting measures. We hypothesize that both are correct: compactness is complex and multidimensional, but a common understanding exists across people. We develop a survey to elicit this understanding, with high reliability (in data where the standard paired comparisons approach fails). We create a statistical model that predicts, with high accuracy, solely from the geometric features of the district, compactness evaluations by judges and public officials responsible for redistricting, among others. We also offer compactness data from our validated measure for 20,160 state legislative and congressional districts, as well as open source software to compute this measure from any district.

Winner of the 2018 Robert H. Durr Award from the MPSA.

Article Supplementary Appendix
Precision mapping child undernutrition for nearly 600,000 inhabited census villages in India
Rockli Kim, Avleen S. Bijral, Yun Xu, Xiuyuan Zhang, Jeffrey C. Blossom, Akshay Swaminathan, Gary King, Alok Kumar, Rakesh Sarwal, Juan M. Lavista Ferres, and S.V. Subramanian. 2021. “Precision mapping child undernutrition for nearly 600,000 inhabited census villages in India.” Proceedings of the National Academy of Sciences, 118, 18, Pp. 1-11.
There are emerging opportunities to assess health indicators at truly small areas with increasing availability of data geocoded to micro geographic units and advanced modeling techniques. The utility of such fine-grained data can be fully leveraged if linked to local governance units that are accountable for implementation of programs and interventions. We used data from the 2011 Indian Census for village-level demographic and amenities features and the 2016 Indian Demographic and Health Survey in a bias-corrected semisupervised regression framework to predict child anthropometric failures for all villages in India. Of the total geographic variation in predicted child anthropometric failure estimates, 54.2 to 72.3% were attributed to the village level followed by 20.6 to 39.5% to the state level. The mean predicted stunting was 37.9% (SD: 10.1%; IQR: 31.2 to 44.7%), and substantial variation was found across villages ranging from less than 5% for 691 villages to over 70% in 453 villages. Estimates at the village level can potentially shift the paradigm of policy discussion in India by enabling more informed prioritization and precise targeting. The proposed methodology can be adapted and applied to diverse population health indicators, and in other contexts, to reveal spatial heterogeneity at a finer geographic scale and identify local areas with the greatest needs and with direct implications for actions to take place.
Article
Survey Data and Human Computation for Improved Flu Tracking
Stefan Wojcik, Avleen Bijral, Richard Johnston, Juan Miguel Lavista, Gary King, Ryan Kennedy, Alessandro Vespignani, and David Lazer. 2021. “Survey Data and Human Computation for Improved Flu Tracking.” Nature Communications, 12, 194, Pp. 1-8.
While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users’ online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.
Article Supporting Information
A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results
Beau Coker, Cynthia Rudin, and Gary King. 2021. “A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results.” Management Science, Pp. 1-24.
Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We derive "hacking intervals," which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals. 
Article