Journal Article

Precision mapping child undernutrition for nearly 600,000 inhabited census villages in India
Rockli Kim, Avleen S. Bijral, Yun Xu, Xiuyuan Zhang, Jeffrey C. Blossom, Akshay Swaminathan, Gary King, Alok Kumar, Rakesh Sarwal, Juan M. Lavista Ferres, and S.V. Subramanian. 2021. “Precision mapping child undernutrition for nearly 600,000 inhabited census villages in India.” Proceedings of the National Academy of Sciences, 118, 18, Pp. 1-11.
There are emerging opportunities to assess health indicators at truly small areas with increasing availability of data geocoded to micro geographic units and advanced modeling techniques. The utility of such fine-grained data can be fully leveraged if linked to local governance units that are accountable for implementation of programs and interventions. We used data from the 2011 Indian Census for village-level demographic and amenities features and the 2016 Indian Demographic and Health Survey in a bias-corrected semisupervised regression framework to predict child anthropometric failures for all villages in India. Of the total geographic variation in predicted child anthropometric failure estimates, 54.2 to 72.3% were attributed to the village level followed by 20.6 to 39.5% to the state level. The mean predicted stunting was 37.9% (SD: 10.1%; IQR: 31.2 to 44.7%), and substantial variation was found across villages ranging from less than 5% for 691 villages to over 70% in 453 villages. Estimates at the village level can potentially shift the paradigm of policy discussion in India by enabling more informed prioritization and precise targeting. The proposed methodology can be adapted and applied to diverse population health indicators, and in other contexts, to reveal spatial heterogeneity at a finer geographic scale and identify local areas with the greatest needs and with direct implications for actions to take place.
Survey Data and Human Computation for Improved Flu Tracking
Stefan Wojcik, Avleen Bijral, Richard Johnston, Juan Miguel Lavista, Gary King, Ryan Kennedy, Alessandro Vespignani, and David Lazer. 2021. “Survey Data and Human Computation for Improved Flu Tracking.” Nature Communications, 12, 194, Pp. 1-8.
While digital trace data from sources like search engines hold enormous potential for tracking and understanding human behavior, these streams of data lack information about the actual experiences of those individuals generating the data. Moreover, most current methods ignore or under-utilize human processing capabilities that allow humans to solve problems not yet solvable by computers (human computation). We demonstrate how behavioral research, linking digital and real-world behavior, along with human computation, can be utilized to improve the performance of studies using digital data streams. This study looks at the use of search data to track prevalence of Influenza-Like Illness (ILI). We build a behavioral model of flu search based on survey data linked to users’ online browsing data. We then utilize human computation for classifying search strings. Leveraging these resources, we construct a tracking model of ILI prevalence that outperforms strong historical benchmarks using only a limited stream of search data and lends itself to tracking ILI in smaller geographic units. While this paper only addresses searches related to ILI, the method we describe has potential for tracking a broad set of phenomena in near real-time.
Building an International Consortium for Tracking Coronavirus Health Status
Eran Segal, Feng Zhang, Xihong Lin, Gary King, Ophir Shalem, Smadar Shilo, William E. Allen, Yonatan H. Grad, Casey S. Greene, Faisal Alquaddoomi, Simon Anders, Ran Balicer, Tal Bauman, Ximena Bonilla, Gisel Booman, Andrew T. Chan, Ori Cohen, Silvano Coletti, Natalie Davidson, Yuval Dor, David A. Drew, Olivier Elemento, Georgina Evans, Phil Ewels, Joshua Gale, Amir Gavrieli, Benjamin Geiger, Iman Hajirasouliha, Roman Jerala, Andre Kahles, Olli Kallioniemi, Ayya Keshet, Gregory Landua, Tomer Meir, Aline Muller, Long H. Nguyen, Matej Oresic, Svetlana Ovchinnikova, Hedi Peterson, Jay Rajagopal, Gunnar Rätsch, Hagai Rossman, Johan Rung, Andrea Sboner, Alexandros Sigaras, Tim Spector, Ron Steinherz, Irene Stevens, Jaak Vilo, Paul Wilmes, and CCC (Coronavirus Census Collective). 8/2020. “Building an International Consortium for Tracking Coronavirus Health Status.” Nature Medicine, 26, Pp. 1161-1165.
Information is the most potent protective weapon we have to combat a pandemic, at both the individual and global level. For individuals, information can help us make personal decisions and provide a sense of security. For the global community, information can inform policy decisions and offer critical insights into the epidemic of COVID-19 disease. Fully leveraging the power of information, however, requires large amounts of data and access to it. To achieve this, we are making steps to form an international consortium, Coronavirus Census Collective (CCC, coronaviruscensuscollective.org), that will serve as a hub for integrating information from multiple data sources that can be utilized to understand, monitor, predict, and combat global pandemics. These sources may include self-reported health status through surveys (including mobile apps), results of diagnostic laboratory tests, and other static and real-time geospatial data. This collective effort to track and share information will be invaluable in predicting hotspots of disease outbreak, identifying which factors control the rate of spreading, informing immediate policy decisions, evaluating the effectiveness of measures taken by health organizations on pandemic control, and providing critical insight on the etiology of COVID-19. It will also help individuals stay informed on this rapidly evolving situation and contribute to other global efforts to slow the spread of disease. In the past few weeks, several initiatives across the globe have surfaced to use daily self-reported symptoms as a means to track disease spread, predict outbreak locations, guide population measures and help in the allocation of healthcare resources. The aim of this paper is to put out a call to standardize these efforts and spark a collaborative effort to maximize the global gain while protecting participant privacy.
Computational social science: Obstacles and opportunities
David M. J. Lazer, Alex Pentland, Duncan J. Watts, Sinan Aral, Susan Athey, Noshir Contractor, Deen Freelon, Sandra Gonzalez-Bailon, Gary King, Helen Margetts, Alondra Nelson, Matthew J. Salganik, Markus Strohmaier, Alessandro Vespignani, and Claudia Wagner. 8/28/2020. “Computational social science: Obstacles and opportunities.” Science, 369, 6507, Pp. 1060-1062.
The field of computational social science (CSS) has exploded in prominence over the past decade, with thousands of papers published using observational data, experimental designs, and large-scale simulations that were once unfeasible or unavailable to researchers. These studies have greatly improved our understanding of important phenomena, ranging from social inequality to the spread of infectious diseases. The institutions supporting CSS in the academy have also grown substantially, as evidenced by the proliferation of conferences, workshops, and summer schools across the globe, across disciplines, and across sources of data. But the field has also fallen short in important ways. Many institutional structures around the field—including research ethics, pedagogy, and data infrastructure—are still nascent. We suggest opportunities to address these issues, especially in improving the alignment between the organization of the 20th-century university and the intellectual requirements of the field.
Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments
Kosuke Imai, Gary King, and Carlos Velasco Rivera. 1/31/2020. “Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments.” Journal of Politics, 81, 2, Pp. 714-730.

A vast literature demonstrates that voters around the world who benefit from their governments' discretionary spending cast more ballots for the incumbent party than those who do not benefit. But contrary to most theories of political accountability, some suggest that voters also reward incumbent parties for implementing "programmatic" spending legislation, over which incumbents have no discretion, and even when passed with support from all major parties. Why voters would attribute responsibility when none exists is unclear, as is why minority party legislators would approve of legislation that would cost them votes. We study the electoral effects of two large prominent programmatic policies that fit the ideal type especially well, with unusually large scale experiments that bring more evidence to bear on this question than has previously been possible. For the first policy, we design and implement ourselves one of the largest randomized social experiments ever. For the second policy, we reanalyze studies that used a large scale randomized experiment and a natural experiment to study the same question but came to opposite conclusions. Using corrected data and improved statistical methods, we show that the evidence from all analyses of both policies is consistent: programmatic policies have no effect on voter support for incumbents. We conclude by discussing how the many other studies in the literature may be interpreted in light of our results.

Population-scale Longitudinal Mapping of COVID-19 Symptoms, Behaviour and Testing
William E. Allen, Han Altae-Tran, James Briggs, Xin Jin, Glen McGee, Andy Shi, Rumya Raghavan, Mireille Kamariza, Nicole Nova, Albert Pereta, Chris Danford, Amine Kamel, Patrik Gothe, Evrhet Milam, Jean Aurambault, Thorben Primke, Weijie Li, Josh Inkenbrandt, Tuan Huynh, Evan Chen, Christina Lee, Michael Croatto, Helen Bentley, Wendy Lu, Robert Murray, Mark Travassos, Brent A. Coull, John Openshaw, Casey S. Greene, Ophir Shalem, Gary King, Ryan Probasco, David R. Cheng, Ben Silbermann, Feng Zhang, and Xihong Lin. 8/26/2020. “Population-scale Longitudinal Mapping of COVID-19 Symptoms, Behaviour and Testing.” Nature Human Behaviour.
Despite the widespread implementation of public health measures, coronavirus disease 2019 (COVID-19) continues to spread in the United States. To facilitate an agile response to the pandemic, we developed How We Feel, a web and mobile application that collects longitudinal self-reported survey responses on health, behaviour and demographics. Here, we report results from over 500,000 users in the United States from 2 April 2020 to 12 May 2020. We show that self-reported surveys can be used to build predictive models to identify likely COVID-19-positive individuals. We find evidence among our users for asymptomatic or presymptomatic presentation; show a variety of exposure, occupational and demographic risk factors for COVID-19 beyond symptoms; reveal factors for which users have been SARS-CoV-2 PCR tested; and highlight the temporal dynamics of symptoms and self-isolation behaviour. These results highlight the utility of collecting a diverse set of symptomatic, demographic, exposure and behavioural self-reported data to fight the COVID-19 pandemic.
The “Math Prefresher” and The Collective Future of Political Science Graduate Training
Gary King, Shiro Kuriwaki, and Yon Soo Park. 2020. “The “Math Prefresher” and The Collective Future of Political Science Graduate Training.” PS: Political Science and Politics, 53, 3, Pp. 537-541.

The political science math prefresher arose a quarter century ago and has now spread to many of our discipline’s Ph.D. programs. Incoming students arrive for graduate school a few weeks early for ungraded instruction in math, statistics, and computer science as they are useful for political science. The prefresher’s benefits, however, go beyond the technical material taught: it develops lasting camaraderie with their entering class, facilitates connections with senior graduate students, opens pathways to mastering methods necessary for research, and eases the transition to the increasingly collaborative nature of graduate work. The prefresher also shows how faculty across a highly diverse discipline can work together to train the next generation. We review this program, highlight its collaborative aspects, and try to take the idea to the next level by building infrastructure to share teaching materials across universities so separate programs can build on each other’s work and improve all our programs.

Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies
Jonathan N. Katz, Gary King, and Elizabeth Rosenblatt. 2020. “Theoretical Foundations and Empirical Evaluations of Partisan Fairness in District-Based Democracies.” American Political Science Review, 114, 1, Pp. 164-178.
We clarify the theoretical foundations of partisan fairness standards for district-based democratic electoral systems, including essential assumptions and definitions that have not been recognized, formalized, or in some cases even discussed. We also offer extensive empirical evidence for assumptions with observable implications. Throughout, we follow a fundamental principle of statistical inference too often ignored in this literature -- defining the quantity of interest separately so its measures can be proven wrong, evaluated, or improved. This enables us to prove which of the many newly proposed fairness measures are statistically appropriate and which are biased, limited, or not measures of the theoretical quantity they seek to estimate at all. Because real world redistricting and gerrymandering involves complicated politics with numerous participants and conflicting goals, measures biased for partisan fairness sometimes still provide useful descriptions of other aspects of electoral systems.
A New Model for Industry-Academic Partnerships
Gary King and Nathaniel Persily. 2019. “A New Model for Industry-Academic Partnerships.” PS: Political Science and Politics, 53, 4, Pp. 703-709.

The mission of the social sciences is to understand and ameliorate society’s greatest challenges. The data held by private companies, collected for different purposes, hold vast potential to further this mission. Yet, because of consumer privacy, trade secrets, proprietary content, and political sensitivities, these datasets are often inaccessible to scholars. We propose a novel organizational model to address these problems. We also report on the first partnership under this model, to study the incendiary issues surrounding the impact of social media on elections and democracy: Facebook provides (privacy-preserving) data access; eight ideologically and substantively diverse charitable foundations provide funding; an organization of academics we created, Social Science One (see SocialScience.One), leads the project; and the Institute for Quantitative Social Science at Harvard and the Social Science Research Council provide logistical help.

A Theory of Statistical Inference for Matching Methods in Causal Research
Stefano M. Iacus, Gary King, and Giuseppe Porro. 2019. “A Theory of Statistical Inference for Matching Methods in Causal Research.” Political Analysis, 27, 1, Pp. 46-68.

Researchers who generate data often optimize efficiency and robustness by choosing stratified over simple random sampling designs. Yet, all theories of inference proposed to justify matching methods are based on simple random sampling. This is all the more troubling because, although these theories require exact matching, most matching applications resort to some form of ex post stratification (on a propensity score, distance metric, or the covariates) to find approximate matches, thus nullifying the statistical properties these theories are designed to ensure. Fortunately, the type of sampling used in a theory of inference is an axiom, rather than an assumption vulnerable to being proven wrong, and so we can replace simple with stratified sampling, so long as we can show, as we do here, that the implications of the theory are coherent and remain true. Properties of estimators based on this theory are much easier to understand and can be satisfied without the unattractive properties of existing theories, such as assumptions hidden in data analyses rather than stated up front, asymptotics, unfamiliar estimators, and complex variance calculations. Our theory of inference makes it possible for researchers to treat matching as a simple form of preprocessing to reduce model dependence, after which all the familiar inferential techniques and uncertainty calculations can be applied. This theory also allows binary, multicategory, and continuous treatment variables from the outset and straightforward extensions for imperfect treatment assignment and different versions of treatments.