Journal Article

Georgina Evans, Gary King, Adam D. Smith, and Abhradeep Thakurta. Forthcoming. “Differentially Private Survey Research.” American Journal of Political Science.Abstract

Paper

Supplementary Appendix

Survey researchers have long sought to protect the privacy of their respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures can help, recent research demonstrates that they fail to protect respondents from intentional re-identification attacks, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem in political science because political beliefs are not merely the subject of our scholarship; they represent some of the most important information respondents want to keep private. We confirm the problem in practice by re-identifying individuals from a survey about a controversial referendum declaring life beginning at conception. We build on the concept of "differential privacy" to offer new data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data. The cost of these new procedures is larger standard errors, which can be overcome with somewhat larger sample sizes.

Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. Forthcoming. “Statistically Valid Inferences from Privacy Protected Data.” American Political Science Review. Publisher's Version Abstract

Article

Supplementary Appendix

Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of privacy concerns. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for research subjects, and statistical validity guarantees for researchers seeking social science insights. We build on the standard of ``differential privacy,'' correct for biases induced by the privacy-preserving procedures, provide a proper accounting of uncertainty, and impose minimal constraints on the choice of statistical methods and quantities estimated. We also replicate two recent published articles and show how we can obtain approximately the same substantive results while simultaneously protecting the privacy. Our approach is simple to use and computationally efficient; we also offer open source software that implements all our methods.

Jonathan Katz, Gary King, and Elizabeth Rosenblatt. Forthcoming. “The Essential Role of Statistical Inference in Evaluating Electoral Systems: A Response to DeFord et al.” Political Analysis.Abstract

Article

Katz, King, and Rosenblatt (2020) introduces a theoretical framework for understanding redistricting and electoral systems, built on basic statistical and social science principles of inference. DeFord et al. (Forthcoming, 2021) instead focuses solely on descriptive measures, which lead to the problems identified in our arti- cle. In this paper, we illustrate the essential role of these basic principles and then offer statistical, mathematical, and substantive corrections required to apply DeFord et al.’s calculations to social science questions of interest, while also showing how to easily resolve all claimed paradoxes and problems. We are grateful to the authors for their interest in our work and for this opportunity to clarify these principles and our theoretical framework.

Zachary J. Ward, Rifat Atun, Gary King, Brenda Sequeira Dmello, and Sue J. Goldie. 4/20/2023. “A simulation-based comparative effectiveness analysis of policies to improve global maternal health outcomes.” Nature Medicne. Publisher's Version Abstract

Article

The Sustainable Development Goals include a target to reduce the global maternal mortality ratio (MMR) to less than 70 maternal deaths per 100,000 live births by 2030, with no individual country exceeding 140. However, on current trends the goals are unlikely to be met. We used the empirically calibrated Global Maternal Health microsimulation model, which simulates individual women in 200 countries and territories to evaluate the impact of different interventions and strategies from 2022 to 2030. Although individual interventions yielded fairly small reductions in maternal mortality, integrated strategies were more effective. A strategy to simultaneously increase facility births, improve the availability of clinical services and quality of care at facilities, and improve linkages to care would yield a projected global MMR of 72 (95% uncertainty interval (UI) = 58–87) in 2030. A comprehensive strategy adding family planning and community-based interventions would have an even larger impact, with a projected MMR of 58 (95% UI = 46–70). Although integrated strategies consisting of multiple interventions will probably be needed to achieve substantial reductions in maternal mortality, the relative priority of different interventions varies by setting. Our regional and country-level estimates can help guide priority setting in specific contexts to accelerate improvements in maternal health.

Zachary J. Ward, Rifat Atun, Gary King, Brenda Sequeira Dmello, and Sue J. Goldie. 4/20/2023. “Simulation-based estimates and projections of global, regional and country-level maternal mortality by cause, 1990–2050.” Nature Medicine. Publisher's Version Abstract

Article

Maternal mortality is a major global health challenge. Although progress has been made globally in reducing maternal deaths, measurement remains challenging given the many causes and frequent underreporting of maternal deaths. We developed the Global Maternal Health microsimulation model for women in 200 countries and territories, accounting for individual fertility preferences and clinical histories. Demographic, epidemiologic, clinical and health system data were synthesized from multiple sources, including the medical literature, Civil Registration Vital Statistics systems and Demographic and Health Survey data. We calibrated the model to empirical data from 1990 to 2015 and assessed the predictive accuracy of our model using indicators from 2016 to 2020. We projected maternal health indicators from 1990 to 2050 for each country and estimate that between 1990 and 2020 annual global maternal deaths declined by over 40% from 587,500 (95% uncertainty intervals (UI) 520,600–714,000) to 337,600 (95% UI 307,900–364,100), and are projected to decrease to 327,400 (95% UI 287,800–360,700) in 2030 and 320,200 (95% UI 267,100–374,600) in 2050. The global maternal mortality ratio is projected to decline to 167 (95% UI 142–188) in 2030, with 58 countries above 140, suggesting that on current trends, maternal mortality Sustainable Development Goal targets are unlikely to be met. Building on the development of our structural model, future research can identify context-specific policy interventions that could allow countries to accelerate reductions in maternal deaths.

Georgina Evans and Gary King. 2023. “Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset.” Political Analysis, 31, 1, Pp. 1-21. Publisher's Version Abstract

Article

We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 40 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias -- including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.

We have implemented these methods in open source software for R called PrivacyUnbiased. Facebook has ported PrivacyUnbiased to open source Python code called svinfer. We have extended these results in Evans and King (2021).

Connor T. Jerzak, Gary King, and Anton Strezhnev. 2022. “An Improved Method of Automated Nonparametric Content Analysis for Social Science.” Political Analysis, 31, Pp. 42-58.Abstract

Article

Supplementary Appendix

Some scholars build models to classify documents into chosen categories. Others, especially social scientists who tend to focus on population characteristics, instead usually estimate the proportion of documents in each category -- using either parametric "classify-and-count" methods or "direct" nonparametric estimation of proportions without individual classification. Unfortunately, classify-and-count methods can be highly model dependent or generate more bias in the proportions even as the percent of documents correctly classified increases. Direct estimation avoids these problems, but can suffer when the meaning of language changes between training and test sets or is too similar across categories. We develop an improved direct estimation approach without these issues by including and optimizing continuous text features, along with a form of matching adapted from the causal inference literature. Our approach substantially improves performance in a diverse collection of 73 data sets. We also offer easy-to-use software software that implements all ideas discussed herein.

Jonathan Katz, Gary King, and Elizabeth Rosenblatt. 2022. “Rejoinder: Concluding Remarks on Scholarly Communications.” Political Analysis.Abstract

Article

We are grateful to DeFord et al. for the continued attention to our work and the crucial issues of fair representation in democratic electoral systems. Our response (Katz, King, and Rosenblatt, forthcoming) was designed to help readers avoid being misled by mistaken claims in DeFord et al. (forthcoming-a), and does not address other literature or uses of our prior work. As it happens, none of our corrections were addressed (or contradicted) in the most recent submission (DeFord et al., forthcoming-b).

We also offer a recommendation regarding DeFord et al.’s (forthcoming-b) concern with how expert witnesses, consultants, and commentators should present academic scholarship to academic novices, such as judges, public officials, the media, and the general public. In these public service roles, scholars attempt to translate academic understanding of sophisticated scholarly literatures, technical methodologies, and complex theories for those without sufficient background in social science or statistics.

Beau Coker, Cynthia Rudin, and Gary King. 2021. “A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results.” Management Science, Pp. 1-24. Publisher's Version Abstract

Article

Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We derive "hacking intervals," which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals.

Aaron Kaufman, Gary King, and Mayya Komisarchik. 2021. “How to Measure Legislative District Compactness If You Only Know it When You See It.” American Journal of Political Science, 65, 3, Pp. 533-550. Publisher's Version Abstract

Article

Supplementary Appendix

To deter gerrymandering, many state constitutions require legislative districts to be "compact." Yet, the law offers few precise definitions other than "you know it when you see it," which effectively implies a common understanding of the concept. In contrast, academics have shown that compactness has multiple dimensions and have generated many conflicting measures. We hypothesize that both are correct -- that compactness is complex and multidimensional, but a common understanding exists across people. We develop a survey to elicit this understanding, with high reliability (in data where the standard paired comparisons approach fails). We create a statistical model that predicts, with high accuracy, solely from the geometric features of the district, compactness evaluations by judges and public officials responsible for redistricting, among others. We also offer compactness data from our validated measure for 20,160 state legislative and congressional districts, as well as open source software to compute this measure from any district.

Winner of the 2018 Robert H Durr Award from the MPSA.

Journal Article

Publications By Type

Publications By Year

c05432e852e7f3fbb2c56fc04411b732