Writings

Working Paper

Katherine Clayton, Yusaku Horiuchi, Aaron R. Kaufman, Gary King, and Mayya Komisarchik. Working Paper. “Correcting Measurement Error Bias in Conjoint Survey Experiments”.

Conjoint survey designs are spreading across the social sciences due to their unusual capacity to estimate many causal effects from a single randomized experiment. Unfortunately, by their ability to mirror complicated real-world choices, these designs often generate substantial measurement error and thus bias. We replicate both the data collection and analysis from eight prominent conjoint studies, all of which closely reproduce published results, and show that a large proportion of observed variation in answers to conjoint questions is effectively random noise. We then discover a common empirical pattern in how measurement error appears in conjoint studies and, with it, introduce an easy-to-use statistical method to correct the bias.

You may be interested in software (in progress) that implements all the suggestions in our paper: "Projoint: The One-Stop Conjoint Shop".

Paper

Supplementary Appendix

Danny Ebanks, Jonathan N. Katz, and Gary King. Working Paper. “How American Politics Ensures Electoral Accountability in Congress”.

An essential component of democracy is the ability to hold legislators accountable via the threat of electoral defeat, a concept that has rarely been quantified directly. Well known massive changes over time in indirect measures — such as incumbency advantage, electoral margins, partisan bias, partisan advantage, split-ticket voting, and others — all seem to imply wide swings in electoral accountability. In contrast, we show that the (precisely calibrated) probability of defeating incumbent US House members has been surprisingly constant and remarkably high for two-thirds of a century. We resolve this paradox with a generative statistical model of the full vote distribution to avoid biases induced by the common practice of studying only central tendencies, and validate it with extensive out-of-sample tests. We show that different states of the partisan battlefield lead in interestingly different ways to the same high probability of incumbent defeat. Many challenges to American democracy remain, but this core feature remains durable.

Paper

Supplementary Appendix

Danny Ebanks, Jonathan N. Katz, and Gary King. Working Paper. “If a Statistical Model Predicts That Common Events Should Occur Only Once in 10,000 Elections, Maybe it’s the Wrong Model”.

Political scientists forecast elections, not primarily to satisfy public interest, but to validate statistical models used for estimating many quantities of scholarly interest. Although scholars have learned a great deal from these models, they can be embarrassingly overconfident: Events that should occur once in 10,000 elections occur almost every year, and even those that should occur once in a trillion-trillion elections are sometimes observed. We develop a novel generative statistical model of US congressional elections 1954-2020 and validate it with extensive out-of-sample tests. The generatively accurate descriptive summaries provided by this model demonstrate that the 1950s was as partisan and differentiated as the current period, but with parties not based on ideological differences as they are today. The model also shows that even though the size of the incumbency advantage has varied tremendously over time, the risk of an in-party incumbent losing a midterm election contest has been high and essentially constant over at least the last two-thirds of a century.

Please see "How American Politics Ensures Electoral Accountability in Congress," which supersedes this paper.

Paper

Supplementary Appendix

Natalie Ayers, Gary King, Zagreb Mukerjee, and Dominic Skinnion. Working Paper. “Statistical Intuition Without Coding (or Teachers)”.

Two features of quantitative political methodology make teaching and learning especially difficult: (1) Each new concept in probability, statistics, and inference builds on all previous (and sometimes all other relevant) concepts; and (2) motivating substantively oriented students, by teaching these abstract theories simultaneously with the practical details of a statistical programming language (such as R), makes learning each subject harder. We address both problems through a new type of automated teaching tool that helps students see the big theoretical picture and all its separate parts at the same time without having to simultaneously learn to program. This tool, which we make available via one click in a web browser, can be used in a traditional methods class, but is also designed to work without instructor supervision.

Paper

Georgina Evans and Gary King. Working Paper. “Statistically Valid Inferences from Differentially Private Data Releases, II: Extensions to Nonlinear Transformations”.

We extend Evans and King (Forthcoming, 2021) to nonlinear transformations, using proportions and weighted averages as our running examples.

Paper

Forthcoming

Georgina Evans, Gary King, Adam D. Smith, and Abhradeep Thakurta. Forthcoming. “Differentially Private Survey Research.” American Journal of Political Science.

Survey researchers have long sought to protect the privacy of their respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures can help, recent research demonstrates that they fail to protect respondents from intentional re-identification attacks, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem in political science because political beliefs are not merely the subject of our scholarship; they represent some of the most important information respondents want to keep private. We confirm the problem in practice by re-identifying individuals from a survey about a controversial referendum declaring life beginning at conception. We build on the concept of "differential privacy" to offer new data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data. The cost of these new procedures is larger standard errors, which can be overcome with somewhat larger sample sizes.

Paper

Supplementary Appendix

Jonathan Katz, Gary King, and Elizabeth Rosenblatt. Forthcoming. “The Essential Role of Statistical Inference in Evaluating Electoral Systems: A Response to DeFord et al.” Political Analysis.

Katz, King, and Rosenblatt (2020) introduces a theoretical framework for understanding redistricting and electoral systems, built on basic statistical and social science principles of inference. DeFord et al. (Forthcoming, 2021) instead focuses solely on descriptive measures, which lead to the problems identified in our article. In this paper, we illustrate the essential role of these basic principles and then offer statistical, mathematical, and substantive corrections required to apply DeFord et al.’s calculations to social science questions of interest, while also showing how to easily resolve all claimed paradoxes and problems. We are grateful to the authors for their interest in our work and for this opportunity to clarify these principles and our theoretical framework.

Article

Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. Forthcoming. “Statistically Valid Inferences from Privacy Protected Data.” American Political Science Review.

Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of privacy concerns. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for research subjects, and statistical validity guarantees for researchers seeking social science insights. We build on the standard of "differential privacy," correct for biases induced by the privacy-preserving procedures, provide a proper accounting of uncertainty, and impose minimal constraints on the choice of statistical methods and quantities estimated. We also replicate two recently published articles and show how we can obtain approximately the same substantive results while simultaneously protecting privacy. Our approach is simple to use and computationally efficient; we also offer open source software that implements all our methods.

Article

Supplementary Appendix

2023

Zachary J. Ward, Rifat Atun, Gary King, Brenda Sequeira Dmello, and Sue J. Goldie. 4/20/2023. “A simulation-based comparative effectiveness analysis of policies to improve global maternal health outcomes.” Nature Medicine.

The Sustainable Development Goals include a target to reduce the global maternal mortality ratio (MMR) to less than 70 maternal deaths per 100,000 live births by 2030, with no individual country exceeding 140. However, on current trends the goals are unlikely to be met. We used the empirically calibrated Global Maternal Health microsimulation model, which simulates individual women in 200 countries and territories to evaluate the impact of different interventions and strategies from 2022 to 2030. Although individual interventions yielded fairly small reductions in maternal mortality, integrated strategies were more effective. A strategy to simultaneously increase facility births, improve the availability of clinical services and quality of care at facilities, and improve linkages to care would yield a projected global MMR of 72 (95% uncertainty interval (UI) = 58–87) in 2030. A comprehensive strategy adding family planning and community-based interventions would have an even larger impact, with a projected MMR of 58 (95% UI = 46–70). Although integrated strategies consisting of multiple interventions will probably be needed to achieve substantial reductions in maternal mortality, the relative priority of different interventions varies by setting. Our regional and country-level estimates can help guide priority setting in specific contexts to accelerate improvements in maternal health.

Article

Zachary J. Ward, Rifat Atun, Gary King, Brenda Sequeira Dmello, and Sue J. Goldie. 4/20/2023. “Simulation-based estimates and projections of global, regional and country-level maternal mortality by cause, 1990–2050.” Nature Medicine.

Maternal mortality is a major global health challenge. Although progress has been made globally in reducing maternal deaths, measurement remains challenging given the many causes and frequent underreporting of maternal deaths. We developed the Global Maternal Health microsimulation model for women in 200 countries and territories, accounting for individual fertility preferences and clinical histories. Demographic, epidemiologic, clinical and health system data were synthesized from multiple sources, including the medical literature, Civil Registration Vital Statistics systems and Demographic and Health Survey data. We calibrated the model to empirical data from 1990 to 2015 and assessed the predictive accuracy of our model using indicators from 2016 to 2020. We projected maternal health indicators from 1990 to 2050 for each country and estimate that between 1990 and 2020 annual global maternal deaths declined by over 40% from 587,500 (95% uncertainty intervals (UI) 520,600–714,000) to 337,600 (95% UI 307,900–364,100), and are projected to decrease to 327,400 (95% UI 287,800–360,700) in 2030 and 320,200 (95% UI 267,100–374,600) in 2050. The global maternal mortality ratio is projected to decline to 167 (95% UI 142–188) in 2030, with 58 countries above 140, suggesting that on current trends, maternal mortality Sustainable Development Goal targets are unlikely to be met. Building on the development of our structural model, future research can identify context-specific policy interventions that could allow countries to accelerate reductions in maternal deaths.

Article

Georgina Evans and Gary King. 2023. “Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset.” Political Analysis, 31, 1, Pp. 1-21.

We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 40 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias -- including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.

We have implemented these methods in open source software for R called PrivacyUnbiased. Facebook has ported PrivacyUnbiased to open source Python code called svinfer. We have extended these results in Evans and King (2021).
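The attenuation described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the method implemented in PrivacyUnbiased or svinfer: it assumes a single regressor, Gaussian noise with a known variance added to that regressor, and a simple method-of-moments correction that subtracts the known noise variance from the regressor's sample variance. The function names `ols_slope` and `corrected_slope` and all parameter values are hypothetical and chosen for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def corrected_slope(x_noisy, y, noise_var):
    """Method-of-moments slope: remove the known noise variance from the
    sample variance of the noisy regressor before dividing (an assumed,
    textbook-style correction used here only for illustration)."""
    n = len(x_noisy)
    mean_x = sum(x_noisy) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_noisy, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x_noisy) / (n - 1)
    return cov_xy / (var_x - noise_var)


rng = random.Random(0)      # fixed seed so the run is reproducible
n = 20000                   # illustrative sample size
true_slope = 2.0            # slope used to generate the data
noise_sd = 1.0              # known standard deviation of the added noise

x = [rng.gauss(0, 1) for _ in range(n)]                # confidential regressor
y = [true_slope * xi + rng.gauss(0, 1) for xi in x]    # outcome
x_noisy = [xi + rng.gauss(0, noise_sd) for xi in x]    # released, noisy regressor

naive = ols_slope(x_noisy, y)                          # attenuated toward zero
fixed = corrected_slope(x_noisy, y, noise_sd ** 2)     # close to true_slope

print(f"naive slope: {naive:.3f}")
print(f"corrected slope: {fixed:.3f}")
```

With these settings the noisy regressor has roughly twice the variance of the confidential one, so the naive slope lands near half the true value, while the corrected estimate lands near the true slope at the cost of more sampling variability.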

Article

2022

Ian Ayres, Richard A. Berk, Richard R.W. Brooks, Daniel E. Ho, Gary King, Kevin Quinn, Donald B. Rubin, and Sherod Thaxton. 2022. “Brief of Empirical Scholars as Amici Curiae in Support of Respondents.” Filed with the Supreme Court of the United States in Students for Fair Admissions v. President and Fellows of Harvard College.

Amici curiae are leaders in the field of quantitative social science and statistical methodology. Amici submit this brief to point out the substantial methodological flaws in the “mismatch” research discussed in the Brief for Richard Sander as Amicus Curiae in Support of Petitioner. Professor Sander’s mismatch hypothesis is unsupported and based on work that fails to adhere to basic tenets of research design.

AmiciBrief.pdf

Connor T. Jerzak, Gary King, and Anton Strezhnev. 2022. “An Improved Method of Automated Nonparametric Content Analysis for Social Science.” Political Analysis, 31, Pp. 42-58.

Some scholars build models to classify documents into chosen categories. Others, especially social scientists who tend to focus on population characteristics, instead usually estimate the proportion of documents in each category -- using either parametric "classify-and-count" methods or "direct" nonparametric estimation of proportions without individual classification. Unfortunately, classify-and-count methods can be highly model dependent or generate more bias in the proportions even as the percent of documents correctly classified increases. Direct estimation avoids these problems, but can suffer when the meaning of language changes between training and test sets or is too similar across categories. We develop an improved direct estimation approach without these issues by including and optimizing continuous text features, along with a form of matching adapted from the causal inference literature. Our approach substantially improves performance in a diverse collection of 73 data sets. We also offer easy-to-use software that implements all ideas discussed herein.

Article

Supplementary Appendix

Jonathan Katz, Gary King, and Elizabeth Rosenblatt. 2022. “Rejoinder: Concluding Remarks on Scholarly Communications.” Political Analysis.

We are grateful to DeFord et al. for the continued attention to our work and the crucial issues of fair representation in democratic electoral systems. Our response (Katz, King, and Rosenblatt, forthcoming) was designed to help readers avoid being misled by mistaken claims in DeFord et al. (forthcoming-a), and does not address other literature or uses of our prior work. As it happens, none of our corrections were addressed (or contradicted) in the most recent submission (DeFord et al., forthcoming-b).

We also offer a recommendation regarding DeFord et al.’s (forthcoming-b) concern with how expert witnesses, consultants, and commentators should present academic scholarship to academic novices, such as judges, public officials, the media, and the general public. In these public service roles, scholars attempt to translate academic understanding of sophisticated scholarly literatures, technical methodologies, and complex theories for those without sufficient background in social science or statistics.

Article

2021

Cynthia Dwork, Ruth Greenwood, and Gary King. 8/12/2021. “Letter to US Census Bureau: "Request for release of “noisy measurements file” by September 30 along with redistricting data products"”.

A letter, submitted on behalf of a large group of expert signatories, to request the release of the “noisy measurements file” and other redistricting data by September 30, 2021. This includes the data created by the Bureau in preparing its differentially private data release, without their unnecessary (and, in many important situations, information destroying) post-processing.

Letter

Cynthia Dwork, Ruth Greenwood, and Gary King. 7/26/2021. “There’s a simple solution to the latest census fight.” Boston Globe, Pp. A9.

We offer a solution to debates over the use of differential privacy in releasing US Census Data.

Article

Gary King, Brian Lukoff, and Eric Mazur. 2/16/2021. “Cluster Analysis of Participant Responses for Test Generation or Teaching (2nd).” United States of America US 10,922,991 B2 (U.S. Patent and Trademark Office).

Textual responses to open-ended (i.e., free-response) items provided by participants (e.g., by means of mobile wireless devices) are automatically classified, enabling an instructor to assess the responses in a convenient, organized fashion and adjust instruction accordingly.

Patent

Gary King, Brian Lukoff, and Eric Mazur. 1/26/2021. “Participant Grouping for Enhanced Interactive Experience (4th).” United States of America 10,902,031 B2 (U.S. Patent and Trademark Office).

Representative embodiments of a method for grouping participants in an activity include the steps of: (i) defining a grouping policy; (ii) storing, in a database, participant records that include a participant identifier, a characteristic associated with the participant, and/or an identifier for a participant's handheld device; (iii) defining groupings based on the policy and characteristics of the participants relating to the policy and to the activity; and (iv) communicating the groupings to the handheld devices to establish the groups.

Patent

Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. 2021. “UnbiasedPrivacy”.

Gary King, Robert O. Keohane, and Sidney Verba. 2021. Designing Social Inquiry: Scientific Inference in Qualitative Research, New Edition. 2nd ed. Princeton: Princeton University Press.

"The classic work on qualitative methods in political science"

Designing Social Inquiry presents a unified approach to qualitative and quantitative research in political science, showing how the same logic of inference underlies both. This stimulating book discusses issues related to framing research questions, measuring the accuracy of data and the uncertainty of empirical inferences, discovering causal effects, and getting the most out of qualitative research. It addresses topics such as interpretation and inference, comparative case studies, constructing causal theories, dependent and explanatory variables, the limits of random selection, selection bias, and errors in measurement. The book only uses mathematical notation to clarify concepts, and assumes no prior knowledge of mathematics or statistics.

Featuring a new preface by Robert O. Keohane and Gary King, this edition makes an influential work available to new generations of qualitative researchers in the social sciences.
