Journal Article

<p>A Unified Approach to Measurement Error and MissingData: Details and Extensions</p>
Blackwell, Matthew, James Honaker, and Gary King. In Press.

A Unified Approach to Measurement Error and MissingData: Details and Extensions

, Sociological Methods and Research.Abstract
We extend a unified and easy-to-use approach to measurement error and missing data. Blackwell, Honaker, and King (2014a) gives an intuitive overview of the new technique, along with practical suggestions and empirical applications. Here, we offer more precise technical details; more sophisticated measurement error model specifications and estimation procedures; and analyses to assess the approach's robustness to correlated measurement errors and to errors in categorical variables. These results support using the technique to reduce bias and increase efficiency in a wide variety of empirical research.
<p>A Unified Approach to Measurement Error and Missing Data: Overview</p>
Blackwell, Matthew, James Honaker, and Gary King. In Press.

A Unified Approach to Measurement Error and Missing Data: Overview

, Sociological Methods and Research.Abstract
Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b).
<p>The Parable of Google Flu: Traps in Big Data Analysis</p>
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014.

The Parable of Google Flu: Traps in Big Data Analysis

, Science 343, no. 14 March: 1203-1205.Abstract
Large errors in flu prediction were largely avoidable, which offers lessons for the use of big data. In February 2013, Google Flu Trends (GFT) made headlines but not for a reason that Google executives or the creators of the flu tracking system would have hoped. Nature reported that GFT was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) than the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States ( 1, 2). This happened despite the fact that GFT was built to predict CDC reports. Given that GFT is often held up as an exemplary use of big data ( 3, 4), what lessons can we draw from this error?
<p>How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It</p>
King, Gary, and Margaret E Roberts. 2014.

How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It

, Political Analysis: 1-21.Abstract
"Robust standard errors'" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help researchers realize these gains via a more productive way to understand and use robust standard errors; a new general and easier-to-use "generalized information matrix test" statistic that can formally assess misspecification (based on differences between robust and classical variance estimates); and practical illustrations via simulations and real examples from published research. How robust standard errors are used needs to change, but instead of jettisoning this popular tool we show how to use it to provide effective clues about model misspecification, likely biases, and a guide to considerably more reliable, and defensible, inferences. Accompanying this article [soon!] is software that implements the methods we describe. 
<p>Automating Open Science for Big Data</p>
Crosas, Merce, James Honaker, Gary King, and Latanya Sweeney. 2014.

Automating Open Science for Big Data

, ANNALS of the American Academy of Political and Social Science.Abstract
The vast majority of social science research presently uses small (MB or GB scale) data sets. These fixed-scale data sets are commonly downloaded to the researcher's computer where the analysis is performed locally, and are often shared and cited with well-established technologies, such as the Dataverse Project (see, to support the published results.  The trend towards Big Data -- including large scale streaming data -- is starting to transform research and has the potential to impact policy-making and our understanding of the social, economic, and political problems that affect human societies.  However, this research poses new challenges in execution, accountability, preservation, reuse, and reproducibility. Downloading these data sets to a researcher’s computer is infeasible or not practical; hence, analyses take place in the cloud, require unusual expertise, and benefit from collaborative teamwork and novel tool development. The advantage of these data sets in how informative they are also means that they are much more likely to contain highly sensitive personally identifiable information. In this paper, we discuss solutions to these new challenges so that the social sciences can realize the potential of Big Data.
Restructuring the Social Sciences: Reflections from Harvard’s Institute for Quantitative Social Science
King, Gary. 2014. Restructuring the Social Sciences: Reflections from Harvard’s Institute for Quantitative Social Science, PS: Political Science and Politics 47, no. 1: 165-172. Cambridge University Press versionAbstract
The social sciences are undergoing a dramatic transformation from studying problems to solving them; from making do with a small number of sparse data sets to analyzing increasing quantities of diverse, highly informative data; from isolated scholars toiling away on their own to larger scale, collaborative, interdisciplinary, lab-style research teams; and from a purely academic pursuit to having a major impact on the world. To facilitate these important developments, universities, funding agencies, and governments need to shore up and adapt the infrastructure that supports social science research. We discuss some of these developments here, as well as a new type of organization we created at Harvard to help encourage them -- the Institute for Quantitative Social Science.  An increasing number of universities are beginning efforts to respond with similar institutions. This paper provides some suggestions for how individual universities might respond and how we might work together to advance social science more generally.
<p>Reverse-engineering censorship in China: Randomized experimentation and participant observation</p>
King, Gary, Jennifer Pan, and Margaret E Roberts. 2014.

Reverse-engineering censorship in China: Randomized experimentation and participant observation

, Science 345, no. 6199: 1-10. Publisher's VersionAbstract
Existing research on the extensive Chinese censorship organization uses observational methods with well-known limitations. We conducted the first large-scale experimental study of censorship by creating accounts on numerous social media sites, randomly submitting different texts, and observing from a worldwide network of computers which texts were censored and which were not. We also supplemented interviews with confidential sources by creating our own social media site, contracting with Chinese firms to install the same censoring technologies as existing sites, and—with their software, documentation, and even customer support—reverse-engineering how it all works. Our results offer rigorous support for the recent hypothesis that criticisms of the state, its leaders, and their policies are published, whereas posts about real-world events with collective action potential are censored.
The Troubled Future of Colleges and Universities (with comments from five scholar-administrators)
King, Gary, and Maya Sen. 2013. The Troubled Future of Colleges and Universities (with comments from five scholar-administrators), PS: Political Science and Politics 46, no. 1: 81--113.Abstract
The American system of higher education is under attack by political, economic, and educational forces that threaten to undermine its business model, governmental support, and operating mission. The potential changes are considerably more dramatic and disruptive than what we've already experienced. Traditional colleges and universities urgently need a coherent, thought-out response. Their central role in ensuring the creation, preservation, and distribution of knowledge may be at risk and, as a consequence, so too may be the spectacular progress across fields we have come to expect as a result. Symposium contributors include Henry E. Brady, John Mark Hansen, Gary King, Nannerl O. Keohane, Michael Laver, Virginia Sapiro, and Maya Sen.
How Social Science Research Can Improve Teaching
King, Gary, and Maya Sen. 2013. How Social Science Research Can Improve Teaching, PS: Political Science and Politics 46, no. 3: 621-629.Abstract
We marshal discoveries about human behavior and learning from social science research and show how they can be used to improve teaching and learning. The discoveries are easily stated as three social science generalizations: (1) social connections motivate, (2) teaching teaches the teacher, and (3) instant feedback improves learning. We show how to apply these generalizations via innovations in modern information technology inside, outside, and across university classrooms. We also give concrete examples of these ideas from innovations we have experimented with in our own teaching. See also a video presentation of this talk before the Harvard Board of Overseers
How Censorship in China Allows Government Criticism but Silences Collective Expression
King, Gary, Jennifer Pan, and Margaret E Roberts. 2013. How Censorship in China Allows Government Criticism but Silences Collective Expression, American Political Science Review 107, no. 2 (May): 1-18.Abstract
We offer the first large scale, multiple source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future --- and, as such, seem to clearly expose government intent.