Working Paper

<p>A Unified Approach to Measurement Error and Missing Data: Overview</p>
Blackwell, Matthew, James Honaker, and Gary King. 2014.

A Unified Approach to Measurement Error and Missing Data: Overview

Although social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b).
Reverse Engineering Chinese Censorship through Randomized Experimentation and Participant Observation
King, Gary, Jennifer Pan, and Margaret Roberts. 2014. Reverse Engineering Chinese Censorship through Randomized Experimentation and Participant Observation.Abstract
Chinese government censorship of social media constitutes the largest coordinated selective suppression of human communication in recorded history. Although existing research on the subject has revealed a great deal, it is based on passive, observational methods, with well known inferential limitations. For example, these methods can reveal nothing about censorship that occurs before submissions are posted, such as via automated review which we show is used at two-thirds of all social media sites. We offer two approaches to overcome these limitations. For causal inferences, we conduct the first large scale experimental study of censorship by creating accounts on numerous social media sites spread throughout the country, submitting different randomly assigned types of social media texts, and detecting from a network of computers all over the world which types are censored. Then, for descriptive inferences, we supplement the current uncertain practice of conducting anonymous interviews with secret informants, by participant observation: we set up our own social media site in China, contract with Chinese firms to install the same censoring technologies as their existing sites, and -- with direct access to their software, documentation, and even customer service help desk support -- reverse engineer how it all works. Our results offer the first rigorous experimental support for the recent hypothesis that criticism of the state, its leaders, and their policies are routinely published, whereas posts about real world events with collective action potential are censored. We also extend the hypothesis by showing that it applies even to accusations of corruption by high-level officials and massive online-only protests, neither of which are censored. We also reveal for the first time the inner workings of the process of automated review, and as a result are able to reconcile conflicting accounts of keyword-based content filtering in the academic literature. We show that the Chinese government tolerates surprising levels of diversity in automated review technology, but still ensures a uniform outcome by post hoc censorship using huge numbers of human coders.
<p>The Balance-Sample Size Frontier in Matching Methods for Causal Inference</p>
King, Gary, Christopher Lucas, and Richard Nielsen. 2014.

The Balance-Sample Size Frontier in Matching Methods for Causal Inference

We propose a simplified approach to matching for causal inference that simultaneously optimizes both balance (between the treated and control groups) and matched sample size. This procedure resolves two widespread tensions in the use of this powerful and popular methodology. First, current practice is to run a matching method that maximizes one balance metric (such as a propensity score or average Mahalanobis distance), but then to check whether it succeeds with respect to a different balance metric for which it was not designed (such as differences in means or L1). Second, current matching methods either fix the sample size and maximize balance (e.g., Mahalanobis or propensity score matching), fix balance and maximize the sample size (such as coarsened exact matching), or are arbitrary compromises between the two (such as calipers with ad hoc thresholds applied to other methods). These tensions lead researchers to either try to optimize manually, by iteratively tweaking their matching method and rechecking balance, or settle for suboptimal solutions. We address these tensions by first defining and showing how to calculate the matching frontier as the set of matching solutions with maximum balance for each possible sample size. Researchers can then choose one, several, or all matching solutions from the frontier for analysis in one step without iteration. The main difficulty in this strategy is that checking all possible solutions is exponentially difficult. We solve this problem with new algorithms that finish fast, optimally, and without iteration or manual tweaking. We (will) also offer easy-to-use software that implements these ideas, along with several empirical applications.
<p>Google Flu Trends Still Appears Sick:&nbsp;An Evaluation of the 2013‐2014 Flu Season</p>
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014.

Google Flu Trends Still Appears Sick: An Evaluation of the 2013‐2014 Flu Season

Last year was difficult for Google Flu Trends (GFT). In early 2013, Nature reported that GFT was estimating more than double the percentage of doctor visits for influenza like illness than the Centers for Disease Control and Prevention s (CDC) sentinel reports during the 2012 2013 flu season (1). Given that GFT was designed to forecast upcoming CDC reports, this was a problematic finding. In March 2014, our report in Science found that the overestimation problem in GFT was also present in the 2011 2012 flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made by Google s search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models what the article labeled algorithm dynamics and big data hubris respectively. Moreover, the report and the supporting online materials detailed how difficult/impossible it is to replicate the GFT results, undermining independent efforts to explore the source of GFT errors and formulate improvements.
<p>A Unified Approach to Measurement Error and Missing&nbsp;Data: Details and Extensions</p>
Blackwell, Matthew, James Honaker, and Gary King. 2014.

A Unified Approach to Measurement Error and Missing Data: Details and Extensions

We extend a unified and easy-to-use approach to measurement error and missing data. Blackwell, Honaker, and King (2014a) gives an intuitive overview of the new technique, along with practical suggestions and empirical applications. Here, we offer more precise technical details; more sophisticated measurement error model specifications and estimation procedures; and analyses to assess the approach's robustness to correlated measurement errors and to errors in categorical variables. These results support using the technique to reduce bias and increase efficiency in a wide variety of empirical research.
How Robust Standard Errors Expose Methodological Problems They Do Not Fix
King, Gary, and Margaret Roberts. 2013. How Robust Standard Errors Expose Methodological Problems They Do Not Fix.Abstract
"Robust standard errors'" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. Even though this message is well known to methodologists and has appeared in the literature in several forms, it has failed to reach most applied researchers. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help applied researchers realize these gains via an alternative perspective that offers a productive way to use robust standard errors; a new general and easier-to-use information test statistic which is easier to apply appropriately; and practical illustrations via simulations and real examples from published research. Instead of jettisoning this extremely popular tool, as some suggest, we show how robust and classical standard error differences can provide effective clues about model misspecification, likely biases, and a guide to more reliable inferences.
How Coarsening Simplifies Matching-Based Causal Inference Theory
Iacus, Stefano M, and Gary King. 2012. How Coarsening Simplifies Matching-Based Causal Inference Theory.Abstract
The simplicity and power of matching methods have made them an increasingly popular approach to causal inference in observational data. Existing theories that justify these techniques are well developed but either require exact matching, which is usually infeasible in practice, or sacrifice some simplicity via asymptotic theory, specialized bias corrections, and novel variance estimators; and extensions to approximate matching with multicategory treatments have not yet appeared. As an alternative, we show how conceptualizing continuous variables as having logical breakpoints (such as phase transitions when measuring temperature or high school or college degrees in years of education) is both natural substantively and can be used to simplify causal inference theory. The result is a finite sample theory that is widely applicable, simple to understand, and easy to implement by using matching to preprocess the data, after which one can use whatever method would have been applied without matching. The theoretical simplicity also allows for binary, multicategory, and continuous treatment variables from the start and for extensions to valid inference under imperfect treatment assignment.
Comparative Effectiveness of Matching Methods for Causal Inference
King, Gary, Richard Nielsen, Carter Coberley, James E Pope, and Aaron Wells. 2011. Comparative Effectiveness of Matching Methods for Causal Inference.Abstract
Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable ``Mahalanobis Distance Matching'' (MDM), the commonly used ``Propensity Score Matching'' (PSM), and a newer approach called ``Coarsened Exact Matching'' (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.
How Not to Lie Without Statistics
King, Gary, and Eleanor Neff Powell. 2008. How Not to Lie Without Statistics.Abstract
We highlight, and suggest ways to avoid, a large number of common misunderstandings in the literature about best practices in qualitative research. We discuss these issues in four areas: theory and data, qualitative and quantitative strategies, causation and explanation, and selection bias. Some of the misunderstandings involve incendiary debates within our discipline that are readily resolved either directly or with results known in research areas that happen to be unknown to political scientists. Many of these misunderstandings can also be found in quantitative research, often with different names, and some of which can be fixed with reference to ideas better understood in the qualitative methods literature. Our goal is to improve the ability of quantitatively and qualitatively oriented scholars to enjoy the advantages of insights from both areas. Thus, throughout, we attempt to construct specific practical guidelines that can be used to improve actual qualitative research designs, not only the qualitative methods literatures that talk about them.