Unifying Approaches to Statistical Analysis
See our original papers: "The Parable of Google Flu: Traps in Big Data Analysis," and "Google Flu Trends Still Appears Sick: An Evaluation of the 2013‐2014 Flu Season"
See our original paper, "The Parable of Google Flu: Traps in Big Data Analysis"
In February 2013, Google Flu Trends (GFT) made headlines but not for a reason that Google executives or the creators of the flu tracking system would have hoped. Nature reported that GFT was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) than the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States ( 1, 2). This happened despite the fact that GFT was built to predict CDC reports. Given that GFT is often held up as an exemplary use of big data ( 3, 4), what lessons can we draw from this error?
See also "Google Flu Trends Still Appears Sick: An Evaluation of the 2013‐2014 Flu Season"
"Robust standard errors" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help researchers realize these gains via a more productive way to understand and use robust standard errors; a new general and easier-to-use "generalized information matrix test" statistic that can formally assess misspecification (based on differences between robust and classical variance estimates); and practical illustrations via simulations and real examples from published research. How robust standard errors are used needs to change, but instead of jettisoning this popular tool we show how to use it to provide effective clues about model misspecification, likely biases, and a guide to considerably more reliable, and defensible, inferences. Accompanying this article is open source software that implements the methods we describe.
Whenever we report predicted values, we should also report some measure of the uncertainty of these estimates. In the linear case, this is relatively simple, and the answer well-known, but with nonlinear models the answer may not be apparent. This short article shows how to make these calculations. I first present this for the familiar linear case, also reviewing the two forms of uncertainty in these estimates, and then show how to calculate these for any arbitrary function. An example appears last.
This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from https://gking.harvard.edu/clarify (https://gking.harvard.edu/clarify)" at the Stata command line.
Winner of the Okidata Best Research Software Award. Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables.
This (two-page) article argues that the evidence base of political science and the related social sciences are beginning an underappreciated but historic change.