The Parable of Google Flu: Traps in Big Data Analysis

Abstract
Large errors in flu prediction were largely avoidable, which offers lessons for the use of big data.
In February 2013, Google Flu Trends (GFT) made headlines, but not for a reason that Google executives or the creators of the flu-tracking system would have hoped. Nature reported that GFT was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) reported by the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States (1, 2). This happened despite the fact that GFT was built to predict CDC reports. Given that GFT is often held up as an exemplary use of big data (3, 4), what lessons can we draw from this error?