Last year was difficult for Google Flu Trends (GFT). In early 2013, Nature reported that GFT was estimating more than double the percentage of doctor visits for influenza like illness than the Centers for Disease Control and Prevention s (CDC) sentinel reports during the 2012 2013 flu season (1). Given that GFT was designed to forecast upcoming CDC reports, this was a problematic finding. In March 2014, our report in Science found that the overestimation problem in GFT was also present in the 2011 2012 flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made by Google s search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models what the article labeled algorithm dynamics and big data hubris respectively. Moreover, the report and the supporting online materials detailed how difficult/impossible it is to replicate the GFT results, undermining independent efforts to explore the source of GFT errors and formulate improvements.
MatchingFrontier is an easy-to-use R Package for making optimal causal inferences from observational data. Despite their popularity, existing matching approaches leave researchers with two fundamental tensions. First, they are designed to maximize one metric (such as propensity score or Mahalanobis distance) but are judged against another for which they were not designed (such as L1 or differences in means). Second, they lack a principled solution to revealing the implicit bias-variance trade off: matching methods need to optimize with respect to both imbalance (between the treated and control groups) and the number of observations pruned, but existing approaches optimize with respect to only one; users then either ignore the other, or tweak it, usually suboptimally, by hand.
MatchingFrontier resolves both tensions by consolidating previous techniques into a single, optimal, and flexible approach. It calculates the matching solution with maximum balance for each possible sample size (N, N-1, N-2,...). It thus directly calculates the entire balance-sample size frontier, from which the user can easily choose one, several, or all subsamples from which to conduct their final analysis, given their own choice of imbalance metric and quantity of interest. MatchingFrontier solves the joint optimization problem in one run, automatically, without manual tweaking, and without iteration. Although for each subset size k, there exist a huge (N choose k) number of unique subsets, MatchingFrontier includes specially designed fast algorithms that give the optimal answer, usually in a few minutes.
This is a poster presentation describing (1) the largest ever experimental study of media effects, with more than 50 cooperating traditional media sites, normally unavailable web site analytics, the text of hundreds of thousands of news articles, and tens of millions of social media posts, and (2) a design we used in preparation that attempts to anticipate experimental outcomes
The social sciences are undergoing a dramatic transformation from studying problems to solving them; from making do with a small number of sparse data sets to analyzing increasing quantities of diverse, highly informative data; from isolated scholars toiling away on their own to larger scale, collaborative, interdisciplinary, lab-style research teams; and from a purely academic pursuit to having a major impact on the world. To facilitate these important developments, universities, funding agencies, and governments need to shore up and adapt the infrastructure that supports social science research. We discuss some of these developments here, as well as a new type of organization we created at Harvard to help encourage them -- the Institute for Quantitative Social Science. An increasing number of universities are beginning efforts to respond with similar institutions. This paper provides some suggestions for how individual universities might respond and how we might work together to advance social science more generally.
Existing research on the extensive Chinese censorship organization uses observational methods with well-known limitations. We conducted the first large-scale experimental study of censorship by creating accounts on numerous social media sites, randomly submitting different texts, and observing from a worldwide network of computers which texts were censored and which were not. We also supplemented interviews with confidential sources by creating our own social media site, contracting with Chinese firms to install the same censoring technologies as existing sites, and—with their software, documentation, and even customer support—reverse-engineering how it all works. Our results offer rigorous support for the recent hypothesis that criticisms of the state, its leaders, and their policies are published, whereas posts about real-world events with collective action potential are censored.
We offer the first large scale, multiple source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future --- and, as such, seem to clearly expose government intent.
We marshal discoveries about human behavior and learning from social science research and show how they can be used to improve teaching and learning. The discoveries are easily stated as three social science generalizations: (1) social connections motivate, (2) teaching teaches the teacher, and (3) instant feedback improves learning. We show how to apply these generalizations via innovations in modern information technology inside, outside, and across university classrooms. We also give concrete examples of these ideas from innovations we have experimented with in our own teaching.
A method for selecting clusterings to classify a predetermined data set of numerical data comprises five steps. First, a plurality of known clustering methods are applied, one at a time, to the data set to generate clusterings for each method. Second, a metric space of clusterings is generated using a metric that measures the similarity between two clusterings. Third, the metric space is projected to a lower dimensional representation useful for visualization. Fourth, a “local cluster ensemble” method generates a clustering for each point in the lower dimensional space. Fifth, an animated visualization method uses the output of the local cluster ensemble method to display the lower dimensional space and to allow a user to move around and explore the space of clustering.
The American system of higher education is under attack by political, economic, and educational forces that threaten to undermine its business model, governmental support, and operating mission. The potential changes are considerably more dramatic and disruptive than what we've already experienced. Traditional colleges and universities urgently need a coherent, thought-out response. Their central role in ensuring the creation, preservation, and distribution of knowledge may be at risk and, as a consequence, so too may be the spectacular progress across fields we have come to expect as a result.
Symposium contributors include Henry E. Brady, John Mark Hansen, Gary King, Nannerl O. Keohane, Michael Laver, Virginia Sapiro, and Maya Sen.
Guido Imbens, Donald B Rubin, Gary King, Richard A Berk, Daniel E Ho, Kevin M Quinn, James D Greiner, Ian Ayres, Richard Brooks, Paul Oyer, and Richard Lempert. 2012. “Brief of Empirical Scholars as Amici Curiae.” Filed with the Supreme Court of the United States in Abigail Noel Fisher v. University of Texas at Austin, et al.Abstract
In Grutter v. Bollinger,
this Court held that a state has a compelling interest in attaining a diverse student body for the benefit of all students, and thatthis compelling interest justifies the consideration of race as a factor in university admissions. See 539 U.S. 306, 325, 328 (2003). In this, the latest case to consider the constitutionality of affirmative-action admissions policies, Professor Richard H. Sander, along with lawyer and journalist Stuart S. Taylor, Jr., filed a brief amici curiae arguing that social-8science research has shown affirmative action to be harmful to minority students. See Brief Amici Curiae for Richard Sander and Stuart Taylor, Jr. in Supportof Neither Party (“Sander-Taylor Brief”) 2. According to them, a “growing volume of very careful research, some of it completely unrebutted by dissenting work” has found that affirmative-action practices are not having their intended effect. Id.; see also Brief Amici Curiae of Gail Heriot et al. in Support of Petitioner (“Three Commissioners Brief”) 14 (“The Commissioner Amici are aware of no empirical research that challenges [Sander’s] findings.”).
But, as amici will show, the principal research on which Sander and Taylor rely for their conclusion about the negative effects of affirmative action—Sander’s so-called “mismatch” hypothesis2—is far from “unrebutted.” Sander-Taylor Brief 2. Since Sander first published findings in support of a“mismatch” in 2004, that research has been subjected to wide-ranging criticism. Nor is Sander’s research “very careful.” Id. As some of those critiques discussin detail, Sander’s research has major methodologicalflaws—misapplying basic principles of causal inference—that call into doubt his controversial conclusions about affirmative action. The Sander “mismatch” research—and its provocative claim that, on average, minority students admitted through affirmative action would be better off attending less selective colleges and universities—is not good social science.
Sander’s research has “significantly overestimated the costs of affirmative action and failed to demonstrate benefits from ending it.” David L. Chambers et al., The Real Impact of Affirmative Action in American Law Schools: An Empirical Critique of Richard Sander’s Study, 57 Stan. L. Rev. 1855, 1857 (2005). That research, which consists of weak empirical contentions that fail to meet the basic tenets of rigorous social-science research, provides no basis for this Court to revisit longstanding precedent supporting the individualized consideration of race in admissions. Cf. Grutter, 539 U.S. at 334 (“Universities can * * * consider race or ethnicity more flexibly as a ‘plus’ factor in the context of individualized consideration of each and every applicant.”) (citing Regents of Univ. of Cal. v. Bakke, 438 U.S. 265, 315-316 (1978) (opinion of Powell, J.,)).In light of the significant methodological flaws on which it rests, Sander’s research does not constitute credible evidence that affirmative action practices are harmful to minorities, let alone that the diversity rationale at the heart of Grutter is at odds with social science.
We discuss a method for improving causal inferences called "Coarsened Exact Matching'' (CEM), and the new "Monotonic Imbalance Bounding'' (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of desirable statistical properties not available in most other matching methods, but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R and Stata which implement all our suggestions.
In the election for President of the United States, the Electoral College is the body whose members vote to elect the President directly. Each state sends a number of delegates equal to its total number of representatives and senators in Congress; all but two states (Nebraska and Maine) assign electors pledged to the candidate that wins the state's plurality vote. We investigate the effect on presidential elections if states were to assign their electoral votes according to results in each congressional district,and conclude that the direct popular vote and the current electoral college are both substantially fairer compared to those alternatives where states would have divided their electoral votes by congressional district.
The financial viability of Social Security, the single largest U.S. Government program, depends on accurate forecasts of the solvency of its intergenerational trust fund. We begin by detailing information necessary for replicating the Social Security Administration’s (SSA’s) forecasting procedures, which until now has been unavailable in the public domain. We then offer a way to improve the quality of these procedures due to age-and sex-specific mortality forecasts. The most recent SSA mortality forecasts were based on the best available technology at the time, which was a combination of linear extrapolation and qualitative judgments. Unfortunately, linear extrapolation excludes known risk factors and is inconsistent with long-standing demographic patterns such as the smoothness of age profiles. Modern statistical methods typically outperform even the best qualitative judgments in these contexts. We show how to use such methods here, enabling researchers to forecast using far more information, such as the known risk factors of smoking and obesity and known demographic patterns. Including this extra information makes a sub¬stantial difference: For example, by only improving mortality forecasting methods, we predict three fewer years of net surplus, $730 billion less in Social Security trust funds, and program costs that are 0.66% greater of projected taxable payroll compared to SSA projections by 2031. More important than specific numerical estimates are the advantages of transparency, replicability, reduction of uncertainty, and what may be the resulting lower vulnerability to the politicization of program forecasts. In addition, by offering with this paper software and detailed replication information, we hope to marshal the efforts of the research community to include ever more informative inputs and to continue to reduce the uncertainties in Social Security forecasts.
A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, nor constraining a content distribution of content of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby including a great deal of potentially valuable and extensive information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for diﬀerent cross-sections. A full set of graphical diagnostics are also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and extensive graphical user interface are included.
When respondents use the ordinal response categories of standard survey questions in different ways, the validity of analyses based on the resulting data can be biased. Anchoring vignettes is a survey design technique intended to correct for some of these problems. The anchors package in R includes methods for evaluating and choosing anchoring vignettes, and for analyzing the resulting data.
We highlight common problems in the application of random treatment assignment in large scale program evaluation. Random assignment is the defining feature of modern experimental design. Yet, errors in design, implementation, and analysis often result in real world applications not benefiting from the advantages of randomization. The errors we highlight cover the control of variability, levels of randomization, size of treatment arms, and power to detect causal effects, as well as the many problems that commonly lead to post-treatment bias. We illustrate with an application to the Medicare Health Support evaluation, including recommendations for improving the design and analysis of this and other large scale randomized experiments.
Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable ``Mahalanobis Distance Matching'' (MDM), the commonly used ``Propensity Score Matching'' (PSM), and a newer approach called ``Coarsened Exact Matching'' (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.