The social sciences are undergoing a dramatic transformation from studying problems to solving them; from making do with a small number of sparse data sets to analyzing increasing quantities of diverse, highly informative data; from isolated scholars toiling away on their own to larger scale, collaborative, interdisciplinary, lab-style research teams; and from a purely academic pursuit to having a major impact on the world. To facilitate these important developments, universities, funding agencies, and governments need to shore up and adapt the infrastructure that supports social science research. We discuss some of these developments here, as well as a new type of organization we created at Harvard to help encourage them -- the Institute for Quantitative Social Science. An increasing number of universities are beginning efforts to respond with similar institutions. This paper provides some suggestions for how individual universities might respond and how we might work together to advance social science more generally.
We offer the first large scale, multiple source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future --- and, as such, seem to clearly expose government intent.
"Robust standard errors'" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. Even though this message is well known to methodologists and has appeared in the literature in several forms, it has failed to reach most applied researchers. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help applied researchers realize these gains via an alternative perspective that offers a productive way to use robust standard errors; a new general and easier-to-use information test statistic which is easier to apply appropriately; and practical illustrations via simulations and real examples from published research. Instead of jettisoning this extremely popular tool, as some suggest, we show how robust and classical standard error differences can provide effective clues about model misspecification, likely biases, and a guide to more reliable inferences.
We marshal discoveries about human behavior and learning from social science research and show how they can be used to improve teaching and learning. The discoveries are easily stated as three social science generalizations: (1) social connections motivate, (2) teaching teaches the teacher, and (3) instant feedback improves learning. We show how to apply these generalizations via innovations in modern information technology inside, outside, and across university classrooms. We also give concrete examples of these ideas from innovations we have experimented with in our own teaching.
See also a video presentation of this talk before the Harvard Board of Overseers
A method for selecting clusterings to classify a predetermined data set of numerical data comprises five steps. First, a plurality of known clustering methods are applied, one at a time, to the data set to generate clusterings for each method. Second, a metric space of clusterings is generated using a metric that measures the similarity between two clusterings. Third, the metric space is projected to a lower dimensional representation useful for visualization. Fourth, a “local cluster ensemble” method generates a clustering for each point in the lower dimensional space. Fifth, an animated visualization method uses the output of the local cluster ensemble method to display the lower dimensional space and to allow a user to move around and explore the space of clustering.
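The second step, building a metric space of clusterings, can be sketched as follows. The abstract does not name the metric, so this example assumes the variation of information, one well-known distance between partitions; the function names are hypothetical.

```python
import math
from collections import Counter

def variation_of_information(a, b):
    """Distance between two clusterings given as equal-length label lists.

    Computes H(A|B) + H(B|A); it is zero when the two clusterings are
    the same partition, even if the labels are named differently.
    """
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    count_ab = Counter(zip(a, b))
    vi = 0.0
    for (i, j), n_ij in count_ab.items():
        p_ij = n_ij / n
        vi -= p_ij * (math.log(p_ij / (count_a[i] / n))
                      + math.log(p_ij / (count_b[j] / n)))
    return vi

def distance_matrix(clusterings):
    """Pairwise distances among clusterings, ready for a projection step."""
    return [[variation_of_information(a, b) for b in clusterings]
            for a in clusterings]

# Three hypothetical clusterings of the same four objects.
clusterings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
for row in distance_matrix(clusterings):
    print(row)
```

The resulting matrix is the kind of input a lower-dimensional projection (the third step) would take; which projection method to use is left open here.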
Chinese government censorship of social media constitutes the largest selective suppression of human communication in the history of the world. Although existing systematic research on the subject has revealed a great deal, it is based on passive, observational methods, with well known inferential limitations. We attempt to generate more robust causal and descriptive inferences through participation and experimentation. For causal inferences, we conduct a large scale randomized experimental study by creating accounts on numerous social media sites spread throughout the country, submitting different randomly assigned types of social media texts, and detecting from a network of computers all over the world which types are censored. Then, for descriptive inferences, we supplement the current approach of confidential interviews by setting up our own social media site in China, contracting with Chinese firms to install the same censoring technologies as existing sites, and reverse engineering how it all works. Our results offer unambiguous support for, and clarification of, the emerging view that criticisms of the state, its leaders, and their policies are routinely published whereas posts with collective action potential are much more likely to be censored. We are also able to clarify the internal mechanisms of the Chinese censorship apparatus and show that local social media sites have far more flexibility than was previously understood in how (but not what) they censor.
The American system of higher education is under attack by political, economic, and educational forces that threaten to undermine its business model, governmental support, and operating mission. The potential changes are considerably more dramatic and disruptive than what we've already experienced. Traditional colleges and universities urgently need a coherent, thought-out response. Their central role in ensuring the creation, preservation, and distribution of knowledge may be at risk and, as a consequence, so too may be the spectacular progress across fields we have come to expect as a result.
Symposium contributors include Henry E. Brady, John Mark Hansen, Gary King, Nannerl O. Keohane, Michael Laver, Virginia Sapiro, and Maya Sen.
Imbens, Guido, Donald B Rubin, Gary King, Richard A Berk, Daniel E Ho, Kevin M Quinn, James D Greiner, et al. 2012. Brief of Empirical Scholars as Amici Curiae. Filed with the Supreme Court of the United States in Abigail Noel Fisher v. University of Texas at Austin, et al.
In Grutter v. Bollinger, this Court held that a state has a compelling interest in attaining a diverse student body for the benefit of all students, and that this compelling interest justifies the consideration of race as a factor in university admissions. See 539 U.S. 306, 325, 328 (2003). In this, the latest case to consider the constitutionality of affirmative-action admissions policies, Professor Richard H. Sander, along with lawyer and journalist Stuart S. Taylor, Jr., filed a brief amici curiae arguing that social-science research has shown affirmative action to be harmful to minority students. See Brief Amici Curiae for Richard Sander and Stuart Taylor, Jr. in Support of Neither Party (“Sander-Taylor Brief”) 2. According to them, a “growing volume of very careful research, some of it completely unrebutted by dissenting work” has found that affirmative-action practices are not having their intended effect. Id.; see also Brief Amici Curiae of Gail Heriot et al. in Support of Petitioner (“Three Commissioners Brief”) 14 (“The Commissioner Amici are aware of no empirical research that challenges [Sander’s] findings.”).
But, as amici will show, the principal research on which Sander and Taylor rely for their conclusion about the negative effects of affirmative action—Sander’s so-called “mismatch” hypothesis—is far from “unrebutted.” Sander-Taylor Brief 2. Since Sander first published findings in support of a “mismatch” in 2004, that research has been subjected to wide-ranging criticism. Nor is Sander’s research “very careful.” Id. As some of those critiques discuss in detail, Sander’s research has major methodological flaws—misapplying basic principles of causal inference—that call into doubt his controversial conclusions about affirmative action. The Sander “mismatch” research—and its provocative claim that, on average, minority students admitted through affirmative action would be better off attending less selective colleges and universities—is not good social science.
Sander’s research has “significantly overestimated the costs of affirmative action and failed to demonstrate benefits from ending it.” David L. Chambers et al., The Real Impact of Affirmative Action in American Law Schools: An Empirical Critique of Richard Sander’s Study, 57 Stan. L. Rev. 1855, 1857 (2005). That research, which consists of weak empirical contentions that fail to meet the basic tenets of rigorous social-science research, provides no basis for this Court to revisit longstanding precedent supporting the individualized consideration of race in admissions. Cf. Grutter, 539 U.S. at 334 (“Universities can * * * consider race or ethnicity more flexibly as a ‘plus’ factor in the context of individualized consideration of each and every applicant.”) (citing Regents of Univ. of Cal. v. Bakke, 438 U.S. 265, 315-316 (1978) (opinion of Powell, J.)). In light of the significant methodological flaws on which it rests, Sander’s research does not constitute credible evidence that affirmative action practices are harmful to minorities, let alone that the diversity rationale at the heart of Grutter is at odds with social science.
In the election for President of the United States, the Electoral College is the body whose members vote to elect the President directly. Each state sends a number of delegates equal to its total number of representatives and senators in Congress; all but two states (Nebraska and Maine) assign electors pledged to the candidate that wins the state's plurality vote. We investigate the effect on presidential elections if states were to assign their electoral votes according to results in each congressional district, and conclude that the direct popular vote and the current electoral college are both substantially fairer than those alternatives in which states would divide their electoral votes by congressional district.
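The difference between the two allocation rules can be illustrated with a small sketch. The vote counts below are invented, ties are not handled, and the district rule assumed here is one elector per district winner plus two for the statewide plurality winner; this is not the analysis performed in the paper.

```python
def statewide_totals(district_votes):
    """Sum each candidate's votes across all districts in a state."""
    totals = {}
    for votes in district_votes:
        for candidate, count in votes.items():
            totals[candidate] = totals.get(candidate, 0) + count
    return totals

def winner_take_all(district_votes):
    """All electors (districts + 2 senators) go to the statewide winner."""
    totals = statewide_totals(district_votes)
    winner = max(totals, key=totals.get)
    return {winner: len(district_votes) + 2}

def district_allocation(district_votes):
    """One elector per district winner, two for the statewide winner."""
    electors = {}
    for votes in district_votes:
        district_winner = max(votes, key=votes.get)
        electors[district_winner] = electors.get(district_winner, 0) + 1
    totals = statewide_totals(district_votes)
    state_winner = max(totals, key=totals.get)
    electors[state_winner] = electors.get(state_winner, 0) + 2
    return electors

# Hypothetical three-district state: A narrowly wins two districts,
# B wins one district by a wide margin and carries the state.
state = [{"A": 55, "B": 45}, {"A": 52, "B": 48}, {"A": 20, "B": 80}]
print(winner_take_all(state))
print(district_allocation(state))
```

Under the district rule, how voters are distributed across districts affects the split of electors, which is one reason district boundaries matter for any fairness comparison.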
The simplicity and power of matching methods have made them an increasingly popular approach to causal inference in observational data. Existing theories that justify these techniques are well developed but either require exact matching, which is usually infeasible in practice, or sacrifice some simplicity via asymptotic theory, specialized bias corrections, and novel variance estimators; and extensions to approximate matching with multicategory treatments have not yet appeared. As an alternative, we show how conceptualizing continuous variables as having logical breakpoints (such as phase transitions when measuring temperature or high school or college degrees in years of education) is both natural substantively and can be used to simplify causal inference theory. The result is a finite sample theory that is widely applicable, simple to understand, and easy to implement by using matching to preprocess the data, after which one can use whatever method would have been applied without matching. The theoretical simplicity also allows for binary, multicategory, and continuous treatment variables from the start and for extensions to valid inference under imperfect treatment assignment.
The financial viability of Social Security, the single largest U.S. Government program, depends on accurate forecasts of the solvency of its intergenerational trust fund. We begin by detailing information necessary for replicating the Social Security Administration’s (SSA’s) forecasting procedures, which until now has been unavailable in the public domain. We then offer a way to improve the quality of these procedures via age- and sex-specific mortality forecasts. The most recent SSA mortality forecasts were based on the best available technology at the time, which was a combination of linear extrapolation and qualitative judgments. Unfortunately, linear extrapolation excludes known risk factors and is inconsistent with long-standing demographic patterns such as the smoothness of age profiles. Modern statistical methods typically outperform even the best qualitative judgments in these contexts. We show how to use such methods here, enabling researchers to forecast using far more information, such as the known risk factors of smoking and obesity and known demographic patterns. Including this extra information makes a substantial difference: For example, by only improving mortality forecasting methods, we predict three fewer years of net surplus, $730 billion less in Social Security trust funds, and program costs that are 0.66% greater of projected taxable payroll compared to SSA projections by 2031. More important than specific numerical estimates are the advantages of transparency, replicability, reduction of uncertainty, and what may be the resulting lower vulnerability to the politicization of program forecasts. In addition, by offering with this paper software and detailed replication information, we hope to marshal the efforts of the research community to include ever more informative inputs and to continue to reduce the uncertainties in Social Security forecasts.
This work builds on our article that provides forecasts of US Mortality rates (see King and Soneji, The Future of Death in America), a book developing improved methods for forecasting mortality (Girosi and King, Demographic Forecasting), all data we used (King and Soneji, replication data sets), and open source software that implements the methods (Girosi and King, YourCast). Also available is a New York Times Op-Ed based on this work (King and Soneji, Social Security: It’s Worse Than You Think), and a replication data set for the Op-Ed (King and Soneji, replication data set).
A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of a small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, nor constraining a content distribution of content of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
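One way to see how category proportions can be extrapolated without classifying each unlabeled element is the two-category, one-feature case below. This is a simplified sketch under stated assumptions, not the patented procedure: it assumes a single binary feature whose rate within each category is the same in the labeled and unlabeled sets, and the function name is hypothetical.

```python
def estimate_category_share(labeled, unlabeled_features):
    """Estimate the share of category 1 in an unlabeled population.

    labeled: list of (category, feature) pairs, each value 0 or 1.
    unlabeled_features: list of 0/1 feature values.
    Solves  P(f) = P(f | cat 1) * share + P(f | cat 0) * (1 - share)
    for share, with the conditional rates taken from the labeled set.
    Raises ZeroDivisionError if the feature does not separate categories.
    """
    rate = {}
    for category in (0, 1):
        features = [f for c, f in labeled if c == category]
        rate[category] = sum(features) / len(features)
    overall = sum(unlabeled_features) / len(unlabeled_features)
    share = (overall - rate[0]) / (rate[1] - rate[0])
    return min(1.0, max(0.0, share))   # keep the estimate a valid proportion

# Hypothetical labeled set: half the elements are in category 1.
labeled = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 8
# Hypothetical unlabeled population in which the feature is more common.
unlabeled = [1] * 65 + [0] * 35
print(estimate_category_share(labeled, unlabeled))
```

The labeled set is split evenly between the two categories, yet the estimate for the unlabeled population is free to differ, which is the property the abstract emphasizes.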
We highlight common problems in the application of random treatment assignment in large scale program evaluation. Random assignment is the defining feature of modern experimental design. Yet, errors in design, implementation, and analysis often result in real world applications not benefiting from the advantages of randomization. The errors we highlight cover the control of variability, levels of randomization, size of treatment arms, and power to detect causal effects, as well as the many problems that commonly lead to post-treatment bias. We illustrate with an application to the Medicare Health Support evaluation, including recommendations for improving the design and analysis of this and other large scale randomized experiments.
Massive increases in the availability of informative social science data are making dramatic progress possible in analyzing, understanding, and addressing many major societal problems. Yet the same forces pose severe challenges to the scientific infrastructure supporting data sharing, data management, informatics, statistical methodology, and research ethics and policy, and these are collectively holding back progress. I address these changes and challenges and suggest what can be done.
We introduce a new "Monotonic Imbalance Bounding" (MIB) class of matching methods for causal inference with a surprisingly large number of attractive statistical properties. MIB generalizes and extends in several new directions the only existing class, "Equal Percent Bias Reducing" (EPBR), which is designed to satisfy weaker properties and only in expectation. We also offer strategies to obtain specific members of the MIB class, and analyze in more detail a member of this class, called Coarsened Exact Matching, whose properties we analyze from this new perspective. We offer a variety of analytical results and numerical simulations that demonstrate how members of the MIB class can dramatically improve inferences relative to EPBR-based matching methods.
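The basic mechanics of Coarsened Exact Matching can be sketched in a few lines: coarsen each covariate into bins, match exactly on the binned values, and keep only strata that contain both treated and control units. The cutpoints, field names, and function names below are hypothetical, and weighting of the matched units is omitted.

```python
from collections import defaultdict

def coarsen(value, cutpoints):
    """Return the bin index: the number of cutpoints at or below value."""
    return sum(value >= c for c in cutpoints)

def cem_match(units, cutpoints):
    """Return sorted indices of units that survive coarsened exact matching.

    units: list of dicts with a boolean 'treated' key plus covariates.
    cutpoints: dict mapping covariate name -> list of breakpoints.
    """
    strata = defaultdict(list)
    for index, unit in enumerate(units):
        key = tuple(coarsen(unit[name], cuts)
                    for name, cuts in sorted(cutpoints.items()))
        strata[key].append(index)
    matched = []
    for members in strata.values():
        groups = {bool(units[i]["treated"]) for i in members}
        if groups == {True, False}:     # stratum has both kinds of unit
            matched.extend(members)
    return sorted(matched)

# Hypothetical data: years of education, coarsened at 12 and 16 years.
units = [
    {"treated": True, "education": 10},
    {"treated": False, "education": 11},
    {"treated": True, "education": 13},   # no control in this bin
    {"treated": False, "education": 17},  # no treated unit in this bin
]
print(cem_match(units, {"education": [12, 16]}))
```

Because the analyst chooses the coarsening before matching, the maximum imbalance within each retained stratum is fixed in advance by the bin widths.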
Social scientists typically devote considerable effort to reducing measurement error during data collection and then ignore the issue during data analysis. Although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative that generalizes the popular multiple imputation (MI) framework by treating missing data problems as a special case of extreme measurement error and correcting for both. Like MI, the proposed "multiple overimputation" (MO) framework is a simple two-step procedure. First, multiple (≈5) completed copies of the data set are created where cells measured without error are held constant, those missing are imputed from the distribution of predicted values, and cells (or entire variables) with measurement error are "overimputed," that is, imputed from a predictive distribution with observation-level priors defined by the mismeasured values and available external information, if any. In the second step, analysts can then run whatever statistical method they would have run on each of the overimputed data sets as if there had been no missingness or measurement error; the results are then combined via a simple procedure. We also (will) offer open source software that implements all the methods described herein.
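The combining step can be illustrated with the standard rules from the multiple imputation literature (often called Rubin's rules), which we assume is the kind of simple procedure meant here. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance

def combine_estimates(estimates, variances):
    """Combine results from m completed data sets.

    estimates: the point estimate from each data set.
    variances: the squared standard error from each data set.
    Returns (combined estimate, total variance), where the total is the
    average within-data-set variance plus the between-data-set variance
    inflated by (1 + 1/m).
    """
    m = len(estimates)
    combined = mean(estimates)
    within = mean(variances)
    between = variance(estimates)      # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return combined, total

# Hypothetical results from three completed data sets.
estimate, total_variance = combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_variance)
```

The between-data-set term is what carries the extra uncertainty due to missingness or measurement error into the final standard error.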
Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby including a great deal of potentially valuable and extensive information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for different cross-sections. A full set of graphical diagnostics is also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and extensive graphical user interface are included.
Amelia II software web site
When respondents use the ordinal response categories of standard survey questions in different ways, the validity of analyses based on the resulting data can be compromised. Anchoring vignettes is a survey design technique intended to correct for some of these problems. The anchors package in R includes methods for evaluating and choosing anchoring vignettes, and for analyzing the resulting data.
We discuss a method for improving causal inferences called "Coarsened Exact Matching" (CEM), and the new "Monotonic Imbalance Bounding" (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of desirable statistical properties not available in most other matching methods, but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R and Stata that implements all our suggestions.
Political Analysis version
An Explanation of CEM Weights
Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable "Mahalanobis Distance Matching" (MDM), the commonly used "Propensity Score Matching" (PSM), and a newer approach called "Coarsened Exact Matching" (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.
We introduce a method for estimating incidence curves of several co-circulating infectious pathogens, where each infection has its own probabilities of particular symptom profiles. Our deconvolution method utilizes weekly surveillance data on symptoms from a defined population as well as additional data on symptoms from a sample of virologically confirmed infectious episodes. We illustrate this method by numerical simulations and by using data from a survey conducted on the University of Michigan campus. Last, we describe the data needs to make such estimates accurate.
Link to PLoS version
Population mortality forecasts are widely used for allocating public health expenditures, setting research priorities, and evaluating the viability of public pensions, private pensions, and health care financing systems. In part because existing methods seem to forecast worse when based on more information, most forecasts are still based on simple linear extrapolations that ignore known biological risk factors and other prior information. We adapt a Bayesian hierarchical forecasting model capable of including more known health and demographic information than has previously been possible. This leads to the first age- and sex-specific forecasts of American mortality that simultaneously incorporate, in a formal statistical model, the effects of the recent rapid increase in obesity, the steady decline in tobacco consumption, and the well known patterns of smooth mortality age profiles and time trends. Formally including new information in forecasts can matter a great deal. For example, we estimate an increase in male life expectancy at birth from 76.2 years in 2010 to 79.9 years in 2030, which is 1.8 years greater than the U.S. Social Security Administration projection and 1.5 years more than U.S. Census projection. For females, we estimate more modest gains in life expectancy at birth over the next twenty years from 80.5 years to 81.9 years, which is virtually identical to the Social Security Administration projection and 2.0 years less than U.S. Census projections. We show that these patterns are also likely to greatly affect the aging American population structure. We offer an easy-to-use approach so that researchers can include other sources of information and potentially improve on our forecasts too.
We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimizes a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given data set (along with millions of other solutions we add based on combinations of existing clusterings), and enable a user to explore and interact with it, and quickly reveal or prompt useful or insightful conceptualizations. In addition, although uncommon in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than either expert human coders or many existing fully automated methods.