Writings

2013
How Social Science Research Can Improve Teaching
Gary King and Maya Sen. 2013. “How Social Science Research Can Improve Teaching.” PS: Political Science and Politics, 46, 3, Pp. 621-629.

We marshal discoveries about human behavior and learning from social science research and show how they can be used to improve teaching and learning. The discoveries are easily stated as three social science generalizations: (1) social connections motivate, (2) teaching teaches the teacher, and (3) instant feedback improves learning. We show how to apply these generalizations via innovations in modern information technology inside, outside, and across university classrooms. We also give concrete examples of these ideas from innovations we have experimented with in our own teaching.

See also a video presentation of this talk before the Harvard Board of Overseers

Article
Method and Apparatus for Selecting Clusterings to Classify A Predetermined Data Set
Gary King and Justin Grimmer. 2013. “Method and Apparatus for Selecting Clusterings to Classify A Predetermined Data Set.” United States of America 8,438,162 (May 7).

A method for selecting clusterings to classify a predetermined data set of numerical data comprises five steps. First, a plurality of known clustering methods are applied, one at a time, to the data set to generate clusterings for each method. Second, a metric space of clusterings is generated using a metric that measures the similarity between two clusterings. Third, the metric space is projected to a lower dimensional representation useful for visualization. Fourth, a “local cluster ensemble” method generates a clustering for each point in the lower dimensional space. Fifth, an animated visualization method uses the output of the local cluster ensemble method to display the lower dimensional space and to allow a user to move around and explore the space of clustering.
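The second step above, a metric that measures how similar two clusterings are, can be illustrated with a simple pair-counting distance. This is only a sketch: the abstract does not name the metric the patent uses, so the distance below (the share of object pairs on which two clusterings disagree) is a stand-in chosen for brevity, and the function name is hypothetical.

```python
from itertools import combinations


def pair_disagreement(c1, c2):
    """Distance between two clusterings of the same objects.

    c1 and c2 are equal-length lists of cluster labels, one per object.
    The distance is the fraction of object pairs that one clustering
    puts together and the other puts apart. Label names do not matter.
    ASSUMPTION: this pair-counting distance is a stand-in, not the
    metric described in the patent.
    """
    if len(c1) != len(c2):
        raise ValueError("clusterings must label the same objects")
    pairs = list(combinations(range(len(c1)), 2))
    disagree = sum((c1[i] == c1[j]) != (c2[i] == c2[j]) for i, j in pairs)
    return disagree / len(pairs)


# Two clusterings of four objects that disagree on 3 of the 6 pairs.
print(pair_disagreement([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.5
# Relabeling the clusters leaves the partition, and the distance, unchanged.
print(pair_disagreement([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0
```

Computing this distance for every pair of clusterings would give the matrix of distances that a projection method could then map to two dimensions for visualization.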

Patent
The Troubled Future of Colleges and Universities (with comments from five scholar-administrators)
Gary King and Maya Sen. 2013. “The Troubled Future of Colleges and Universities (with comments from five scholar-administrators).” PS: Political Science and Politics, 46, 1, Pp. 81-113.

The American system of higher education is under attack by political, economic, and educational forces that threaten to undermine its business model, governmental support, and operating mission. The potential changes are considerably more dramatic and disruptive than what we've already experienced. Traditional colleges and universities urgently need a coherent, thought-out response. Their central role in ensuring the creation, preservation, and distribution of knowledge may be at risk and, as a consequence, so too may be the spectacular progress across fields we have come to expect as a result.

Symposium contributors include Henry E. Brady, John Mark Hansen, Gary King, Nannerl O. Keohane, Michael Laver, Virginia Sapiro, and Maya Sen.

Article Symposium Introduction Full symposium
2012
Guido Imbens, Donald B Rubin, Gary King, Richard A Berk, Daniel E Ho, Kevin M Quinn, James D Greiner, Ian Ayres, Richard Brooks, Paul Oyer, and Richard Lempert. 2012. “Brief of Empirical Scholars as Amici Curiae.” Filed with the Supreme Court of the United States in Abigail Noel Fisher v. University of Texas at Austin, et al.
In Grutter v. Bollinger, this Court held that a state has a compelling interest in attaining a diverse student body for the benefit of all students, and that this compelling interest justifies the consideration of race as a factor in university admissions. See 539 U.S. 306, 325, 328 (2003). In this, the latest case to consider the constitutionality of affirmative-action admissions policies, Professor Richard H. Sander, along with lawyer and journalist Stuart S. Taylor, Jr., filed a brief amici curiae arguing that social-science research has shown affirmative action to be harmful to minority students. See Brief Amici Curiae for Richard Sander and Stuart Taylor, Jr. in Support of Neither Party (“Sander-Taylor Brief”) 2. According to them, a “growing volume of very careful research, some of it completely unrebutted by dissenting work” has found that affirmative-action practices are not having their intended effect. Id.; see also Brief Amici Curiae of Gail Heriot et al. in Support of Petitioner (“Three Commissioners Brief”) 14 (“The Commissioner Amici are aware of no empirical research that challenges [Sander’s] findings.”). But, as amici will show, the principal research on which Sander and Taylor rely for their conclusion about the negative effects of affirmative action—Sander’s so-called “mismatch” hypothesis—is far from “unrebutted.” Sander-Taylor Brief 2. Since Sander first published findings in support of a “mismatch” in 2004, that research has been subjected to wide-ranging criticism. Nor is Sander’s research “very careful.” Id. As some of those critiques discuss in detail, Sander’s research has major methodological flaws—misapplying basic principles of causal inference—that call into doubt his controversial conclusions about affirmative action.
The Sander “mismatch” research—and its provocative claim that, on average, minority students admitted through affirmative action would be better off attending less selective colleges and universities—is not good social science. Sander’s research has “significantly overestimated the costs of affirmative action and failed to demonstrate benefits from ending it.” David L. Chambers et al., The Real Impact of Affirmative Action in American Law Schools: An Empirical Critique of Richard Sander’s Study, 57 Stan. L. Rev. 1855, 1857 (2005). That research, which consists of weak empirical contentions that fail to meet the basic tenets of rigorous social-science research, provides no basis for this Court to revisit longstanding precedent supporting the individualized consideration of race in admissions. Cf. Grutter, 539 U.S. at 334 (“Universities can * * * consider race or ethnicity more flexibly as a ‘plus’ factor in the context of individualized consideration of each and every applicant.”) (citing Regents of Univ. of Cal. v. Bakke, 438 U.S. 265, 315-316 (1978) (opinion of Powell, J.)). In light of the significant methodological flaws on which it rests, Sander’s research does not constitute credible evidence that affirmative action practices are harmful to minorities, let alone that the diversity rationale at the heart of Grutter is at odds with social science.
Amici Brief
Causal Inference Without Balance Checking: Coarsened Exact Matching
Stefano M. Iacus, Gary King, and Giuseppe Porro. 2012. “Causal Inference Without Balance Checking: Coarsened Exact Matching.” Political Analysis, 20, 1, Pp. 1-24. Website

We discuss a method for improving causal inferences called "Coarsened Exact Matching" (CEM), and the new "Monotonic Imbalance Bounding" (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of desirable statistical properties not available in most other matching methods, but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R and Stata which implement all our suggestions.
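As a rough illustration of the idea behind CEM, the sketch below coarsens each covariate into bins, matches treated and control units exactly on their binned values, and drops strata that lack either group. It is not the authors' R or Stata software; the function names, the cutpoints, and the toy data are hypothetical. The weighting rule (weight 1 for matched treated units, and (treated in stratum / controls in stratum) × (matched controls / matched treated) for matched controls) is an assumption of this sketch rather than something stated in the abstract.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that value falls into."""
    return sum(value >= c for c in cutpoints)


def cem_weights(units, cutpoints):
    """Coarsened exact matching on a small data set.

    units: list of (treated, covariates) with treated a bool and
           covariates a tuple of numbers.
    cutpoints: one list of bin boundaries per covariate (user-chosen).
    Returns one weight per unit; unmatched units get 0.0.
    """
    # Group units by their coarsened covariate signature.
    strata = defaultdict(list)
    for i, (_, x) in enumerate(units):
        key = tuple(coarsen(v, c) for v, c in zip(x, cutpoints))
        strata[key].append(i)

    # Keep only strata containing both treated and control units.
    matched = [
        idx for idx in strata.values()
        if any(units[i][0] for i in idx) and any(not units[i][0] for i in idx)
    ]
    m_t = sum(units[i][0] for idx in matched for i in idx)
    m_c = sum(not units[i][0] for idx in matched for i in idx)

    weights = [0.0] * len(units)
    for idx in matched:
        t_s = sum(units[i][0] for i in idx)
        c_s = len(idx) - t_s
        for i in idx:
            # ASSUMED weighting rule; see the lead-in paragraph.
            weights[i] = 1.0 if units[i][0] else (t_s / c_s) * (m_c / m_t)
    return weights


# Toy data: one covariate (age), coarsened at 30 and 60.
units = [(True, (25,)), (False, (27,)), (False, (29,)),
         (True, (45,)), (False, (50,)), (False, (70,))]
print(cem_weights(units, [[30, 60]]))  # [1.0, 0.75, 0.75, 1.0, 1.5, 0.0]
```

In the toy data the 70-year-old control has no treated unit in its bin and is pruned, while the control weights sum to the number of matched controls.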

An Explanation of CEM Weights

Article
Estimating Partisan Bias of the Electoral College Under Proposed Changes in Elector Apportionment
AC Thomas, Andrew Gelman, Gary King, and Jonathan N Katz. 2012. “Estimating Partisan Bias of the Electoral College Under Proposed Changes in Elector Apportionment.” Statistics, Politics, and Policy, Pp. 1-13. Statistics, Politics and Policy (publisher version)

In the election for President of the United States, the Electoral College is the body whose members vote to elect the President directly. Each state sends a number of delegates equal to its total number of representatives and senators in Congress; all but two states (Nebraska and Maine) assign electors pledged to the candidate that wins the state's plurality vote. We investigate the effect on presidential elections if states were to assign their electoral votes according to results in each congressional district, and conclude that the direct popular vote and the current electoral college are both substantially fairer compared to those alternatives where states would have divided their electoral votes by congressional district.

Article
Letter to the Editor on the "Medicare Health Support Pilot Program" (by McCall and Cromwell)
Gary King, Richard Nielsen, and Aaron Wells. 2012. “Letter to the Editor on the "Medicare Health Support Pilot Program" (by McCall and Cromwell).” New England Journal of Medicine, 366, 7, Pp. 667. New England Journal of Medicine version Published Letter
Statistical Security for Social Security
Samir Soneji and Gary King. 2012. “Statistical Security for Social Security.” Demography, 49, 3, Pp. 1037-1060. Publisher's version

The financial viability of Social Security, the single largest U.S. Government program, depends on accurate forecasts of the solvency of its intergenerational trust fund. We begin by detailing information necessary for replicating the Social Security Administration’s (SSA’s) forecasting procedures, which until now has been unavailable in the public domain. We then offer a way to improve the quality of these procedures due to age- and sex-specific mortality forecasts. The most recent SSA mortality forecasts were based on the best available technology at the time, which was a combination of linear extrapolation and qualitative judgments. Unfortunately, linear extrapolation excludes known risk factors and is inconsistent with long-standing demographic patterns such as the smoothness of age profiles. Modern statistical methods typically outperform even the best qualitative judgments in these contexts. We show how to use such methods here, enabling researchers to forecast using far more information, such as the known risk factors of smoking and obesity and known demographic patterns. Including this extra information makes a substantial difference: For example, by only improving mortality forecasting methods, we predict three fewer years of net surplus, $730 billion less in Social Security trust funds, and program costs that are 0.66% greater of projected taxable payroll compared to SSA projections by 2031. More important than specific numerical estimates are the advantages of transparency, replicability, reduction of uncertainty, and what may be the resulting lower vulnerability to the politicization of program forecasts. In addition, by offering with this paper software and detailed replication information, we hope to marshal the efforts of the research community to include ever more informative inputs and to continue to reduce the uncertainties in Social Security forecasts.

This work builds on our article that provides forecasts of US Mortality rates (see King and Soneji, The Future of Death in America), a book developing improved methods for forecasting mortality (Girosi and King, Demographic Forecasting), all data we used (King and Soneji, replication data sets), and open source software that implements the methods (Girosi and King, YourCast). Also available is a New York Times Op-Ed based on this work (King and Soneji, Social Security: It’s Worse Than You Think), and a replication data set for the Op-Ed (King and Soneji, replication data set).

Article
System for Estimating a Distribution of Message Content Categories in Source Data
Daniel Hopkins, Gary King, and Ying Lu. 2012. “System for Estimating a Distribution of Message Content Categories in Source Data.” United States of America 8,180,717 (May 15).

A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of a small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, nor constraining a content distribution of content of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
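One way to picture estimating category proportions without classifying individual elements is the two-category sketch below. It assumes, as a premise of the sketch and not a statement from the abstract, that the distribution of word profiles within each category is the same in the labeled and unlabeled sets, so that P(profile) = P(profile | A) · p + P(profile | B) · (1 − p) can be solved for the unlabeled share p by least squares. The function names, the categories "A" and "B", and the toy data are hypothetical.

```python
from collections import Counter


def profile_dist(profiles, support):
    """Relative frequency of each profile in support among profiles."""
    counts = Counter(profiles)
    n = len(profiles)
    return [counts[s] / n for s in support]


def estimate_share(labeled, unlabeled):
    """Estimate the share of category "A" in the unlabeled set.

    labeled: list of (category, profile) pairs, category "A" or "B".
    unlabeled: list of profiles with unknown categories.
    ASSUMPTION: P(profile | category) is shared by both sets; the
    category shares themselves are free to differ.
    """
    support = sorted({p for _, p in labeled} | set(unlabeled))
    a = profile_dist([p for c, p in labeled if c == "A"], support)
    b = profile_dist([p for c, p in labeled if c == "B"], support)
    s = profile_dist(unlabeled, support)
    # Least-squares solution of s = b + p * (a - b) for the scalar p.
    num = sum((si - bi) * (ai - bi) for si, ai, bi in zip(s, a, b))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if den == 0:
        raise ValueError("categories have identical profile distributions")
    return min(1.0, max(0.0, num / den))


# Labeled set is split 50/50 between A and B.
labeled = ([("A", "x")] * 3 + [("A", "y")]
           + [("B", "y")] * 3 + [("B", "x")])
# Unlabeled set: 13 of 20 documents show profile "x".
unlabeled = ["x"] * 13 + ["y"] * 7
print(round(estimate_share(labeled, unlabeled), 6))  # 0.8
```

The labeled set is evenly split, yet the estimate for the unlabeled set is 0.8, which shows the two category distributions are not forced to agree.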

Patent
2011
Amelia II: A Program for Missing Data
James Honaker, Gary King, and Matthew Blackwell. 2011. “Amelia II: A Program for Missing Data.” Journal of Statistical Software, 45, 7, Pp. 1-47.

Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby including a great deal of potentially valuable and extensive information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for different cross-sections. A full set of graphical diagnostics are also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and extensive graphical user interface are included.

Amelia II software web site

Article
Anchors: Software for Anchoring Vignettes Data
Jonathan Wand, Gary King, and Olivia Lau. 2011. “Anchors: Software for Anchoring Vignettes Data.” Journal of Statistical Software, 42, 3, Pp. 1-25. Publisher's Version

When respondents use the ordinal response categories of standard survey questions in different ways, the validity of analyses based on the resulting data can be biased. Anchoring vignettes is a survey design technique intended to correct for some of these problems. The anchors package in R includes methods for evaluating and choosing anchoring vignettes, and for analyzing the resulting data.

Article
AutoCast: Automated Bayesian Forecasting with YourCast
Jonathan Bischof, Gary King, and Samir Soneji. 2011. “AutoCast: Automated Bayesian Forecasting with YourCast”. Publisher's Version
Avoiding Randomization Failure in Program Evaluation
Gary King, Richard Nielsen, Carter Coberley, James E Pope, and Aaron Wells. 2011. “Avoiding Randomization Failure in Program Evaluation.” Population Health Management, 14, 1, Pp. S11-S22.

We highlight common problems in the application of random treatment assignment in large scale program evaluation. Random assignment is the defining feature of modern experimental design. Yet, errors in design, implementation, and analysis often result in real world applications not benefiting from the advantages of randomization. The errors we highlight cover the control of variability, levels of randomization, size of treatment arms, and power to detect causal effects, as well as the many problems that commonly lead to post-treatment bias. We illustrate with an application to the Medicare Health Support evaluation, including recommendations for improving the design and analysis of this and other large scale randomized experiments.

Article
Comparative Effectiveness of Matching Methods for Causal Inference
Gary King, Richard Nielsen, Carter Coberley, James E Pope, and Aaron Wells. 2011. “Comparative Effectiveness of Matching Methods for Causal Inference”.

Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable "Mahalanobis Distance Matching" (MDM), the commonly used "Propensity Score Matching" (PSM), and a newer approach called "Coarsened Exact Matching" (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.

Paper
Ensuring the Data Rich Future of the Social Sciences
Gary King. 2011. “Ensuring the Data Rich Future of the Social Sciences.” Science, 331, 11 February, Pp. 719-721.

Massive increases in the availability of informative social science data are making dramatic progress possible in analyzing, understanding, and addressing many major societal problems. Yet the same forces pose severe challenges to the scientific infrastructure supporting data sharing, data management, informatics, statistical methodology, and research ethics and policy, and these are collectively holding back progress. I address these changes and challenges and suggest what can be done.

Article
Estimating Incidence Curves of Several Infections Using Symptom Surveillance Data
Edward Goldstein, Benjamin J Cowling, Allison E Aiello, Saki Takahashi, Gary King, Ying Lu, and Marc Lipsitch. 2011. “Estimating Incidence Curves of Several Infections Using Symptom Surveillance Data.” PLoS ONE, 6, 8, Pp. e23380.

We introduce a method for estimating incidence curves of several co-circulating infectious pathogens, where each infection has its own probabilities of particular symptom profiles. Our deconvolution method utilizes weekly surveillance data on symptoms from a defined population as well as additional data on symptoms from a sample of virologically confirmed infectious episodes. We illustrate this method by numerical simulations and by using data from a survey conducted on the University of Michigan campus. Last, we describe the data needs to make such estimates accurate.

Link to PLoS version

Article
The Future of Death in America
Gary King and Samir Soneji. 2011. “The Future of Death in America.” Demographic Research, 25, 1, Pp. 1-38. Website

Population mortality forecasts are widely used for allocating public health expenditures, setting research priorities, and evaluating the viability of public pensions, private pensions, and health care financing systems. In part because existing methods seem to forecast worse when based on more information, most forecasts are still based on simple linear extrapolations that ignore known biological risk factors and other prior information. We adapt a Bayesian hierarchical forecasting model capable of including more known health and demographic information than has previously been possible. This leads to the first age- and sex-specific forecasts of American mortality that simultaneously incorporate, in a formal statistical model, the effects of the recent rapid increase in obesity, the steady decline in tobacco consumption, and the well known patterns of smooth mortality age profiles and time trends. Formally including new information in forecasts can matter a great deal. For example, we estimate an increase in male life expectancy at birth from 76.2 years in 2010 to 79.9 years in 2030, which is 1.8 years greater than the U.S. Social Security Administration projection and 1.5 years more than U.S. Census projection. For females, we estimate more modest gains in life expectancy at birth over the next twenty years from 80.5 years to 81.9 years, which is virtually identical to the Social Security Administration projection and 2.0 years less than U.S. Census projections. We show that these patterns are also likely to greatly affect the aging American population structure. We offer an easy-to-use approach so that researchers can include other sources of information and potentially improve on our forecasts too.

Article
General Purpose Computer-Assisted Clustering and Conceptualization
Justin Grimmer and Gary King. 2011. “General Purpose Computer-Assisted Clustering and Conceptualization.” Proceedings of the National Academy of Sciences. Publisher's Version

We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimize a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given data set (along with millions of other solutions we add based on combinations of existing clusterings), and enable a user to explore and interact with it, and quickly reveal or prompt useful or insightful conceptualizations. In addition, although uncommon in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than either expert human coders or many existing fully automated methods.

Article Supplemental notes
MatchIt: Nonparametric Preprocessing for Parametric Causal Inference
Daniel E. Ho, Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2011. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” Journal of Statistical Software, 42, 8, Pp. 1-28. Publisher's Version
MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by preprocessing data with nonparametric matching methods. MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also easily fits into existing research practices since, after preprocessing data with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, but produce inferences with substantially more robustness and less sensitivity to modeling assumptions. MatchIt is an R program, and also works seamlessly with Zelig.
Article
Multivariate Matching Methods That are Monotonic Imbalance Bounding
Stefano M Iacus, Gary King, and Giuseppe Porro. 2011. “Multivariate Matching Methods That are Monotonic Imbalance Bounding.” Journal of the American Statistical Association, 106, 493, Pp. 345-361.

We introduce a new "Monotonic Imbalance Bounding" (MIB) class of matching methods for causal inference with a surprisingly large number of attractive statistical properties. MIB generalizes and extends in several new directions the only existing class, "Equal Percent Bias Reducing" (EPBR), which is designed to satisfy weaker properties and only in expectation. We also offer strategies to obtain specific members of the MIB class, and analyze in more detail a member of this class, called Coarsened Exact Matching, whose properties we analyze from this new perspective. We offer a variety of analytical results and numerical simulations that demonstrate how members of the MIB class can dramatically improve inferences relative to EPBR-based matching methods.

Article
