Book Chapter

Preface: Big Data is Not About the Data!
Gary King. 2016. “Preface: Big Data is Not About the Data!.” In Computational Social Science: Discovery and Prediction, edited by R. Michael Alvarez. Cambridge: Cambridge University Press, 2016.Abstract

A few years ago, explaining what you did for a living to Dad, Aunt Rose, or your friend from high school was pretty complicated. Answering that you develop statistical estimators, work on numerical optimization, or, even better, are working on a great new Markov Chain Monte Carlo implementation of a Bayesian model with heteroskedastic errors for automated text analysis is pretty much the definition of conversation stopper.

Then the media noticed the revolution we’re all apart of, and they glued a label to it. Now “Big Data” is what you and I do.  As trivial as this change sounds, we should be grateful for it, as the name seems to resonate with the public and so it helps convey the importance of our field to others better than we had managed to do ourselves. Yet, now that we have everyone’s attention, we need to start clarifying for others -- and ourselves -- what the revolution means. This is much of what this book is about.

Throughout, we need to remember that for the most part, Big Data is not about the data....

The C-SPAN Archives as The Policymaking Record of American Representative Democracy: A Foreword
Gary King. 2016. “The C-SPAN Archives as The Policymaking Record of American Representative Democracy: A Foreword.” In Exploring the C-SPAN Archives: Advancing the Research Agenda, edited by Robert X Browning. West Lafayette, IN: Purdue University Press.Abstract

Almost two centuries ago, the idea of research libraries, and the possibility of building them at scale, began to be realized. Although we can find these libraries at every major college and university in the world today, and at many noneducational research institutions, this outcome was by no means obvious at the time. And the benefits we all now enjoy from their existence were then at best merely vague speculations.

How many would have supported the formation of these institutions at the time, without knowing the benefits that have since become obvious? After all, the arguments against this massive ongoing expenditure are impressive. The proposal was to construct large buildings, hire staff, purchase all manner of books and other publications and catalogue and shelve them, provide access to visitors, and continually reorder all the books that the visitors disorder. And the libraries would keep the books, and fund the whole operation, in perpetuity. Publications would be collected without anyone deciding which were of high quality and thus deserving of preservation—leading critics to argue that all this effort would result in expensive buildings packed mostly with junk.  . . .

Inference in Case Control Studies
Gary King, Langche Zeng, and Shein-Chung Chow. 2010. “Inference in Case Control Studies.” In Encyclopedia of Biopharmaceutical Statistics, 3rd ed. New York: Marcel Dekker.Abstract

Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information. This is a somewhat revised and extended version of Gary King and Langche Zeng. 2002. "Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies," Statistics in Medicine, 21: 1409-1427. You may also be interested in our related work in other fields, such as in international relations, Gary King and Langche Zeng. "Explaining Rare Events in International Relations," International Organization, 55, 3 (Spring, 2001): 693-715, and in political methodology, Gary King and Langche Zeng, "Logistic Regression in Rare Events Data," Political Analysis, Vol. 9, No. 2, (Spring, 2001): Pp. 137--63.

The Changing Evidence Base of Social Science Research
Gary King. 2009. “The Changing Evidence Base of Social Science Research.” In The Future of Political Science: 100 Perspectives, edited by Gary King, Kay Schlozman, and Norman Nie. New York: Routledge Press.Abstract

This (two-page) article argues that the evidence base of political science and the related social sciences are beginning an underappreciated but historic change.

Gary King, Ori Rosen, and Martin Tanner. 2006. “The New Palgrave Dictionary of Economics.” In Ecological Inference, edited by Larry Blume and Steven N Durlauf, 2nd ed.Abstract
Dictionary entry on the definition of "ecological inference," and a brief summary of the history of ecological inference research.
The Effect of War on the Supreme Court
Lee Epstein, Daniel E. Ho, Gary King, and Jeffrey A. Segal. 2006. “The Effect of War on the Supreme Court.” In Principles and Practice in American Politics: Classic and Contemporary Readings, edited by Samuel Kernell and Steven S. Smith, 3rd ed. Washington, D.C.: Congressional Quarterly Press.Abstract

Does the U.S. Supreme Court curtail rights and liberties when the nation’s security is under threat? In hundreds of articles and books, and with renewed fervor since September 11, 2001, members of the legal community have warred over this question. Yet, not a single large-scale, quantitative study exists on the subject. Using the best data available on the causes and outcomes of every civil rights and liberties case decided by the Supreme Court over the past six decades and employing methods chosen and tuned especially for this problem, our analyses demonstrate that when crises threaten the nation’s security, the justices are substantially more likely to curtail rights and liberties than when peace prevails. Yet paradoxically, and in contradiction to virtually every theory of crisis jurisprudence, war appears to affect only cases that are unrelated to the war. For these cases, the effect of war and other international crises is so substantial, persistent, and consistent that it may surprise even those commentators who long have argued that the Court rallies around the flag in times of crisis. On the other hand, we find no evidence that cases most directly related to the war are affected. We attempt to explain this seemingly paradoxical evidence with one unifying conjecture: Instead of balancing rights and security in high stakes cases directly related to the war, the Justices retreat to ensuring the institutional checks of the democratic branches. Since rights-oriented and process-oriented dimensions seem to operate in different domains and at different times, and often suggest different outcomes, the predictive factors that work for cases unrelated to the war fail for cases related to the war. If this conjecture is correct, federal judges should consider giving less weight to legal principles outside of wartime but established during wartime, and attorneys should see it as their responsibility to distinguish cases along these lines.

Inference in Case-Control Studies
Gary King and Langche Zeng. 2004. “Inference in Case-Control Studies.” In Encyclopedia of Biopharmaceutical Statistics, edited by Shein-Chung Chow, 2nd ed. New York: Marcel Dekker.Abstract

Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.

Empirically Evaluating the Electoral College
Andrew Gelman, Jonathan Katz, and Gary King. 2004. “Empirically Evaluating the Electoral College.” In Rethinking the Vote: The Politics and Prospects of American Electoral Reform, edited by Ann N Crigler, Marion R Just, and Edward J McCaffery, 75-88. New York: Oxford University Press.Abstract

The 2000 U.S. presidential election rekindled interest in possible electoral reform. While most of the popular and academic accounts focused on balloting irregularities in Florida, such as the now infamous "butterfly" ballot and mishandled absentee ballots, some also noted that this election marked only the fourth time in history that the candidate with a plurality of the popular vote did not also win the Electoral College. This "anti-democratic" outcome has fueled desire for reform or even outright elimination of the electoral college. We show that after appropriate statistical analysis of the available historical electoral data, there is little basis to argue for reforming the Electoral College. We first show that while the Electoral College may once have been biased against the Democrats, the current distribution of voters advantages neither party. Further, the electoral vote will differ from the popular vote only when the average vote shares of the two major candidates are extremely close to 50 percent. As for individual voting power, we show that while there has been much temporal variation in relative voting power over the last several decades, the voting power of individual citizens would not likely increase under a popular vote system of electing the president.