Congratulations! You’ve made it to graduate school. This means you’re in a select group, about to embark on a great adventure to learn about the world and teach us all some new things. This also means you obviously know how to follow rules. So I have five for you -- not counting the obvious one that to learn new things you’ll need to break some rules. After all, to be a successful academic, you’ll need to cut a new path, and so if you do exactly what your advisors and I did, you won’t get anywhere near as far since we already did it. So here are some rules, but break some of them, perhaps including this one.
Almost two centuries ago, the idea of research libraries, and the possibility of building them at scale, began to be realized. Although we can find these libraries at every major college and university in the world today, and at many noneducational research institutions, this outcome was by no means obvious at the time. And the benefits we all now enjoy from their existence were then at best merely vague speculations.
How many would have supported the formation of these institutions at the time, without knowing the benefits that have since become obvious? After all, the arguments against this massive ongoing expenditure are impressive. The proposal was to construct large buildings, hire staff, purchase all manner of books and other publications and catalogue and shelve them, provide access to visitors, and continually reorder all the books that the visitors disorder. And the libraries would keep the books, and fund the whole operation, in perpetuity. Publications would be collected without anyone deciding which were of high quality and thus deserving of preservation—leading critics to argue that all this effort would result in expensive buildings packed mostly with junk. . . .
A few years ago, explaining what you did for a living to Dad, Aunt Rose, or your friend from high school was pretty complicated. Answering that you develop statistical estimators, work on numerical optimization, or, even better, are working on a great new Markov Chain Monte Carlo implementation of a Bayesian model with heteroskedastic errors for automated text analysis is pretty much the definition of a conversation stopper.
Then the media noticed the revolution we’re all a part of, and they glued a label to it. Now “Big Data” is what you and I do. As trivial as this change sounds, we should be grateful for it, as the name seems to resonate with the public and so it helps convey the importance of our field to others better than we had managed to do ourselves. Yet, now that we have everyone’s attention, we need to start clarifying for others -- and ourselves -- what the revolution means. This is much of what this book is about.
Throughout, we need to remember that for the most part, Big Data is not about the data....
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just the relative risks and rates, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information. This is a somewhat revised and extended version of Gary King and Langche Zeng. 2002. "Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies," Statistics in Medicine, 21: 1409-1427. You may also be interested in our related work in other fields, such as in international relations, Gary King and Langche Zeng. "Explaining Rare Events in International Relations," International Organization, 55, 3 (Spring, 2001): 693-715, and in political methodology, Gary King and Langche Zeng, "Logistic Regression in Rare Events Data," Political Analysis, Vol. 9, No. 2, (Spring, 2001): Pp. 137--63.
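To make the role of the population incidence fraction concrete, here is a minimal sketch, not the method developed in the paper. It applies textbook Bayes' rule to a 2x2 case-control table under cumulative sampling and assumes the incidence fraction is known exactly. The counts, the value of `tau`, and the function names are all hypothetical, chosen only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 case-control table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    return (a * d) / (b * c)


def risks_given_tau(a, b, c, d, tau):
    """Recover P(Y=1 | exposed) and P(Y=1 | unexposed) by Bayes' rule.

    tau is the population incidence fraction P(Y=1); it is assumed known
    here, which is exactly the auxiliary information the sample lacks.
    """
    p_exp_case = a / (a + b)  # P(X=1 | Y=1), estimated from the cases
    p_exp_ctrl = c / (c + d)  # P(X=1 | Y=0), estimated from the controls
    risk_exposed = p_exp_case * tau / (
        p_exp_case * tau + p_exp_ctrl * (1 - tau)
    )
    risk_unexposed = (1 - p_exp_case) * tau / (
        (1 - p_exp_case) * tau + (1 - p_exp_ctrl) * (1 - tau)
    )
    return risk_exposed, risk_unexposed


# Hypothetical table and incidence fraction (illustrative values only).
a, b, c, d = 60, 40, 20, 80
tau = 0.01

or_hat = odds_ratio(a, b, c, d)
risk1, risk0 = risks_given_tau(a, b, c, d, tau)
risk_ratio = risk1 / risk0
risk_difference = risk1 - risk0

print(f"odds ratio      = {or_hat:.3f}")
print(f"risk ratio      = {risk_ratio:.3f}")
print(f"risk difference = {risk_difference:.4f}")
```

With a small `tau` the risk ratio sits close to the odds ratio, which is the rare events assumption at work. The risk difference, by contrast, changes with whatever value of `tau` is supplied, so it cannot be read off the case-control table alone.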
Lee Epstein, Daniel E. Ho, Gary King, and Jeffrey A. Segal. 2006. “The Effect of War on the Supreme Court.” In Principles and Practice in American Politics: Classic and Contemporary Readings, edited by Samuel Kernell and Steven S. Smith, 3rd ed. Washington, D.C. Congressional Quarterly Press.
Does the U.S. Supreme Court curtail rights and liberties when the nation’s security is under threat? In hundreds of articles and books, and with renewed fervor since September 11, 2001, members of the legal community have warred over this question. Yet, not a single large-scale, quantitative study exists on the subject. Using the best data available on the causes and outcomes of every civil rights and liberties case decided by the Supreme Court over the past six decades and employing methods chosen and tuned especially for this problem, our analyses demonstrate that when crises threaten the nation’s security, the justices are substantially more likely to curtail rights and liberties than when peace prevails. Yet paradoxically, and in contradiction to virtually every theory of crisis jurisprudence, war appears to affect only cases that are unrelated to the war. For these cases, the effect of war and other international crises is so substantial, persistent, and consistent that it may surprise even those commentators who long have argued that the Court rallies around the flag in times of crisis. On the other hand, we find no evidence that cases most directly related to the war are affected. We attempt to explain this seemingly paradoxical evidence with one unifying conjecture: Instead of balancing rights and security in high stakes cases directly related to the war, the Justices retreat to ensuring the institutional checks of the democratic branches. Since rights-oriented and process-oriented dimensions seem to operate in different domains and at different times, and often suggest different outcomes, the predictive factors that work for cases unrelated to the war fail for cases related to the war. If this conjecture is correct, federal judges should consider giving less weight to legal principles outside of wartime but established during wartime, and attorneys should see it as their responsibility to distinguish cases along these lines.
Andrew Gelman, Jonathan Katz, and Gary King. 2004. “Empirically Evaluating the Electoral College.” In Rethinking the Vote: The Politics and Prospects of American Electoral Reform, edited by Ann N Crigler, Marion R Just, and Edward J McCaffery, Pp. 75-88. New York: Oxford University Press.
The 2000 U.S. presidential election rekindled interest in possible electoral reform. While most of the popular and academic accounts focused on balloting irregularities in Florida, such as the now infamous "butterfly" ballot and mishandled absentee ballots, some also noted that this election marked only the fourth time in history that the candidate with a plurality of the popular vote did not also win the Electoral College. This "anti-democratic" outcome has fueled desire for reform or even outright elimination of the Electoral College. We show that after appropriate statistical analysis of the available historical electoral data, there is little basis to argue for reforming the Electoral College. We first show that while the Electoral College may once have been biased against the Democrats, the current distribution of voters advantages neither party. Further, the electoral vote will differ from the popular vote only when the average vote shares of the two major candidates are extremely close to 50 percent. As for individual voting power, we show that while there has been much temporal variation in relative voting power over the last several decades, the voting power of individual citizens would not likely increase under a popular vote system of electing the president.
Few would disagree that health policies and programmes ought to be based on valid, timely and relevant information, focused on those aspects of health development that are in greatest need of improvement. For example, vaccination programmes rely heavily on information on cases and deaths to document needs and to monitor progress on childhood illness and mortality. The same strong information basis is necessary for policies on health inequality. The reduction of health inequality is widely accepted as a key goal for societies, but any policy needs reliable research on the extent and causes of health inequality. Given that child deaths still constitute 19% of all deaths globally and 24% of all deaths in developing countries (1), reducing inequalities in child survival is a good beginning.
The between-group component of total health inequality has been studied extensively by numerous scholars. They have expertly analysed the causes of differences in health status and mortality across population subgroups, defined by income, education, race/ethnicity, country, region, social class, and other group identifiers (2–9).
Andrew Gelman and Gary King. 1996. “Advantages of Conflictual Redistricting.” In Fixing the Boundary: Defining and Redefining Single-Member Electoral Districts, edited by Iain McLean and David Butler, Pp. 207–218. Aldershot, England: Dartmouth Publishing Company.
This article describes the results of our analysis of state legislative elections in the United States, where each state is required to redraw the boundaries of its state legislative districts every ten years. In the United States, redistrictings are sometimes controlled by the Democrats, sometimes by the Republicans, and sometimes by bipartisan committees, but never by neutral boundary commissions. Our goal was to study the consequences of redistricting, and at the conclusion of this article we discuss how our findings might be relevant to British elections.
In this chapter, we study standards of racial fairness in legislative redistricting, a field that has been the subject of considerable legislation, jurisprudence, and advocacy, but very little serious academic scholarship. We attempt to elucidate how basic concepts about "color-blind" societies, and similar normative preferences, can generate specific practical standards for racial fairness in representation and redistricting. We also provide the normative and theoretical foundations on which concepts such as proportional representation rest, in order to give existing preferences of many in the literature a firmer analytical foundation.
At one point during the 1988 campaign, Michael Dukakis was ahead in the public opinion polls by 17 percentage points, but he eventually lost the election by 8 percent. Walter Mondale was ahead in the polls by 4 percent during the 1984 campaign but lost the election in a landslide. During June and July of 1992, Clinton, Bush, and Perot each had turns in the public opinion poll lead. What explains all this poll variation? Why do so many citizens change their minds so quickly about presidential choices?
Gary King. 1993. “The Methodology of Presidential Research.” In Researching the Presidency: Vital Questions, New Approaches, edited by George Edwards III, Bert A. Rockman, and John H. Kessel, Pp. 387–412. Pittsburgh: University of Pittsburgh.
The original purpose of the paper this chapter was based on was to use the Presidency Research Conference’s first-round papers (by John H. Aldrich, Erwin C. Hargrove, Karen M. Hult, Paul Light, and Richard Rose) as my "data." My given task was to analyze the literature ably reviewed by these authors and report what political methodology might have to say about presidency research. I focus in this chapter on the traditional presidency literature, emphasizing research on the president and the office. For the most part, I do not consider research on presidential selection, election, and voting behavior, which has been much more similar to other fields in American politics.