Unifying Statistical Analysis

Development of a unified approach to statistical modeling, inference, interpretation, presentation, analysis, and software; integrated with most of the other projects listed here.

Unifying Approaches to Statistical Analysis

Zelig: Everyone's Statistical Software
A generalization of Clarify, and much other software, implemented in R. The extensive manual encompasses most of the above works and can be read independently as an introduction to wide range of models. Under active development. Imai, Kosuke, Gary King, and Olivia Lau. 2006. Zelig: Everyone's Statistical Software. Publisher's Version
Making the Most of Statistical Analyses: Improving Interpretation and Presentation
Generalizes the unification in the book (replacing its Section 5.2 with simulation to compute quantities of interest). This paper, which was originally titled "Enough with the Logit Coefficients, Already!", explains how to compute any quantity of interest from almost any statistical model; and shows, with replications of several published works, how to extract considerably more information than standard practices, without changing any data or statistical assumptions. King, Gary, Michael Tomz, and Jason Wittenberg. 2000. Making the Most of Statistical Analyses: Improving Interpretation and Presentation, American Journal of Political Science 44: 341–355. Publisher's VersionAbstract
Social Scientists rarely take full advantage of the information available in their statistical results. As a consequence, they miss opportunities to present quantities that are of greatest substantive interest for their research and express the appropriate degree of certainty about these quantities. In this article, we offer an approach, built on the technique of statistical simulation, to extract the currently overlooked information from any statistical method and to interpret and present it in a reader-friendly manner. Using this technique requires some expertise, which we try to provide herein, but its application should make the results of quantitative articles more informative and transparent. To illustrate our recommendations, we replicate the results of several published works, showing in each case how the authors’ own conclusions can be expressed more sharply and informatively, and, without changing any data or statistical assumptions, how our approach reveals important new information about the research questions at hand. We also offer very easy-to-use Clarify software that implements our suggestions.
<p>How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It</p>
King, Gary, and Margaret E Roberts. In Press.

How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It

, Political Analysis.Abstract
"Robust standard errors'" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help researchers realize these gains via a more productive way to understand and use robust standard errors; a new general and easier-to-use "generalized information matrix test" statistic that can formally assess misspecification (based on differences between robust and classical variance estimates); and practical illustrations via simulations and real examples from published research. How robust standard errors are used needs to change, but instead of jettisoning this popular tool we show how to use it to provide effective clues about model misspecification, likely biases, and a guide to considerably more reliable, and defensible, inferences. Accompanying this article [soon!] is software that implements the methods we describe. 
Toward A Common Framework for Statistical Analysis and Development
A paper that describes the advances underlying Zelig software: Imai, Kosuke, Gary King, and Olivia Lau. 2008. Toward A Common Framework for Statistical Analysis and Development, Journal of Computational Graphics and Statistics 17: 1–22.Abstract
We describe some progress toward a common framework for statistical analysis and software development built on and within the R language, including R’s numerous existing packages. The framework we have developed offers a simple unified structure and syntax that can encompass a large fraction of statistical procedures already implemented in R, without requiring any changes in existing approaches. We conjecture that it can be used to encompass and present simply a vast majority of existing statistical methods, regardless of the theory of inference on which they are based, notation with which they were developed, and programming syntax with which they have been implemented. This development enabled us, and should enable others, to design statistical software with a single, simple, and unified user interface that helps overcome the conflicting notation, syntax, jargon, and statistical methods existing across the methods subfields of numerous academic disciplines. The approach also enables one to build a graphical user interface that automatically includes any method encompassed within the framework. We hope that the result of this line of research will greatly reduce the time from the creation of a new statistical innovation to its widespread use by applied researchers whether or not they use or program in R.
Software that accompanies the above article and implements its key ideas in easy-to-use Stata macros. Tomz, Michael, Jason Wittenberg, and Gary King. 2003. CLARIFY: Software for Interpreting and Presenting Statistical Results, Journal of Statistical Software.Abstract
This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from http://gking.harvard.edu/clarify" at the Stata command line. The documentation [ HTML | PDF ] explains how to do this. We also provide a zip archive for users who want to install Clarify on a computer that is not connected to the internet. Winner of the Okidata Best Research Software Award. Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables.

Related Materials

What to do When Your Hessian is Not Invertible: Alternatives to Model Respecification in Nonlinear Estimation
Gill, Jeff, and Gary King. 2004. What to do When Your Hessian is Not Invertible: Alternatives to Model Respecification in Nonlinear Estimation, Sociological Methods and Research 32: 54-87.Abstract
What should a researcher do when statistical analysis software terminates before completion with a message that the Hessian is not invertable? The standard textbook advice is to respecify the model, but this is another way of saying that the researcher should change the question being asked. Obviously, however, computer programs should not be in the business of deciding what questions are worthy of study. Although noninvertable Hessians are sometimes signals of poorly posed questions, nonsensical models, or inappropriate estimators, they also frequently occur when information about the quantities of interest exists in the data, through the likelihood function. We explain the problem in some detail and lay out two preliminary proposals for ways of dealing with noninvertable Hessians without changing the question asked.
On Political Methodology
King, Gary. 1991. On Political Methodology, Political Analysis 2: 1–30.Abstract
"Politimetrics" (Gurr 1972), "polimetrics" (Alker 1975), "politometrics" (Hilton 1976), "political arithmetic" (Petty [1672] 1971), "quantitative Political Science (QPS)," "governmetrics," "posopolitics" (Papayanopoulos 1973), "political science statistics (Rai and Blydenburgh 1973), "political statistics" (Rice 1926). These are some of the names that scholars have used to describe the field we now call "political methodology." The history of political methodology has been quite fragmented until recently, as reflected by this patchwork of names. The field has begun to coalesce during the past decade and we are developing persistent organizations, a growing body of scholarly literature, and an emerging consensus about important problems that need to be solved. I make one main point in this article: If political methodology is to play an important role in the future of political science, scholars will need to find ways of representing more interesting political contexts in quantitative analyses. This does not mean that scholars should just build more and more complicated statistical models. Instead, we need to represent more of the essence of political phenomena in our models. The advantage of formal and quantitative approaches is that they are abstract representations of the political world and are, thus, much clearer. We need methods that enable us to abstract the right parts of the phenomenon we are studying and exclude everything superfluous. Despite the fragmented history of quantitative political analysis, a version of this goal has been voiced frequently by both quantitative researchers and their critics (Sec. 2). However, while recognizing this shortcoming, earlier scholars were not in the position to rectify it, lacking the mathematical and statistical tools and, early on, the data. Since political methodologists have made great progress in these and other areas in recent years, I argue that we are now capable of realizing this goal. In section 3, I suggest specific approaches to this problem. Finally, in section 4, I provide two modern examples, ecological inference and models of spatial autocorrelation, to illustrate these points.
The Changing Evidence Base of Social Science Research
King, Gary. 2009. The Changing Evidence Base of Social Science Research, in The Future of Political Science: 100 Perspectives, Gary King, Schlozman, Kay, and Nie, Norman. New York: Routledge Press.Abstract
This (two-page) article argues that the evidence base of political science and the related social sciences are beginning an underappreciated but historic change.
How Not to Lie With Statistics: Avoiding Common Mistakes in Quantitative Political Science
King, Gary. 1986. How Not to Lie With Statistics: Avoiding Common Mistakes in Quantitative Political Science, American Journal of Political Science 30: 666–687.Abstract
This article identifies a set of serious theoretical mistakes appearing with troublingly high frequency throughout the quantitative political science literature. These mistakes are all based on faulty statistical theory or on erroneous statistical analysis. Through algebraic and interpretive proofs, some of the most commonly made mistakes are explicated and illustrated. The theoretical problem underlying each is highlighted, and suggested solutions are provided throughout. It is argued that closer attention to these problems and solutions will result in more reliable quantitative analyses and more useful theoretical contributions.
King, Gary. 1998. MAXLIK.Abstract
A set of Gauss programs and datasets (annotated for pedagogical purposes) to implement many of the maximum likelihood-based models I discuss in Unifying Political Methodology: The Likelihood Theory of Statistical Inference, Ann Arbor: University of Michigan Press, 1998, and use in my class. All datasets are real, not simulated.