Harvard University HARVARD.EDU

Software

Follow links for the software or where to ask questions

34 software citations

Gary King. 2020. “QuickCode.”

Georgina Evans and Gary King. 2020. “PrivacyUnbiased.”

Connor T. Jerzak, Gary King, and Anton Strezhnev. 2018. “Readme2: An R Package for Improved Automated Nonparametric Content Analysis for Social Science.”

An R package for estimating category proportions in an unlabeled set of documents given a labeled set, by implementing the method described in Jerzak, King, and Strezhnev (2023). This method is meant to improve on the ideas in Hopkins and King (2010), which introduced a quantification algorithm to estimate category proportions without directly classifying individual observations. This version of the software refines the original method by implementing a technique for selecting optimal textual features in order to minimize the error of the estimated category proportions. Automatic differentiation, stochastic gradient descent, and batch re-normalization are used to carry out the optimization. Other pre-processing functions are available, as well as an interface to the earlier version of the algorithm for comparison. The package also provides users with the ability to extract the generated features for use in other tasks.

Some scholars build models to classify documents into chosen categories. Others, especially social scientists who tend to focus on population characteristics, instead usually estimate the proportion of documents in each category, using either parametric “classify-and-count” methods or “direct” nonparametric estimation of proportions without individual classification. Unfortunately, classify-and-count methods can be highly model dependent, or can generate more bias in the estimated proportions even as the percentage of documents correctly classified increases. Direct estimation avoids these problems, but it can suffer when the meaning of language changes between the training and test sets or when language is too similar across categories. The underlying approach optimizes continuous text features and incorporates a form of matching adapted from the causal inference literature.
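
The direct-estimation idea can be illustrated with a minimal two-category sketch in the spirit of Hopkins and King (2010): the unlabeled set's feature-profile frequencies are modeled as a mixture P(S) = P(S|D=1)π + P(S|D=0)(1 - π) of the labeled set's conditional frequencies, and π is recovered by least squares without classifying any single document. The function name and the input format (precomputed frequency vectors) are assumptions for illustration; this is not the readme2 interface, and it omits the feature optimization and matching steps described above.

```python
def estimate_proportion(p_unlabeled, p_given_cat1, p_given_cat0):
    """Estimate the share of unlabeled documents in category 1 (two categories).

    Inputs are equal-length lists of feature-profile frequencies:
    P(S) in the unlabeled set, and P(S | D=1), P(S | D=0) in the labeled set.
    Solves P(S) = P(S|D=1) * pi + P(S|D=0) * (1 - pi) for pi by least squares,
    so no individual document is ever classified.
    """
    diff = [a - b for a, b in zip(p_given_cat1, p_given_cat0)]
    resid = [p - b for p, b in zip(p_unlabeled, p_given_cat0)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        # Identical profiles: the features cannot separate the categories.
        raise ValueError("the two categories have identical feature profiles")
    pi = sum(d * r for d, r in zip(diff, resid)) / denom
    return min(1.0, max(0.0, pi))  # clip to a valid proportion


# Hypothetical frequencies for three word-stem profiles.
p1 = [0.6, 0.3, 0.1]
p0 = [0.1, 0.3, 0.6]
unlabeled = [0.25 * a + 0.75 * b for a, b in zip(p1, p0)]
print(estimate_proportion(unlabeled, p1, p0))  # close to 0.25
```

The failure case in the code mirrors the limitation noted above: when language is too similar across categories, the conditional profiles barely differ and the estimate becomes unstable or undefined.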

Aaron Kaufman, Gary King, and Mayya Komisarchik. 2018. “Compactness: An R Package for Measuring Legislative District Compactness If You Only Know It When You See It.”

This software implements the methods in Kaufman, King, and Komisarchik, “How to Measure Legislative District Compactness If You Only Know It When You See It,” American Journal of Political Science. To deter gerrymandering, many U.S. state constitutions require legislative districts to be geographically “compact” (and a similar requirement holds explicitly or implicitly for numerous political jurisdictions around the world). Yet the law offers few precise definitions other than “you know it when you see it,” which effectively implies a common understanding of the concept. In contrast, academics have shown that compactness has multiple dimensions and have generated many conflicting measures. The authors hypothesize that both are correct: compactness is complex and multidimensional, but a single common understanding exists across people. They develop a survey to elicit this understanding with high reliability (in data where the standard paired comparisons approach fails). They then create a statistical model that, using only the geometric features of a district, predicts with high accuracy the compactness evaluations of judges, public officials responsible for redistricting, and many others. The project also offers compactness data from a validated measure for many state legislative and congressional districts, as well as software to compute this measure for any district.
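
As a concrete example of the single-dimension measures the abstract contrasts with the authors' approach, the sketch below computes the classic Polsby-Popper score, 4πA/P², from a district's boundary polygon. It assumes planar (projected) coordinates and a simple polygon without holes, and the function name is hypothetical; it is not the package's validated measure, which combines many geometric features in a model fit to human judgments.

```python
import math


def polsby_popper(vertices):
    """Polsby-Popper compactness 4*pi*A / P**2 of a simple polygon.

    `vertices` is a list of (x, y) pairs in order around the boundary,
    in planar coordinates. The score lies in (0, 1]; only a circle reaches 1.
    """
    n = len(vertices)
    area2 = 0.0      # twice the signed area, via the shoelace formula
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(area2) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(polsby_popper(square), polsby_popper(strip))  # the elongated strip scores lower
```

A single score like this captures only one dimension of compactness, which is exactly why different classical measures can rank the same districts differently.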

Gary King. 2015. “Perusall.”

James Honaker, Gary King, and Matthew Blackwell. 2009. “AMELIA II: A Program for Missing Data.”

Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that is faster, handles larger numbers of variables, and is far easier to use than various Markov chain Monte Carlo approaches, while giving essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby incorporating a great deal of potentially valuable and extensive information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for different cross-sections. A full set of graphical diagnostics is also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and an extensive graphical user interface are included.

Amelia II software web site

Gary King. 2009. “OpenScholar.”

Stefano Iacus, Gary King, and Giuseppe Porro. 2009. “CEM: Coarsened Exact Matching Software.”

Gary King and Ying Lu. 2008. “VA: Verbal Autopsies.”

Gary King, Kosuke Imai, Daniel Ho, and Elizabeth A. Stuart. 2007. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.”

Michael Tomz, Jason Wittenberg, and Gary King. 2003. “CLARIFY: Software for Interpreting and Presenting Statistical Results.”

This is a set of easy-to-use tools that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg’s “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” Winner of the Okidata Best Research Software Award from the American Political Science Association. These tools use Monte Carlo simulations to compute interpretable quantities from regression models and perform inference on them. For Stata, see the Journal of Statistical Software article (doi:10.18637/jss.v008.i01); for current R implementations, see https://iqss.github.io/clarify
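
The simulation approach can be sketched for a hypothetical logit model with one covariate: draw coefficient vectors from their estimated asymptotic normal distribution, convert each draw into a quantity of interest (here a predicted probability), and summarize the draws. The function name, its arguments, and the example estimates are hypothetical; this is not the interface of CLARIFY for Stata or of the clarify R package.

```python
import math
import random


def simulate_predicted_probability(beta_hat, vcov, x, n_sims=1000, seed=0):
    """Simulate P(y=1 | x) for a two-coefficient logit model.

    beta_hat = (intercept, slope); vcov is their 2x2 covariance matrix.
    Draws coefficients from N(beta_hat, vcov), computes the predicted
    probability for each draw, and returns the mean and a 95% interval.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix, so that L @ z ~ N(0, vcov).
    l11 = math.sqrt(vcov[0][0])
    l21 = vcov[1][0] / l11
    l22 = math.sqrt(vcov[1][1] - l21 ** 2)
    probs = []
    for _ in range(n_sims):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = beta_hat[0] + l11 * z1
        b1 = beta_hat[1] + l21 * z1 + l22 * z2
        probs.append(1.0 / (1.0 + math.exp(-(b0 + b1 * x))))
    probs.sort()
    lower = probs[int(0.025 * n_sims)]
    upper = probs[int(0.975 * n_sims) - 1]
    return sum(probs) / n_sims, (lower, upper)


# Hypothetical estimates: intercept 0.0, slope 1.0.
mean_p, interval = simulate_predicted_probability(
    (0.0, 1.0), [[0.04, 0.01], [0.01, 0.09]], x=0.0)
print(mean_p, interval)
```

Reporting the simulated probability with its interval, rather than raw coefficients, is the kind of interpretable quantity the tools are designed to produce.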

Older

Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. 2018. “PSI (Ψ): A Private Data Sharing Interface.”

Michail Schwab, Hendrik Strobelt, James Tompkin, Colin Fredericks, Connor Huff, Dana Higgins, Anton Strezhnev, Mayya Komisarchik, Gary King, and Hanspeter Pfister. 2017. “Booc.Io: Software for an Education System With Hierarchical Concept Maps.”

Gary King and Margaret Roberts. 2015. “RobustSE.”

The RobustSE R package implements the generalized information matrix (GIM) test to detect model misspecification described in King and Roberts (2015). “Robust standard errors” are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The accompanying article shows how to use robust standard errors as diagnostic tools via the GIM statistic (based on differences between robust and classical variance estimates), with practical illustrations via simulations and real examples. Open source software is available at https://github.com/IQSS/RobustSE and implements the test for linear, Poisson, and negative binomial regressions.
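
The raw ingredients of this diagnostic, classical and robust (sandwich) variance estimates, can be computed side by side for a simple one-regressor linear model, as in the sketch below. It uses the HC0 form of the robust estimator and a hypothetical function name; it does not implement the GIM test itself, which builds a formal test statistic from differences between the two kinds of estimates.

```python
import math


def slope_standard_errors(x, y):
    """Slope, classical SE, and robust (HC0 sandwich) SE for y = a + b*x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical: assumes one constant error variance for every observation.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # Robust: weights each squared residual by its own leverage term.
    se_robust = math.sqrt(
        sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid)) / sxx ** 2)
    return b, se_classical, se_robust


# Hypothetical data whose error spread grows with |x| (misspecified variance).
xs = list(range(-10, 11))
ys = [1 + 2 * xv + (-1) ** i * abs(xv) for i, xv in enumerate(xs)]
print(slope_standard_errors(xs, ys))  # the two standard errors diverge
```

When the two standard errors diverge like this, the abstract's point applies: the gap signals misspecification worth fixing in the model itself, not just a correction to report.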

Gary King, Christopher Lucas, and Richard Nielsen. 2014. “MatchingFrontier: R Package for Calculating the Balance-Sample Size Frontier.”

MatchingFrontier is an easy-to-use R package for making optimal causal inferences from observational data. Despite their popularity, existing matching approaches leave researchers with two fundamental tensions. First, they are designed to maximize one metric (such as propensity score or Mahalanobis distance) but are judged against another for which they were not designed (such as L1 or differences in means). Second, they lack a principled solution to revealing the implicit bias-variance trade-off: matching methods need to optimize with respect to both imbalance (between the treated and control groups) and the number of observations pruned, but existing approaches optimize with respect to only one; users then either ignore the other or tweak it by hand, usually suboptimally.

MatchingFrontier resolves both tensions by consolidating previous techniques into a single, optimal, and flexible approach. It calculates the matching solution with maximum balance for each possible sample size (N, N-1, N-2, …). It thus directly calculates the entire balance-sample size frontier, from which the user can easily choose one, several, or all subsamples from which to conduct their final analysis, given their own choice of imbalance metric and quantity of interest. MatchingFrontier solves the obvious joint optimization problem in one run, automatically, without manual tweaking, and without iteration. Although for each subset size k there exists a huge number, (N choose k), of unique subsets, MatchingFrontier includes specially designed fast algorithms that give the optimal answer, usually in a few minutes.

MatchingFrontier has officially been “Qualified for Scientific Use” by the U.S. Food and Drug Administration.
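
The idea of a balance-sample size frontier can be illustrated with a deliberately simple imbalance metric for which the best subset of each size is easy to find: the average distance from each retained treated unit to its nearest control on a single covariate, pruning only treated units. For that metric, dropping treated units in order of decreasing nearest-neighbor distance traces the exact frontier. The metric and function name are hypothetical choices for this sketch; MatchingFrontier lets users choose their own imbalance metric, and for other metrics this shortcut need not hold.

```python
def nn_distance_frontier(treated, control):
    """Frontier of (treated units kept, imbalance) pairs from N down to 1.

    Imbalance = mean, over retained treated units, of each unit's absolute
    distance to its nearest control (one covariate; controls are never
    pruned). For this metric the best subset of size k is the k treated
    units with the smallest distances, so pruning the worst-matched unit
    at each step gives the optimal imbalance for every sample size.
    """
    dists = sorted(min(abs(t - c) for c in control) for t in treated)
    total = sum(dists)
    frontier = []
    for k in range(len(dists), 0, -1):
        frontier.append((k, total / k))
        total -= dists[k - 1]  # drop the worst-matched remaining unit
    return frontier


# Hypothetical covariate values.
print(nn_distance_frontier([1.0, 2.0, 10.0], [1.5, 2.0]))
```

Each pair is one point on the frontier, so a user can see how much balance is gained by pruning each additional unit rather than committing to a single matched sample in advance.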

Gary King, Matthew Knowles, and Steven Melendez. 2010. “ReadMe: Software for Automated Content Analysis.”

This program will read and analyze a large set of text documents and report on the proportion of documents in each of a set of given categories.

Andrew Gelman, Gary King, and Andrew Thomas. 2010. “JudgeIt II: A Program for Evaluating Electoral Systems and Redistricting Plans.”

A program for analyzing almost any feature of district-level legislative elections data, including prediction, evaluating redistricting plans, estimating counterfactual hypotheses (such as what would happen if a term-limitation amendment were imposed), and others. This implements statistical procedures described in a series of journal articles and has been used during redistricting in many states by judges, partisans, governments, private citizens, and many others. Winner of the APSA Research Software Award.
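
One classic and much simpler way to evaluate a districting plan from district-level returns is uniform partisan swing: shift every district's vote share by the same amount to trace a seats-votes curve, then read off partisan bias as the seat share a party would win with half the vote, minus one half. The sketch assumes two-party vote shares and equal turnout across districts (so the average district share stands in for the statewide vote); the function names are hypothetical. JudgeIt itself uses a fuller statistical model of district-level elections, so this illustrates the concept, not the program's method.

```python
def seats_at_vote(district_shares, statewide_vote):
    """Seat share party A would win under uniform partisan swing.

    district_shares: party A's two-party vote share in each district.
    Every district shifts by the same amount so that the average
    district share equals statewide_vote.
    """
    n = len(district_shares)
    swing = statewide_vote - sum(district_shares) / n
    wins = sum(1 for s in district_shares if s + swing > 0.5)
    return wins / n


def partisan_bias(district_shares):
    """Seat share above one half that party A would win with 50% of the vote."""
    return seats_at_vote(district_shares, 0.5) - 0.5


# Hypothetical plan: party A wins three districts narrowly, loses one badly.
plan = [0.55, 0.55, 0.55, 0.20]
print(seats_at_vote(plan, 0.5), partisan_bias(plan))
```

A plan that packs one party's voters into a few lopsided districts, as in the example, shows a positive bias for the other party even at an evenly split vote.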

Jonathan Wand, Gary King, and Olivia Lau. 2007. “Anchors: Software for Anchoring Vignettes Data.”

When respondents use the ordinal response categories of standard survey questions in different ways, analyses based on the resulting data can be biased. Anchoring vignettes is a survey design technique intended to correct for some of these problems. The anchors package in R includes methods for evaluating and choosing anchoring vignettes, and for analyzing the resulting data.
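
A common nonparametric way to use anchoring vignettes recodes each respondent's self-assessment relative to that same respondent's ratings of the vignettes, so that differences in how people use the response scale cancel out. The sketch assumes the respondent rates the J vignettes in their intended order with no ties; the function name is hypothetical and this is not the anchors package's interface.

```python
def vignette_relative_rank(self_response, vignette_responses):
    """Recode a self-assessment relative to the respondent's vignette ratings.

    With J vignettes rated z1 < z2 < ... < zJ, returns C in 1..2J+1:
    1 if y < z1, 2 if y == z1, 3 if z1 < y < z2, ..., 2J+1 if y > zJ.
    """
    z = list(vignette_responses)
    if any(a >= b for a, b in zip(z, z[1:])):
        # Ties or order violations would require an interval-valued C.
        raise ValueError("vignette ratings must be strictly increasing")
    y = self_response
    c = 1
    for zj in z:
        if y < zj:
            return c
        if y == zj:
            return c + 1
        c += 2  # y is above this vignette; move past its two categories
    return c


# Hypothetical respondent who rates two vignettes 2 and 4 on a 1-5 scale.
print([vignette_relative_rank(y, [2, 4]) for y in range(1, 6)])
```

Because C depends only on where the self-assessment falls among the respondent's own vignette ratings, two respondents who use the scale differently but occupy the same relative position receive the same value.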

Kosuke Imai, Gary King, and Olivia Lau. 2006. “Zelig: Everyone's Statistical Software.”

Heather Stoll, Gary King, and Langche Zeng. 2005. “WhatIf: Software for Evaluating Counterfactuals.”

This article describes WhatIf: Software for Evaluating Counterfactuals, an R package that implements the methods for evaluating counterfactuals introduced in King and Zeng (2006a) and King and Zeng (2006b). It offers easy-to-use techniques for assessing a counterfactual’s model dependence without having to conduct sensitivity testing over specified classes of models. These same methods can be used to approximate the common support of the treatment and control groups in causal inference.
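
One ingredient in judging whether a counterfactual requires extrapolation is how far it lies from the observed data. The sketch below computes the Gower distance (the average over variables of the absolute difference scaled by each variable's observed range) from a counterfactual point to every observation, along with the share of observations within a chosen distance. It treats all variables as numeric; the function names and the 0.1 threshold are hypothetical choices, and this is not the WhatIf package's interface.

```python
def gower_distances(data, counterfactual):
    """Gower distance from a counterfactual point to each observed row.

    Every variable is treated as numeric: distance = mean over variables of
    |x_k - c_k| / range_k, with range_k taken from the observed data.
    """
    k = len(counterfactual)
    ranges = []
    for j in range(k):
        col = [row[j] for row in data]
        ranges.append((max(col) - min(col)) or 1.0)  # guard constant columns
    return [sum(abs(row[j] - counterfactual[j]) / ranges[j] for j in range(k)) / k
            for row in data]


def share_nearby(data, counterfactual, threshold=0.1):
    """Fraction of observed rows within `threshold` Gower distance."""
    d = gower_distances(data, counterfactual)
    return sum(1 for v in d if v <= threshold) / len(d)


# Hypothetical observed data on two covariates.
observed = [[0, 0], [10, 10], [5, 5]]
print(gower_distances(observed, [5, 5]), share_nearby(observed, [5, 5]))
```

A counterfactual with few or no observations nearby depends heavily on the model's functional form, which is the model dependence the package is designed to flag without running sensitivity tests over classes of models.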

Jeff Gill and Gary King. 2004. “Schnabel/Eskow Cholesky Factorization.”

Jeff Gill and Gary King. 2004. “Gill/Murray Cholesky Factorization.”

Federico Girosi and Gary King. 2004. “YourCast.”

YourCast is (open source and free) software that makes forecasts by running sets of linear regressions together in a variety of sophisticated ways. YourCast avoids the bias that results when stacking datasets from separate cross-sections and assuming constant parameters, and the inefficiency that results from running independent regressions in each cross-section.

The models enable you to have different covariates, or the same covariates with different meanings, in each cross-section. You may choose from a wide variety of smoothing techniques, such as assuming that the separate time series regressions in neighboring (or “similar”) countries are alike, based on similarities in the coefficients (as in other approaches) or in the values or trends in the expected value of the dependent variable. The model works with time-series-cross-sectional data but also data for which the time series varies over more than one cross-section such as log-mortality over time by age, country, sex, and cause. YourCast implements the methods introduced in Federico Girosi and Gary King’s book manuscript on Demographic Forecasting, Princeton University Press.

Gary King and Kenneth Benoit. 2003. “EzI: A(n Easy) Program for Ecological Inference.”

This software is no longer being actively updated. Previous versions and information about the software are archived here.

Michael Tomz, Gary King, and Langche Zeng. 2003. “ReLogit: Rare Events Logistic Regression.”

Gary King. 2002. “COUNT: A Program for Estimating Event Count and Duration Regressions.”

This software is no longer being actively updated. Previous versions and information about the software are archived here.

A stand-alone, easy-to-use program for running event count and duration regression models, developed by and/or discussed in a series of journal articles by Gary King. (Event count models have a dependent variable measured as the number of times something happens, such as the number of uncontested seats per state or the number of wars per year. Duration models explain dependent variables measured as the time until some event, such as the number of months a parliamentary cabinet endures.) Winner of the APSA Research Software Award.
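
As an illustration of the event count models described above, the sketch below fits the simplest one, a Poisson regression with a single covariate, by maximum likelihood using Newton-Raphson iterations. The function name and the fixed iteration count are assumptions, it requires at least one nonzero count, and it does not reproduce COUNT's own estimators or options.

```python
import math


def poisson_regression(x, y, iterations=25):
    """Fit log E[y] = b0 + b1*x by maximum likelihood (Newton-Raphson)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian): sum of mu * [1, x]^T [1, x].
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: b += H^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical counts (e.g., events per year) in two groups coded x = 0 and x = 1.
print(poisson_regression([0, 0, 1, 1], [1, 3, 4, 8]))
```

With a binary covariate the fitted rates equal the group means (2 and 6 here), so exp(b1) = 3 is the ratio of expected counts between the groups.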

Micah Altman, Leonid Andreev, Mark Diggory, Gary King, Daniel L. Kiskis, Elizabeth Kolster, Michael Krot, and Sidney Verba. 2001. “Virtual Data Center.”

This software has been superseded by Dataverse. In this paper, we present an overview of the Virtual Data Center (VDC) software, an open-source digital library system for the management and dissemination of distributed collections of quantitative data. The VDC functionality provides everything necessary to maintain and disseminate an individual collection of research studies, including facilities for the storage, archiving, cataloging, translation, and online analysis of a particular collection. Moreover, the system provides extensive support for distributed and federated collections, including location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed “virtual” collections of remote objects.

James Honaker, Anne Joseph, Gary King, Kenneth Scheve, and Naunihal Singh. 1998. “AMELIA: A Program for Missing Data.”

Gary King. 1998. “MAXLIK.”

This software is no longer being actively updated. Previous versions and information about the software are archived here.

A set of Gauss programs and datasets (annotated for pedagogical purposes) to implement many of the maximum likelihood-based statistical models discussed in Gary King’s book Unifying Political Methodology: The Likelihood Theory of Statistical Inference (University of Michigan Press). All datasets are real, not simulated.

Andrew Gelman and Gary King. 1992. “JudgeIt I: A Program for Evaluating Electoral Systems and Redistricting Plans.”

A program for analyzing almost any feature of district-level legislative elections data, including prediction, evaluating redistricting plans, estimating counterfactual hypotheses (such as what would happen if a term-limitation amendment were imposed), and others. This implements statistical procedures described in a series of journal articles and has been used during redistricting in many states by judges, partisans, governments, private citizens, and many others. Winner of the APSA Research Software Award.

Gary King. 1982. “PC$: Checkbook Manager.”

Checkbook-management software written in BASIC. No longer available.