Software
Follow links for the software or where to ask questions
Gary King. 2020. “QuickCode”
Georgina Evans and Gary King. 2020. “PrivacyUnbiased”
Gary King and Salil Vadhan. 2020. “OpenDP: Developing Open Source Tools for Differential Privacy”
Connor T. Jerzak, Gary King, and Anton Strezhnev. 2018. “Readme2: An R Package for Improved Automated Nonparametric Content Analysis for Social Science”
Abstract
An R package for estimating category proportions in an unlabeled set of documents given a labeled set, by implementing the method described in Jerzak, King, and Strezhnev (2023). This method is meant to improve on the ideas in Hopkins and King (2010), which introduced a quantification algorithm to estimate category proportions without directly classifying individual observations. This version of the software refines the original method by implementing a technique for selecting optimal textual features in order to minimize the error of the estimated category proportions. Automatic differentiation, stochastic gradient descent, and batch re-normalization are used to carry out the optimization. Other pre-processing functions are available, as well as an interface to the earlier version of the algorithm for comparison. The package also provides users with the ability to extract the generated features for use in other tasks.
Some scholars build models to classify documents into chosen categories. Others, especially social scientists who tend to focus on population characteristics, instead usually estimate the proportion of documents in each category, using either parametric “classify-and-count” methods or “direct” nonparametric estimation of proportions without individual classification. Unfortunately, classify-and-count methods can be highly model dependent, and they can generate more bias in the estimated proportions even as the percentage of documents correctly classified increases. Direct estimation avoids these problems but can suffer when the meaning of language changes between training and test sets or is too similar across categories. The approach underlying this package addresses these weaknesses by optimizing continuous text features and by adding a form of matching adapted from the causal inference literature.
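The “direct” estimation idea described above rests on an identity from Hopkins and King (2010): the distribution of text-feature profiles in the unlabeled set equals the category-conditional profile distributions weighted by the unknown category proportions, P(S) = P(S|D) P(D). The sketch below solves that identity for P(D) without classifying any individual document. It is a minimal illustration under stated assumptions: the profile frequencies are taken as already tabulated, the function name `estimate_proportions` and the toy numbers are hypothetical, and the solver (least squares with a soft sum-to-one row, then clipping and renormalizing) is a simplification, not the feature-optimization method of readme2 or the original package's estimator.

```python
import numpy as np

def estimate_proportions(p_s_given_d, p_s, sum_weight=100.0):
    """Estimate category proportions P(D) from P(S) = P(S|D) @ P(D).

    p_s_given_d: (n_profiles, n_categories) profile frequencies within each
        category, estimated from the labeled set.
    p_s: (n_profiles,) profile frequencies observed in the unlabeled set.
    sum_weight: weight on an extra row that softly enforces sum(P(D)) == 1
        (a hypothetical tuning choice for this sketch).
    """
    n_categories = p_s_given_d.shape[1]
    # Append a heavily weighted row of ones so the solution sums to about 1.
    a = np.vstack([p_s_given_d, sum_weight * np.ones((1, n_categories))])
    b = np.append(p_s, sum_weight)
    beta, *_ = np.linalg.lstsq(a, b, rcond=None)
    # Proportions cannot be negative: clip, then renormalize to sum to 1.
    beta = np.clip(beta, 0.0, None)
    return beta / beta.sum()

# Hypothetical toy data: 4 word profiles, 2 categories (columns sum to 1).
p_s_given_d = np.array([
    [0.5, 0.1],
    [0.3, 0.1],
    [0.1, 0.3],
    [0.1, 0.5],
])
true_p_d = np.array([0.25, 0.75])
p_s = p_s_given_d @ true_p_d  # profile frequencies the unlabeled set would show
print(estimate_proportions(p_s_given_d, p_s))  # values close to [0.25 0.75]
```

Because only aggregate profile frequencies enter the calculation, misclassifying individual documents never comes into play; the sketch's accuracy depends instead on P(S|D) being the same in the labeled and unlabeled sets, which is exactly the assumption the abstract says can fail when language use shifts.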
Aaron Kaufman, Gary King, and Mayya Komisarchik. 2018. “Compactness: An R Package for Measuring Legislative District Compactness If You Only Know It When You See It”
Gary King. 2015. “Perusall”
James Honaker, Gary King, and Matthew Blackwell. 2009. “AMELIA II: A Program for Missing Data”
Abstract
Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that gives essentially the same answers as various Markov chain Monte Carlo approaches but runs faster, handles larger numbers of variables, and is far easier to use. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby incorporating a great deal of potentially valuable information. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for different cross-sections. A full set of graphical diagnostics is also available. The program is easy to use, and the simplicity of the algorithm makes it far more robust; both a simple command line and an extensive graphical user interface are included.
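The expectation-maximization with bootstrapping idea can be illustrated in a deliberately small setting. The sketch below assumes two jointly normal variables, with `x` fully observed and only `y` containing missing values (marked `np.nan`). For each imputation it resamples the rows with replacement, runs EM on that bootstrap sample to estimate the mean vector and covariance matrix, and then draws each missing `y` from its conditional normal distribution given `x`. The function names are hypothetical, and Amelia II itself handles many variables, arbitrary missingness patterns, priors, and time-series-cross-section structure, none of which appear here.

```python
import numpy as np

def em_bivariate(x, y, n_iter=200):
    """EM estimates of (mean, covariance) for jointly normal (x, y), y partly missing."""
    miss = np.isnan(y)
    mu = np.array([x.mean(), np.nanmean(y)])
    cov = np.cov(x[~miss], y[~miss])  # starting values from complete cases
    for _ in range(n_iter):
        # E-step: expected sufficient statistics, filling missing y from E[y | x].
        beta = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - beta * cov[0, 1]
        ey = np.where(miss, mu[1] + beta * (x - mu[0]), y)
        ey2 = np.where(miss, ey**2 + resid_var, y**2)
        exy = x * ey
        # M-step: maximum likelihood mean and covariance from those statistics.
        mu = np.array([x.mean(), ey.mean()])
        sxx = np.mean(x**2) - mu[0] ** 2
        syy = np.mean(ey2) - mu[1] ** 2
        sxy = np.mean(exy) - mu[0] * mu[1]
        cov = np.array([[sxx, sxy], [sxy, syy]])
    return mu, cov

def emb_impute(x, y, m=5, seed=0):
    """Return m completed copies of y, one per bootstrap-plus-EM run."""
    rng = np.random.default_rng(seed)
    n = len(x)
    miss = np.isnan(y)
    completed = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of rows
        mu, cov = em_bivariate(x[idx], y[idx])
        beta = cov[0, 1] / cov[0, 0]
        resid_sd = np.sqrt(max(cov[1, 1] - beta * cov[0, 1], 0.0))
        y_new = y.copy()
        cond_mean = mu[1] + beta * (x[miss] - mu[0])
        # Draw from the conditional distribution so imputations carry noise.
        y_new[miss] = rng.normal(cond_mean, resid_sd)
        completed.append(y_new)
    return completed

# Hypothetical example data: y depends linearly on x, about 30% of y missing.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)
y[rng.random(200) < 0.3] = np.nan
imputations = emb_impute(x, y, m=5)
```

Each completed dataset would then be analyzed separately and the results combined; the variation across the `m` copies, driven by both the bootstrap and the random draws, is what carries estimation uncertainty into the final answer.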
Gary King. 2009. “OpenScholar”
Stefano Iacus, Gary King, and Giuseppe Porro. 2009. “CEM: Coarsened Exact Matching Software”
Gary King and Ying Lu. 2008. “VA: Verbal Autopsies”
Gary King. 2007. “Dataverse: Open Source Research Data Repository Software”
Gary King, Kosuke Imai, Daniel Ho, and Elizabeth A. Stuart. 2007. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference”
Michael Tomz, Jason Wittenberg, and Gary King. 2003. “CLARIFY: Software for Interpreting and Presenting Statistical Results”
Gary King. 2003. “EI: A Program for Ecological Inference”
Older
Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, and Salil Vadhan. 2018. “PSI (Ψ): A Private Data Sharing Interface”
Michail Schwab, Hendrik Strobelt, James Tompkin, Colin Fredericks, Connor Huff, Dana Higgins, Anton Strezhnev, Mayya Komisarchik, Gary King, and Hanspeter Pfister. 2017. “Booc.Io: Software for an Education System With Hierarchical Concept Maps”
Gary King and Margaret Roberts. 2015. “RobustSE”
Gary King, Christopher Lucas, and Richard Nielsen. 2014. “MatchingFrontier: R Package for Calculating the Balance-Sample Size Frontier”
Abstract
MatchingFrontier is an easy-to-use R Package for making optimal causal inferences from observational data. Despite their popularity, existing matching approaches leave researchers with two fundamental tensions. First, they are designed to maximize one metric (such as propensity score or Mahalanobis distance) but are judged against another for which they were not designed (such as L1 or differences in means). Second, they lack a principled solution to revealing the implicit bias-variance trade-off: matching methods need to optimize with respect to both imbalance (between the treated and control groups) and the number of observations pruned, but existing approaches optimize with respect to only one; users then either ignore the other or tweak it by hand, usually suboptimally.
MatchingFrontier resolves both tensions by consolidating previous techniques into a single, optimal, and flexible approach. It calculates the matching solution with maximum balance for each possible sample size (N, N-1, N-2,…). It thus directly calculates the entire balance-sample size frontier, from which the user can easily choose one, several, or all subsamples from which to conduct their final analysis, given their own choice of imbalance metric and quantity of interest. MatchingFrontier solves the obvious joint optimization problem in one run, automatically, without manual tweaking, and without iteration. Although for each subset size k there are (N choose k) unique subsets, a huge number, MatchingFrontier includes specially designed fast algorithms that give the optimal answer, usually in a few minutes.
MatchingFrontier has officially been “Qualified for Scientific Use” by the U.S. Food and Drug Administration.
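The frontier is easy to define even though it is expensive to compute. For a handful of units, the definition in the abstract can be checked by brute force: at every subset size, enumerate all (N choose k) subsets and keep the most balanced one. The sketch below is exactly that exhaustive search, not MatchingFrontier's fast algorithms. The function name, the single-covariate absolute difference-in-means metric, and the choice to prune only control units are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean

def brute_force_frontier(treated, controls):
    """Exact balance-sample size frontier by enumerating every control subset.

    Imbalance is the absolute difference in covariate means between the
    treated group and the retained controls (an illustrative metric).
    Only control units are pruned.
    Returns {number of controls kept: (imbalance, kept controls)}.
    """
    target = mean(treated)
    frontier = {}
    for k in range(len(controls), 0, -1):
        best = None
        # (N choose k) subsets: fine for toy data, infeasible for real N.
        for subset in combinations(controls, k):
            imbalance = abs(mean(subset) - target)
            if best is None or imbalance < best[0]:
                best = (imbalance, subset)
        frontier[k] = best
    return frontier

# Hypothetical covariate values.
treated = [1.0, 2.0, 3.0]
controls = [0.0, 2.0, 5.0, 2.5]
for k, (imbalance, kept) in brute_force_frontier(treated, controls).items():
    print(k, round(imbalance, 3), kept)
# 4 0.375 (0.0, 2.0, 5.0, 2.5)
# 3 0.333 (0.0, 2.0, 5.0)
# 2 0.25 (2.0, 2.5)
# 1 0.0 (2.0,)
```

Each printed row is one point on the frontier; a user picks the sample size whose imbalance is acceptable, trading the bias reduction of heavier pruning against the variance cost of a smaller sample.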
Gary King, Matthew Knowles, and Steven Melendez. 2010. “ReadMe: Software for Automated Content Analysis”
Andrew Gelman, Gary King, and Andrew Thomas. 2010. “JudgeIt II: A Program for Evaluating Electoral Systems and Redistricting Plans”
Jonathan Wand, Gary King, and Olivia Lau. 2007. “Anchors: Software for Anchoring Vignettes Data”
Kosuke Imai, Gary King, and Olivia Lau. 2006. “Zelig: Everyone's Statistical Software”
Heather Stoll, Gary King, and Langche Zeng. 2005. “WhatIf: Software for Evaluating Counterfactuals”
Jeff Gill and Gary King. 2004. “Schnabel-Eskow Cholesky Factorization”
Jeff Gill and Gary King. 2004. “Gill-Murray Cholesky Factorization”
Federico Girosi and Gary King. 2004. “YourCast”
Abstract
YourCast is free, open-source software that makes forecasts by running sets of linear regressions together in a variety of sophisticated ways. YourCast avoids the bias that results when stacking datasets from separate cross-sections and assuming constant parameters, and the inefficiency that results from running independent regressions in each cross-section.
The models enable you to have different covariates, or the same covariates with different meanings, in each cross-section. You may choose from a wide variety of smoothing techniques, such as assuming that the separate time series regressions in neighboring (or “similar”) countries are alike, based on similarities in the coefficients (as in other approaches) or in the values or trends in the expected value of the dependent variable. The model works with time-series-cross-sectional data and also with data in which the time series varies over more than one cross-section, such as log-mortality over time by age, country, sex, and cause. YourCast implements the methods introduced in Federico Girosi and Gary King’s book Demographic Forecasting (Princeton University Press).
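The tension described above, between stacking cross-sections under one set of coefficients (biased if the coefficients really differ) and fitting each cross-section separately (inefficient), can be seen in a simple penalized least-squares sketch. It minimizes the sum of squared residuals across cross-sections plus λ times the squared distance between the coefficients of each pair of hypothetical “neighbor” cross-sections: λ = 0 reproduces independent regressions, and a very large λ approaches the stacked fit. This smooths the coefficients, which is the approach the abstract attributes to other methods; it is not YourCast's smoothing of the expected value of the dependent variable, and the function name, neighbor list, and data are illustrative assumptions.

```python
import numpy as np

def smoothed_regressions(xs, ys, neighbors, lam):
    """Jointly fit one linear regression per cross-section.

    Minimizes sum_i ||y_i - X_i b_i||^2 + lam * sum_{(i, j)} ||b_i - b_j||^2,
    where `neighbors` lists pairs (i, j) of cross-sections assumed similar.
    Returns an array of shape (n_sections, n_coefs).
    """
    n, p = len(xs), xs[0].shape[1]
    a = np.zeros((n * p, n * p))
    rhs = np.zeros(n * p)
    # Least-squares part of the normal equations: block-diagonal X_i'X_i.
    for i, (x, y) in enumerate(zip(xs, ys)):
        block = slice(i * p, (i + 1) * p)
        a[block, block] += x.T @ x
        rhs[block] += x.T @ y
    # Penalty part: each neighbor pair couples its two coefficient blocks.
    eye = np.eye(p)
    for i, j in neighbors:
        bi, bj = slice(i * p, (i + 1) * p), slice(j * p, (j + 1) * p)
        a[bi, bi] += lam * eye
        a[bj, bj] += lam * eye
        a[bi, bj] -= lam * eye
        a[bj, bi] -= lam * eye
    return np.linalg.solve(a, rhs).reshape(n, p)

# Hypothetical data: three countries, 20 years, slightly different trends.
rng = np.random.default_rng(0)
t = np.arange(20.0)
xs = [np.column_stack([np.ones_like(t), t]) for _ in range(3)]
true_slopes = [0.50, 0.55, 0.60]
ys = [1.0 + s * t + rng.normal(scale=0.3, size=t.size) for s in true_slopes]
neighbors = [(0, 1), (1, 2)]  # hypothetical: country 1 borders 0 and 2
for lam in (0.0, 10.0, 1e6):
    # Estimated slope for each country at this amount of smoothing.
    print(lam, smoothed_regressions(xs, ys, neighbors, lam)[:, 1].round(3))
```

Intermediate values of λ borrow strength across neighbors without forcing their coefficients to be identical, which is the middle ground between the two failure modes the abstract names.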
Gary King and Kenneth Benoit. 2003. “EzI: A(n Easy) Program for Ecological Inference”
Michael Tomz, Gary King, and Langche Zeng. 2003. “ReLogit: Rare Events Logistic Regression”
Gary King. 2002. “COUNT: A Program for Estimating Event Count and Duration Regressions”
Abstract
This software is no longer being actively updated. Previous versions and information about the software are archived here.
A stand-alone, easy-to-use program for running event count and duration regression models, developed by and/or discussed in a series of journal articles by Gary King. (Event count models have a dependent variable measured as the number of times something happens, such as the number of uncontested seats per state or the number of wars per year. Duration models explain dependent variables measured as the time until some event, such as the number of months a parliamentary cabinet endures.) Winner of the APSA Research Software Award.
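For the event count case, the standard starting point is Poisson regression, in which the expected count is exp(Xβ). The sketch below fits it by maximum likelihood with Newton-Raphson and reports standard errors from the inverse information matrix. It is a generic textbook implementation under stated assumptions (a design matrix that includes an intercept column, a hypothetical function name `poisson_mle`, simulated data), not the COUNT program's code, and it omits the overdispersed count and duration models the program also covers.

```python
import numpy as np

def poisson_mle(x, y, n_iter=50, tol=1e-10):
    """Maximum likelihood Poisson regression by Newton-Raphson.

    Log-likelihood, dropping the constant -log(y!):
        sum(y * (X @ beta) - exp(X @ beta))
    """
    # Start from a least-squares fit to log(y + 0.5) to avoid wild first steps.
    beta = np.linalg.lstsq(x, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        lam = np.exp(x @ beta)            # expected counts
        score = x.T @ (y - lam)           # gradient of the log-likelihood
        info = x.T @ (x * lam[:, None])   # negative Hessian (information)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the estimate.
    lam = np.exp(x @ beta)
    se = np.sqrt(np.diag(np.linalg.inv(x.T @ (x * lam[:, None]))))
    return beta, se

# Hypothetical data, e.g. counts of events per unit with one covariate z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = np.column_stack([np.ones(500), z])
y = rng.poisson(np.exp(0.5 + 0.3 * z))
beta, se = poisson_mle(x, y)
print(beta.round(3), se.round(3))
```

With only an intercept, the estimate reduces to the log of the sample mean count, which is a quick sanity check on any implementation.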
Micah Altman, Leonid Andreev, Mark Diggory, Gary King, Daniel L. Kiskis, Elizabeth Kolster, Michael Krot, and Sidney Verba. 2001. “Virtual Data Center”
James Honaker, Anne Joseph, Gary King, Kenneth Scheve, and Naunihal Singh. 1998. “AMELIA: A Program for Missing Data”
Gary King. 1998. “MAXLIK”
Abstract
This software is no longer being actively updated. Previous versions and information about the software are archived here.
A set of GAUSS programs and datasets (annotated for pedagogical purposes) to implement many of the maximum likelihood-based statistical models discussed in Gary King’s book Unifying Political Methodology: The Likelihood Theory of Statistical Inference (University of Michigan Press). All datasets are real, not simulated.
Andrew Gelman and Gary King. 1992. “JudgeIt I: A Program for Evaluating Electoral Systems and Redistricting Plans”
Gary King. 1982. “PC$: Checkbook Manager”