Publications by Year: 2019

2019
Cluster Analysis of Participant Responses for Test Generation or Teaching
Gary King, Brian Lukoff, and Eric Mazur. 8/20/2019. “Cluster Analysis of Participant Responses for Test Generation or Teaching.” United States of America US 10,388,177 B2 (U.S. Patent and Trademark Office).
Textual responses to open-ended (i.e., free-response) items provided by participants (e.g., by means of mobile wireless devices) are automatically classified, enabling an instructor to assess the responses in a convenient, organized fashion and adjust instruction accordingly.
Patent
Systems and Methods for Keyword Determination and Document Classification from Unstructured Text
Gary King, Margaret Roberts, and Patrick Lam. 4/30/2019. “Systems and Methods for Keyword Determination and Document Classification from Unstructured Text.” United States of America US 10,275,516 B2 (U.S. Patent and Trademark Office).
In various embodiments, documents are searched and retrieved via receipt of a search query, electronically identifying a reference set of relevant documents, providing a search set of documents, creating a database comprising at least some of the documents of the search set and the reference set, computationally classifying the documents in the database, extracting keywords from the search set and one or more classified sets, optionally filtering the extracted keywords, and electronically identifying at least some of the documents from the database that contain one or more of the extracted keywords.
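The sequence of steps in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the word-overlap rule standing in for the classifier, and the scoring rule are hypothetical and are not taken from the patent.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercased set of alphabetic words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def extract_keywords(reference_set, search_set, min_overlap=2, top_k=1):
    """Classify search-set documents, then rank words that mark the target class."""
    reference_vocab = set().union(*(tokens(d) for d in reference_set))
    # Stand-in classifier (assumption): a document is "target" when it shares
    # at least min_overlap words with the reference set.
    target, other = [], []
    for doc in search_set:
        is_target = len(tokens(doc) & reference_vocab) >= min_overlap
        (target if is_target else other).append(doc)
    target_df = Counter(w for d in target for w in tokens(d))
    other_df = Counter(w for d in other for w in tokens(d))
    # Score = share of target documents containing the word minus the share of
    # other documents containing it. The filter drops words that already
    # appear in the reference set.
    scores = {
        w: target_df[w] / max(len(target), 1) - other_df[w] / max(len(other), 1)
        for w in target_df
        if w not in reference_vocab
    }
    ranked = sorted((w for w in scores if scores[w] > 0),
                    key=lambda w: (-scores[w], w))
    return ranked[:top_k]

def retrieve(database, keywords):
    """Return the documents that contain at least one extracted keyword."""
    wanted = set(keywords)
    return [d for d in database if tokens(d) & wanted]

reference = ["protest march downtown", "police protest arrests"]
search = ["march downtown rally crowd", "police arrests rally",
          "tickets concert stadium", "concert downtown tonight"]
keywords = extract_keywords(reference, search)
print(keywords, retrieve(reference + search, keywords))
```

In this toy data the word shared by both target-class documents but absent from the reference set is the one surfaced as a new keyword, which is then used to pull matching documents from the combined database.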
Patent
Participant Grouping for Enhanced Interactive Experience (3rd)
Gary King, Eric Mazur, and Brian Lukoff. 2/26/2019. “Participant Grouping for Enhanced Interactive Experience (3rd).” United States of America US 10,216,827 B2 (U.S. Patent and Trademark Office).
Representative embodiments of a method for grouping participants in an activity include the steps of: (i) defining a grouping policy; (ii) storing, in a database, participant records that include a participant identifier, a characteristic associated with the participant, and/or an identifier for a participant's handheld device; (iii) defining groupings based on the policy and characteristics of the participants relating to the policy and to the activity; and (iv) communicating the groupings to the handheld devices to establish the groups.
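Steps (i) through (iv) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Participant` record, the round-robin "mixing" policy, and the device-to-group mapping are hypothetical choices, not the method claimed in the patent.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Participant:
    participant_id: str
    characteristic: str  # e.g., the answer a participant gave to a question
    device_id: str       # identifier of the participant's handheld device

def mixed_groups(participants, n_groups):
    """Hypothetical grouping policy: spread each characteristic across groups.

    Participants are sorted by characteristic and dealt out round-robin, so
    people who share a characteristic tend to land in different groups.
    """
    ordered = sorted(participants,
                     key=lambda p: (p.characteristic, p.participant_id))
    groups = defaultdict(list)
    for i, person in enumerate(ordered):
        groups[i % n_groups].append(person)
    return dict(groups)

def device_assignments(groups):
    """Map each device identifier to the group number it should display."""
    return {p.device_id: g for g, members in groups.items() for p in members}

roster = [
    Participant("a", "A", "dev-a"),
    Participant("b", "A", "dev-b"),
    Participant("c", "B", "dev-c"),
    Participant("d", "B", "dev-d"),
]
print(device_assignments(mixed_groups(roster, n_groups=2)))
```

A different policy (for example, grouping like with like) would replace only `mixed_groups`; the records and the device mapping stay the same.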
Patent
Stimulating Online Discussion in Interactive Learning Environments
Gary King, Eric Mazur, Kelly Miller, and Brian Lukoff. 1/29/2019. “Stimulating Online Discussion in Interactive Learning Environments.” United States of America US 10,192,456 B2 (U.S. Patent and Trademark Office).
In various embodiments, online discussions in connection with an educational resource are improved by analyzing annotations made by students assigned to a discussion group to identify high-quality annotations likely to generate responses and stimulate discussion threads and by making the identified annotations visible to students not assigned to the discussion group.
Patent
Ecological Regression with Partial Identification
Wenxin Jiang, Gary King, Allen Schmaltz, and Martin A. Tanner. 2019. “Ecological Regression with Partial Identification.” Political Analysis, Pp. 1-22.

Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We relax assumptions by allowing for “linear contextual effects,” which previous works have regarded as plausible but avoided due to non-identification, a problem we sidestep by deriving bounds instead of point estimates. In this way, we offer a conceptual framework to improve on the Duncan-Davis bound, derived more than sixty-five years ago. To study the effectiveness of our approach, we collect and analyze 8,430 2x2 EI datasets with known ground truth from several sources, thus bringing considerably more data to bear on the problem than the existing dozen or so datasets available in the literature for evaluating EI estimators. For the 88% of real datasets in our collection that fit a proposed rule, our approach reduces the width of the Duncan-Davis bound, on average, by about 44%, while still capturing the true district-level parameter about 99% of the time. The remaining 12% revert to the Duncan-Davis bound.

Easy-to-use software is available that implements all the methods described in the paper. 
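For readers unfamiliar with the Duncan-Davis bound mentioned in this abstract, the following minimal sketch computes the classical deterministic bounds for a single 2x2 table. It does not implement the paper's narrower bounds or its software. The function name and the symbols `x` (share of the district in group A) and `t` (overall share with the outcome, e.g., turnout) are assumptions chosen for illustration.

```python
def duncan_davis_bounds(x, t):
    """Classical bounds on group-specific outcome rates in one 2x2 table.

    x: fraction of the district belonging to group A (0 < x < 1)
    t: fraction of the whole district with the outcome (0 <= t <= 1)

    The accounting identity t = x * beta_a + (1 - x) * beta_b, together with
    0 <= beta_a, beta_b <= 1, restricts each rate to an interval.
    """
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0 <= t <= 1:
        raise ValueError("t must lie between 0 and 1")
    beta_a = (max(0.0, (t - (1 - x)) / x), min(1.0, t / x))
    beta_b = (max(0.0, (t - x) / (1 - x)), min(1.0, t / (1 - x)))
    return beta_a, beta_b

# Half the district is in group A and 75% of the district has the outcome,
# so each group's rate must be at least 50%.
print(duncan_davis_bounds(0.5, 0.75))  # ((0.5, 1.0), (0.5, 1.0))
```

The width of each interval depends only on the aggregate margins, which is why additional structure is needed to tighten it.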

Article Online Supplementary Appendix
A New Model for Industry-Academic Partnerships
Gary King and Nathaniel Persily. 2019. “A New Model for Industry-Academic Partnerships.” PS: Political Science and Politics. Publisher's Version

The mission of the social sciences is to understand and ameliorate society’s greatest challenges. The data held by private companies, collected for different purposes, hold vast potential to further this mission. Yet, because of consumer privacy, trade secrets, proprietary content, and political sensitivities, these datasets are often inaccessible to scholars. We propose a novel organizational model to address these problems. We also report on the first partnership under this model, to study the incendiary issues surrounding the impact of social media on elections and democracy: Facebook provides (privacy-preserving) data access; eight ideologically and substantively diverse charitable foundations provide funding; an organization of academics we created, Social Science One (see SocialScience.One), leads the project; and the Institute for Quantitative Social Science at Harvard and the Social Science Research Council provide logistical help.

Paper
A Theory of Statistical Inference for Matching Methods in Causal Research
Stefano M. Iacus, Gary King, and Giuseppe Porro. 2019. “A Theory of Statistical Inference for Matching Methods in Causal Research.” Political Analysis, 27, 1, Pp. 46-68.

Researchers who generate data often optimize efficiency and robustness by choosing stratified over simple random sampling designs. Yet, all theories of inference proposed to justify matching methods are based on simple random sampling. This is all the more troubling because, although these theories require exact matching, most matching applications resort to some form of ex post stratification (on a propensity score, distance metric, or the covariates) to find approximate matches, thus nullifying the statistical properties these theories are designed to ensure. Fortunately, the type of sampling used in a theory of inference is an axiom, rather than an assumption vulnerable to being proven wrong, and so we can replace simple with stratified sampling, so long as we can show, as we do here, that the implications of the theory are coherent and remain true. Properties of estimators based on this theory are much easier to understand and can be satisfied without the unattractive properties of existing theories, such as assumptions hidden in data analyses rather than stated up front, asymptotics, unfamiliar estimators, and complex variance calculations. Our theory of inference makes it possible for researchers to treat matching as a simple form of preprocessing to reduce model dependence, after which all the familiar inferential techniques and uncertainty calculations can be applied. This theory also allows binary, multicategory, and continuous treatment variables from the outset and straightforward extensions for imperfect treatment assignment and different versions of treatments.
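As a concrete picture of matching as preprocessing by stratification on a covariate, here is a minimal sketch. The function name, the `(unit_id, treated, covariate)` tuple layout, and the fixed `bin_width` are assumptions made for illustration; they are not the paper's notation or software.

```python
from collections import defaultdict

def stratified_match(units, bin_width):
    """Keep only units whose covariate stratum holds both treated and controls.

    units: iterable of (unit_id, treated, covariate) tuples
    bin_width: width of the covariate bins that define the strata
    """
    strata = defaultdict(list)
    for unit_id, treated, x in units:
        # Floor division assigns each unit to a bin of width bin_width.
        strata[int(x // bin_width)].append((unit_id, treated, x))
    matched = []
    for members in strata.values():
        statuses = {treated for _, treated, _ in members}
        if statuses == {True, False}:  # stratum has both groups
            matched.extend(members)
    return matched

data = [("a", True, 1.0), ("b", False, 1.5), ("c", True, 7.0), ("d", False, 12.0)]
print(stratified_match(data, bin_width=5))
```

Units in strata without a counterpart are pruned; the retained data can then be passed to whatever familiar estimator the analyst would have used anyway.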

Paper
Why Propensity Scores Should Not Be Used for Matching
Gary King and Richard Nielsen. 2019. “Why Propensity Scores Should Not Be Used for Matching.” Political Analysis.

We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal, thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching, which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.

Article Supplementary Appendix