The mission of the social sciences is to understand and ameliorate society’s greatest challenges. The data held by private companies, collected for different purposes, hold vast potential to further this mission. Yet, because of consumer privacy, trade secrets, proprietary content, and political sensitivities, these datasets are often inaccessible to scholars. We propose a novel organizational model to address these problems. We also report on the first partnership under this model, to study the incendiary issues surrounding the impact of social media on elections and democracy: Facebook provides (privacy-preserving) data access; eight ideologically and substantively diverse charitable foundations provide funding; an organization of academics we created, Social Science One (see SocialScience.One), leads the project; and the Institute for Quantitative Social Science at Harvard and the Social Science Research Council provide logistical help.
Information hierarchies are difficult to express when real-world space or time constraints force traversing the hierarchy in linear presentations, such as in educational books and classroom courses. We present booc.io, which allows linear and non-linear presentation and navigation of educational concepts and material. To support a breadth of material for each concept, booc.io is Web based, which allows adding material such as lecture slides, book chapters, videos, and LTIs. A visual interface assists the creation of the needed hierarchical structures. The goals of our system were formed in expert interviews, and we explain how our design meets these goals. We adapt a real-world course into booc.io, and perform an introductory qualitative evaluation with students.
A vast literature demonstrates that voters around the world who benefit from their governments' discretionary spending cast more ballots for the incumbent party than those who do not benefit. But contrary to most theories of political accountability, some suggest that voters also reward incumbent parties for implementing "programmatic" spending legislation, over which incumbents have no discretion, and even when passed with support from all major parties. Why voters would attribute responsibility when none exists is unclear, as is why minority party legislators would approve of legislation that would cost them votes. We study the electoral effects of two large prominent programmatic policies that fit the ideal type especially well, with unusually large-scale experiments that bring more evidence to bear on this question than has previously been possible. For the first policy, we design and implement ourselves one of the largest randomized social experiments ever. For the second policy, we reanalyze studies that used a large-scale randomized experiment and a natural experiment to study the same question but came to opposite conclusions. Using corrected data and improved statistical methods, we show that the evidence from all analyses of both policies is consistent: programmatic policies have no effect on voter support for incumbents. We conclude by discussing how the many other studies in the literature may be interpreted in light of our results.
The origin, meaning, estimation, and application of the concept of partisan symmetry in legislative redistricting, and the justiciability of partisan gerrymandering. An edited transcript of a talk at the “Redistricting and Representation Forum,” American Academy of Arts & Sciences, Cambridge, MA 11/8/2017.
Here also is a video of the original talk.
To deter gerrymandering, many state constitutions require legislative districts to be "compact." Yet, the law offers few precise definitions other than "you know it when you see it," which effectively implies a common understanding of the concept. In contrast, academics have shown that compactness has multiple dimensions and have generated many conflicting measures. We hypothesize that both are correct -- that compactness is complex and multidimensional, but a common understanding exists across people. We develop a survey to elicit this understanding, with high reliability (in data where the standard paired comparisons approach fails). We create a statistical model that predicts, with high accuracy, solely from the geometric features of the district, compactness evaluations by judges and public officials responsible for redistricting, among others. We also offer compactness data from our validated measure for 20,160 state legislative and congressional districts, as well as software to compute this measure from any district.
Winner of the 2018 Robert H. Durr Award from the MPSA.
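For readers unfamiliar with what a purely geometric compactness score looks like, the following is a minimal sketch of one classic single-dimension measure, the Polsby-Popper ratio (4πA/P², where A is the district's area and P its perimeter). It is not the validated measure described in the abstract above; the function names and the toy polygons are assumptions made only for illustration.

```python
import math


def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def polsby_popper(vertices):
    """Polsby-Popper ratio: 1.0 for a circle, near 0 for elongated shapes."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2


# Hypothetical toy "districts": a unit square and a long thin sliver.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]

if __name__ == "__main__":
    print(polsby_popper(square))
    print(polsby_popper(sliver))
```

The two shapes have the same area, but the sliver scores far lower because its perimeter is much longer. A single ratio like this captures only one dimension of compactness, which is the limitation the abstract addresses.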
Researchers who generate data often optimize efficiency and robustness by choosing stratified over simple random sampling designs. Yet, all theories of inference proposed to justify matching methods are based on simple random sampling. This is all the more troubling because, although these theories require exact matching, most matching applications resort to some form of ex post stratification (on a propensity score, distance metric, or the covariates) to find approximate matches, thus nullifying the statistical properties these theories are designed to ensure. Fortunately, the type of sampling used in a theory of inference is an axiom, rather than an assumption vulnerable to being proven wrong, and so we can replace simple with stratified sampling, so long as we can show, as we do here, that the implications of the theory are coherent and remain true. Properties of estimators based on this theory are much easier to understand and can be satisfied without the unattractive properties of existing theories, such as assumptions hidden in data analyses rather than stated up front, asymptotics, unfamiliar estimators, and complex variance calculations. Our theory of inference makes it possible for researchers to treat matching as a simple form of preprocessing to reduce model dependence, after which all the familiar inferential techniques and uncertainty calculations can be applied. This theory also allows binary, multicategory, and continuous treatment variables from the outset and straightforward extensions for imperfect treatment assignment and different versions of treatments.
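As a rough illustration of matching as preprocessing by stratification, the sketch below groups units into strata on a coarsened covariate, prunes strata that lack either treated or control units, and then averages the within-stratum differences in mean outcomes, weighted by the number of treated units. This is one simple way to do it under stated assumptions, not the paper's procedure; the function name, the data layout, and the decade-of-age strata are hypothetical.

```python
from collections import defaultdict


def stratified_match_estimate(units, stratum_of):
    """Estimate a treatment effect on matched treated units.

    units: iterable of (treated: bool, covariate, outcome) tuples.
    stratum_of: function mapping a covariate value to a stratum label.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for treated, covariate, outcome in units:
        strata[stratum_of(covariate)][treated].append(outcome)

    weighted_sum = 0.0
    total_treated = 0
    for groups in strata.values():
        treated_y, control_y = groups[True], groups[False]
        if not treated_y or not control_y:
            continue  # prune strata without both treated and control units
        diff = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
        weighted_sum += len(treated_y) * diff
        total_treated += len(treated_y)

    if total_treated == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / total_treated


# Hypothetical data: (treated, age, outcome); strata are decades of age.
units = [
    (True, 23, 10.0), (False, 27, 8.0),
    (True, 41, 15.0), (False, 44, 12.0), (False, 48, 14.0),
    (True, 67, 20.0),  # no control in this stratum, so it is pruned
]

if __name__ == "__main__":
    print(stratified_match_estimate(units, lambda age: age // 10))
```

After pruning, any familiar estimator could be applied to the retained units; the weighted difference in means is used here only because it is the simplest choice.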
Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We relax assumptions by allowing for "linear contextual effects," which previous works have regarded as plausible but avoided due to non-identification, a problem we sidestep by deriving bounds instead of point estimates. In this way, we offer a conceptual framework to improve on the Duncan-Davis bound, derived more than sixty-five years ago. To study the effectiveness of our approach, we collect and analyze 8,430 2x2 EI datasets with known ground truth from several sources, thus bringing considerably more data to bear on the problem than the existing dozen or so datasets available in the literature for evaluating EI estimators. For the 88% of real datasets in our collection that fit a proposed rule, our approach reduces the width of the Duncan-Davis bound, on average, by about 44%, while still capturing the true district level parameter about 99% of the time. The remaining 12% revert to the Duncan-Davis bound.
Easy-to-use software is available that implements all the methods described in the paper.
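To make the baseline concrete, here is a minimal sketch of the Duncan-Davis (deterministic) bound for one cell of a 2x2 table. It assumes a precinct where a fraction x of people belong to the group of interest and a fraction t have the outcome, so that t = x·β₁ + (1 − x)·β₂, and it bounds β₁, the share of the group with the outcome. It does not implement the narrower bounds proposed in the paper, and the function name and example numbers are illustrative assumptions.

```python
def duncan_davis_bounds(x, t):
    """Deterministic bounds on beta_1 in t = x*beta_1 + (1 - x)*beta_2.

    x: fraction of the precinct in the group of interest (0..1).
    t: fraction of the precinct with the outcome (0..1).
    Returns (lower, upper) for beta_1, the group's outcome rate.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("x and t must both lie in [0, 1]")
    if x == 0.0:
        return (0.0, 1.0)  # no group members: the data say nothing about beta_1
    lower = max(0.0, (t - (1.0 - x)) / x)  # everyone outside the group has the outcome
    upper = min(1.0, t / x)                # no one outside the group has the outcome
    return (lower, upper)


if __name__ == "__main__":
    # Hypothetical precinct: 80% in the group, 50% with the outcome.
    print(duncan_davis_bounds(0.8, 0.5))
```

The bound is narrow when the group makes up most of the precinct and uninformative when it makes up very little, which is why reducing its width is valuable.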
We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal, thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.
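The contrast between the two experimental ideals named above can be illustrated with a small simulation. The sketch below does not implement PSM or any matching method from the paper. It only compares covariate imbalance (absolute difference in means on one covariate) under complete randomization and under a fully blocked, matched-pair design. All names, the uniform covariate, and the sample sizes are assumptions chosen for illustration.

```python
import random


def mean_diff(x, treated_idx):
    """Absolute difference in covariate means between treated and control."""
    treated = set(treated_idx)
    t = [v for i, v in enumerate(x) if i in treated]
    c = [v for i, v in enumerate(x) if i not in treated]
    return abs(sum(t) / len(t) - sum(c) / len(c))


def complete_randomization(n, rng):
    """Assign half the units to treatment, ignoring the covariate."""
    return rng.sample(range(n), n // 2)


def blocked_randomization(x, rng):
    """Pair units adjacent in covariate order; treat one unit per pair."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    treated = []
    for k in range(0, len(order) - 1, 2):
        treated.append(rng.choice((order[k], order[k + 1])))
    return treated


def average_imbalance(n=100, reps=200, seed=0):
    """Average imbalance over repeated draws for both designs."""
    rng = random.Random(seed)
    complete = blocked = 0.0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        complete += mean_diff(x, complete_randomization(n, rng))
        blocked += mean_diff(x, blocked_randomization(x, rng))
    return complete / reps, blocked / reps


if __name__ == "__main__":
    complete, blocked = average_imbalance()
    print(f"complete: {complete:.4f}  blocked: {blocked:.4f}")
```

Blocking on the covariate removes imbalance that complete randomization leaves to chance, which is the efficiency gap the abstract attributes to methods that approximate full blocking.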
In this paper, we illustrate the successful implementation of pre-class reading assignments through a social learning platform that allows students to discuss the reading online with their classmates. We show how the platform can be used to understand how students are reading before class. We find that, with this platform, students spend an above average amount of time reading (compared to that reported in the literature) and that most students complete their reading assignments before class. We identify specific reading behaviors that are predictive of in-class exam performance. We also demonstrate ways that the platform promotes active reading strategies and produces high-quality learning interactions between students outside class. Finally, we compare the exam performance of two cohorts of students, where the only difference between them is the use of the platform; we show that students do significantly better on exams when using the platform.
Reprinted in Cassidy, R., Charles, E. S., Slotta, J. D., Lasry, N., eds. (2019). Active Learning: Theoretical Perspectives, Empirical Studies and Design Profiles. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-885-1