A vast literature demonstrates that voters around the world who benefit from their governments' discretionary spending cast more ballots for the incumbent party than those who do not benefit. But contrary to most theories of political accountability, some evidence suggests that voters also reward incumbent parties for implementing "programmatic" spending legislation, over which incumbents have no discretion, and even when passed with support from all major parties. Why voters would attribute responsibility when none exists is unclear, as is why minority party legislators would approve of legislation that will cost them votes. We study the electoral effects of two prominent programmatic policies that fit the ideal type unusually well. For the first, we implement one of the largest randomized social experiments ever, and find that its programmatic policies do not increase voter support for incumbents. For the second, we reanalyze the study cited as claiming the strongest support for the electoral effects of programmatic policies, which is also a very large randomized experiment. We show that its key results vanish after correcting either a simple coding error affecting only two observations or highly unconventional data analysis procedures (or both). Our results may differ from those of prior research because we were able to marshal large scale experiments rather than observational studies or because we analyze relatively pure forms of programmatic policies rather than mixtures of programmatic and clientelistic policies. However, we conjecture that the primary explanation is the differing nature of the politics for which these policies are passed and implemented.
Ecological inference is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for ecological inference and describe how to estimate the district level parameter averaging over many precincts in the presence of the non-identified parameter of the contextual effect. This may be regarded as a first attempt in this venerable literature to limit the scope of the key form of non-identifiability in ecological inference. To study the operating characteristics of our methodology, we have amassed the largest collection of data with known ground truth ever applied to evaluate solutions to the ecological inference problem. We collect and study 459 datasets from a variety of fields including public health, political science and sociology. The datasets contain a total of 2,370,854 geographic units (e.g., precincts), with an average of 5,165 geographic units per dataset. Our replication data are publicly available via the Harvard Dataverse (Jiang et al. 2018) and may serve as a useful resource for future researchers. For all real data sets in our collection that fit our proposed rules, our methodology reduces the width of the Duncan and Davis (1953) deterministic bound, on average, by about 45%, while still capturing the true district level parameter in excess of 97% of the time.
Universities require faculty and students planning research involving human subjects to pass formal certification tests and then submit research plans for prior approval. Those who diligently take the tests may better understand certain important legal requirements but, at the same time, are often misled into thinking they can apply these rules to their own work which, in fact, they are not permitted to do. They will also be missing many other legal requirements not mentioned in their training but which govern their behaviors. Finally, the training leaves them likely to completely misunderstand the essentially political situation they find themselves in. The resulting risks to their universities, collaborators, and careers may be catastrophic, in addition to contributing to the more common ordinary frustrations of researchers with the system. To avoid these problems, faculty and students conducting research about and for the public need to understand that they are public figures, to whom different rules apply, ones that political scientists have long studied. University administrators (and faculty in their part-time roles as administrators) need to reorient their perspectives as well. University research compliance bureaucracies have grown, in well-meaning but sometimes unproductive ways that are not required by federal laws or guidelines. We offer advice to faculty and students for how to deal with the system as it exists now, and suggestions for changes in university research compliance bureaucracies, that should benefit faculty, students, staff, university budgets, and our research subjects.
To prevent gerrymandering, and to impose a specific form of democratic representation, many state constitutions and judicial opinions require US legislative districts to be "compact." Yet, the law offers few precise definitions other than "you know it when you see it," which effectively implies a common understanding of the concept. In contrast, academics have shown that the concept has multiple theoretical dimensions and have generated large numbers of conflicting empirical measures. This has proved extremely challenging for courts tasked with adjudicating compactness. We hypothesize that both are correct --- that compactness is complex and multidimensional, but a common understanding exists across people. We develop a survey design to elicit this understanding, without bias in favor of one's own political views, and with high levels of reliability (in data where the standard paired comparisons approach fails). We then create a statistical model that predicts, with high accuracy and solely from the geometric features of the district, compactness evaluations by 96 judges, justices, and public officials responsible for redistricting (and 102 redistricting consultants, expert witnesses, law professors, law students, graduate students, undergraduates, and Mechanical Turk workers). We also offer data on compactness from our validated measure for 18,215 state legislative and congressional districts, as well as software to compute this measure from any district. We then discuss what may be the wider applicability of our general methodological approach to measuring important concepts that you only know when you see.
Computer scientists and statisticians are often interested in classifying textual documents into chosen categories. Social scientists and others are often less interested in any one document and instead try to estimate the proportion falling in each category. The two existing types of techniques for estimating these category proportions are parametric "classify and count" methods and "direct" nonparametric estimation of category proportions without an individual classification step. Unfortunately, classify and count methods can sometimes be highly model dependent or generate more bias in the proportions even as the percent correctly classified increases. Direct estimation avoids these problems, but can suffer when the meaning and usage of language is too similar across categories or too different between training and test sets. We develop an improved direct estimation approach without these problems by introducing continuously valued text features optimized for this problem, along with a form of matching adapted from the causal inference literature. We evaluate our approach in analyses of a diverse collection of 73 data sets, showing that it substantially improves performance compared to existing approaches. As a companion to this paper, we offer easy-to-use software that implements all ideas discussed herein.
The mission of the academic social sciences is to understand and ameliorate society’s greatest challenges. The data held by private companies holds vast potential to further this mission. Yet, because of its interaction with highly politicized issues, customer privacy, proprietary content, and differing goals of firms and academics, these data are often inaccessible to university researchers. We propose here a new model for industry-academic partnerships that addresses these problems via a novel organizational structure: Respected scholars form a commission which, as a trusted third party, receives access to all relevant firm information and systems, and then recruits independent academics to do research in specific areas following standard peer review protocols organized and funded by nonprofit foundations. We also report on a partnership we helped forge under this model to make data available about the extremely visible and highly politicized issues surrounding the impact of social media on elections and democracy. In our partnership, Facebook will provide privacy-preserving data and access; seven major politically and substantively diverse nonprofit foundations will fund the research; and the Social Science Research Council will oversee the peer review process for funding and data access.
We provide an overview of PSI ("a Private data Sharing Interface"), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.
Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We derive "hacking intervals," which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals.
We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal -- increasing imbalance, inefficiency, model dependence, and bias. PSM supposedly makes it easier to find matches by projecting a large number of covariates to a scalar propensity score and applying a single model to produce an unbiased estimate. However, in observational analysis the data generation process is rarely known and so users typically try many models before choosing one to present. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest that researchers replace PSM with one of the other available methods when performing matching, propensity scores have many other productive uses.
Last year was difficult for Google Flu Trends (GFT). In early 2013, Nature reported that GFT was estimating more than double the percentage of doctor visits for influenza like illness than the Centers for Disease Control and Prevention s (CDC) sentinel reports during the 2012 2013 flu season (1). Given that GFT was designed to forecast upcoming CDC reports, this was a problematic finding. In March 2014, our report in Science found that the overestimation problem in GFT was also present in the 2011 2012 flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made by Google s search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models what the article labeled algorithm dynamics and big data hubris respectively. Moreover, the report and the supporting online materials detailed how difficult/impossible it is to replicate the GFT results, undermining independent efforts to explore the source of GFT errors and formulate improvements.
Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable ``Mahalanobis Distance Matching'' (MDM), the commonly used ``Propensity Score Matching'' (PSM), and a newer approach called ``Coarsened Exact Matching'' (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.
We highlight, and suggest ways to avoid, a large number of common misunderstandings in the literature about best practices in qualitative research. We discuss these issues in four areas: theory and data, qualitative and quantitative strategies, causation and explanation, and selection bias. Some of the misunderstandings involve incendiary debates within our discipline that are readily resolved either directly or with results known in research areas that happen to be unknown to political scientists. Many of these misunderstandings can also be found in quantitative research, often with different names, and some of which can be fixed with reference to ideas better understood in the qualitative methods literature. Our goal is to improve the ability of quantitatively and qualitatively oriented scholars to enjoy the advantages of insights from both areas. Thus, throughout, we attempt to construct specific practical guidelines that can be used to improve actual qualitative research designs, not only the qualitative methods literatures that talk about them.