Survey researchers have long sought to protect the privacy of their respondents via de-identification (removing names and other directly identifying information) before sharing data. Although these procedures can help, recent research demonstrates that they fail to protect respondents from intentional re-identification attacks, a problem that threatens to undermine vast survey enterprises in academia, government, and industry. This is especially a problem in political science because political beliefs are not merely the subject of our scholarship; they are key information respondents seek to keep private. We confirm the problem in practice by easily re-identifying a sensitive survey about a recent abortion referendum. We build on the concept of "differential privacy" to offer new survey research data sharing procedures with mathematical guarantees for protecting respondent privacy and statistical validity guarantees for social scientists analyzing differentially private data. The cost of these new procedures is larger standard errors, which can be overcome with somewhat larger sample sizes.
When word processors were first introduced into the workplace, they turned scholars into typists. But they also improved our work: Turnaround time for new drafts dropped from days to seconds. Rewriting became easier and more common, and our papers, educational efforts, and research output improved. I discuss the advantages of and mechanisms for doing the same with do-it-yourself video recordings of research talks and class lectures, so that they may become a fully respected channel for scholarly output and education, alongside books and articles. I consider innovations in video design to optimize education and communication, along with technology to make this change possible.
Purpose: The COVID-19 death-rate in Italy continues to climb, surpassing that in every other country. We implement one of the first nationally representative surveys about this unprecedented public health crisis and use it to evaluate the Italian government’ public health efforts and citizen responses. Findings: (1) Public health messaging is being heard. Except for slightly lower compliance among young adults, all subgroups we studied understand how to keep themselves and others safe from the SARS-Cov-2 virus. Remarkably, even those who do not trust the government, or think the government has been untruthful about the crisis believe the messaging and claim to be acting in accordance. (2) The quarantine is beginning to have serious negative effects on the population’s mental health. Policy Recommendations: Communications focus should move from explaining to citizens that they should stay at home to what they can do there. We need interventions that make staying at home and following public health protocols more desirable. These interventions could include virtual social interactions, such as online social reading activities, classes, exercise routines, etc. — all designed to reduce the boredom of long term social isolation and to increase the attractiveness of following public health recommendations. Interventions like these will grow in importance as the crisis wears on around the world, and staying inside wears on people.
Universities require faculty and students planning research involving human subjects to pass formal certification tests and then submit research plans for prior approval. Those who diligently take the tests may better understand certain important legal requirements but, at the same time, are often misled into thinking they can apply these rules to their own work which, in fact, they are not permitted to do. They will also be missing many other legal requirements not mentioned in their training but which govern their behaviors. Finally, the training leaves them likely to completely misunderstand the essentially political situation they find themselves in. The resulting risks to their universities, collaborators, and careers may be catastrophic, in addition to contributing to the more common ordinary frustrations of researchers with the system. To avoid these problems, faculty and students conducting research about and for the public need to understand that they are public figures, to whom different rules apply, ones that political scientists have long studied. University administrators (and faculty in their part-time roles as administrators) need to reorient their perspectives as well. University research compliance bureaucracies have grown, in well-meaning but sometimes unproductive ways that are not required by federal laws or guidelines. We offer advice to faculty and students for how to deal with the system as it exists now, and suggestions for changes in university research compliance bureaucracies, that should benefit faculty, students, staff, university budgets, and our research subjects.
We provide an overview of PSI ("a Private data Sharing Interface"), a system we are developing to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy.
We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 10 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias -- including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically consistent and approximately unbiased linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.
We have implemented these methods in open source software for R called PrivacyUnbiased. Facebook has ported PrivacyUnbiased to open source Python code called svinfer.
Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of worries about privacy violations. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for individuals who may be represented in the data and statistical validity guarantees for researchers seeking population-level insights from it. We build on the standard of "differential privacy" but, unlike most such approaches, we also correct for the serious statistical biases induced by privacy-preserving procedures, provide a proper accounting for statistical uncertainty, and impose minimal constraints on the choice of data analytic methods and types of quantities estimated. Our algorithm is easy to implement, simple to use, and computationally efficient; we also offer open source software to illustrate all our methods.
Last year was difficult for Google Flu Trends (GFT). In early 2013, Nature reported that GFT was estimating more than double the percentage of doctor visits for influenza like illness than the Centers for Disease Control and Prevention s (CDC) sentinel reports during the 2012 2013 flu season (1). Given that GFT was designed to forecast upcoming CDC reports, this was a problematic finding. In March 2014, our report in Science found that the overestimation problem in GFT was also present in the 2011 2012 flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made by Google s search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models what the article labeled algorithm dynamics and big data hubris respectively. Moreover, the report and the supporting online materials detailed how difficult/impossible it is to replicate the GFT results, undermining independent efforts to explore the source of GFT errors and formulate improvements.
Matching is an increasingly popular method of causal inference in observational data, but following methodological best practices has proven difficult for applied researchers. We address this problem by providing a simple graphical approach for choosing among the numerous possible matching solutions generated by three methods: the venerable ``Mahalanobis Distance Matching'' (MDM), the commonly used ``Propensity Score Matching'' (PSM), and a newer approach called ``Coarsened Exact Matching'' (CEM). In the process of using our approach, we also discover that PSM often approximates random matching, both in many real applications and in data simulated by the processes that fit PSM theory. Moreover, contrary to conventional wisdom, random matching is not benign: it (and thus PSM) can often degrade inferences relative to not matching at all. We find that MDM and CEM do not have this problem, and in practice CEM usually outperforms the other two approaches. However, with our comparative graphical approach and easy-to-follow procedures, focus can be on choosing a matching solution for a particular application, which is what may improve inferences, rather than the particular method used to generate it.
We highlight, and suggest ways to avoid, a large number of common misunderstandings in the literature about best practices in qualitative research. We discuss these issues in four areas: theory and data, qualitative and quantitative strategies, causation and explanation, and selection bias. Some of the misunderstandings involve incendiary debates within our discipline that are readily resolved either directly or with results known in research areas that happen to be unknown to political scientists. Many of these misunderstandings can also be found in quantitative research, often with different names, and some of which can be fixed with reference to ideas better understood in the qualitative methods literature. Our goal is to improve the ability of quantitatively and qualitatively oriented scholars to enjoy the advantages of insights from both areas. Thus, throughout, we attempt to construct specific practical guidelines that can be used to improve actual qualitative research designs, not only the qualitative methods literatures that talk about them.