Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset

Abstract:

We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 10 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset adds specially calibrated random noise, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error, which induces statistical bias, including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically consistent and approximately unbiased regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.
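
The attenuation described above can be illustrated with a small simulation. This is a minimal sketch of the textbook errors-in-variables correction for a single regressor with known noise variance, not the paper's estimator and not the PrivacyUnbiased interface. The function name `slope_estimates`, the simulated data, and all parameter values are assumptions chosen for illustration.

```python
import random


def slope_estimates(x_obs, y, noise_sd):
    """Return (naive, corrected) slopes for a one-regressor linear model.

    x_obs is the regressor with zero-mean noise of known standard deviation
    noise_sd added to it. The corrected slope subtracts the known noise
    variance from the observed variance (classical method-of-moments fix).
    """
    n = len(x_obs)
    mean_x = sum(x_obs) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_obs, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x_obs) / (n - 1)
    naive = cov_xy / var_x                        # biased toward zero
    corrected = cov_xy / (var_x - noise_sd ** 2)  # removes the noise variance
    return naive, corrected


# Hypothetical data: true slope 2, regressor variance 1, privacy noise sd 1.
rng = random.Random(0)
n = 50_000
true_slope = 2.0
noise_sd = 1.0

x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
x_obs = [xi + rng.gauss(0, noise_sd) for xi in x]  # what the analyst sees

naive, corrected = slope_estimates(x_obs, y, noise_sd)
print(f"naive slope:     {naive:.3f}")
print(f"corrected slope: {corrected:.3f}")
```

With these assumed values the noise variance equals the variance of the true regressor, so the naive slope is expected to shrink to roughly half the true value, while the corrected slope should land near the true value with a larger sampling variance. The correction is possible only because the noise distribution is known, which is the case when the noise is added deliberately for privacy.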

These methods are implemented in open source software called PrivacyUnbiased.

Last updated on 04/02/2020