Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset

Georgina Evans; Gary King

Citation:

Georgina Evans and Gary King. 2023. “Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset.” Political Analysis, 31, 1, Pp. 1-21. Copy at https://tinyurl.com/yc5mx3sw

Abstract:

We offer methods to analyze the "differentially private" Facebook URLs Dataset which, at over 40 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset adds specially calibrated random noise, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error, which induces statistical bias, including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of non-confidential data but with appropriately larger standard errors.

We have implemented these methods in open-source software for R called PrivacyUnbiased. Facebook has ported PrivacyUnbiased to open-source Python code called svinfer. We have extended these results in Evans and King (2021).
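
To illustrate the attenuation described in the abstract, the following is a minimal sketch in Python, assuming a single covariate with zero-mean noise of known standard deviation added to it. It is not the PrivacyUnbiased or svinfer API, and it is not the paper's full estimator: the function names, the simulated data, and the simple moment correction (subtracting the known noise variance from the covariate's second moment) are assumptions chosen for illustration only. It also omits standard errors, which the paper treats carefully.

```python
import numpy as np


def naive_ols(X, y):
    """Ordinary least squares with an intercept, ignoring the added noise."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.solve(A.T @ A, A.T @ y)


def noise_corrected_ols(X, y, noise_sd):
    """Moment-corrected least squares (illustrative, hypothetical helper).

    Independent zero-mean noise with known standard deviation inflates the
    diagonal of A'A / n by the noise variance, so we subtract it before
    solving. The intercept column carries no noise, hence the leading 0.
    """
    A = np.column_stack([np.ones(len(y)), X])
    n = len(y)
    noise_var = np.asarray(noise_sd, dtype=float) ** 2
    S = np.diag(np.concatenate([[0.0], noise_var]))
    return np.linalg.solve(A.T @ A / n - S, A.T @ y / n)


# Simulated data (assumed for illustration): true intercept 1, true slope 2.
rng = np.random.default_rng(0)
n = 200_000
x_true = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x_true + rng.normal(0.0, 1.0, n)

# Privacy-style noise with known standard deviation added to the covariate.
noise_sd = 1.0
x_noisy = x_true + rng.normal(0.0, noise_sd, n)

beta_naive = naive_ols(x_noisy, y)
beta_fixed = noise_corrected_ols(x_noisy, y, [noise_sd])

print("naive slope:    ", beta_naive[1])
print("corrected slope:", beta_fixed[1])
```

In this setup the covariate and the noise both have unit variance, so the naive slope is attenuated toward roughly half the true value, while the corrected slope should land close to 2. The correction relies on knowing the noise variance, which is the case when the noise is added deliberately for privacy and its scale is published with the data.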

DOI: 10.1017/pan.2022.1

Last updated on 05/06/2023
