In a major development in data sharing, data providers are beginning to supplement insecure privacy protection strategies, such as "de-identification," with a formal approach called "differential privacy." One version of differential privacy adds specially calibrated random noise to a dataset, which is then released to researchers. This offers mathematical guarantees for the privacy of research subjects while still making it possible to learn about aggregate patterns of interest. Unfortunately, adding random noise creates measurement error, which induces statistical bias: attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We offer an easy-to-use, computationally efficient approach that corrects for these biases, can be applied as easily as computing descriptive statistics or running a linear regression, and gives statistically consistent and approximately unbiased estimates and standard errors. We use as our running example the Full URLs Dataset recently released by Social Science One and Facebook, containing more than 10 trillion cell values.
These methods are implemented in open source software called PrivacyUnbiased.
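The attenuation bias described above, and a moment-based correction for it, can be illustrated in a few lines. The sketch below is not the PrivacyUnbiased implementation; it assumes a simple Laplace mechanism with a publicly known noise scale `b`, and corrects a naive regression slope by subtracting the known noise variance from the observed variance of the released variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate confidential data: y depends linearly on x with true slope 2.0.
n = 100_000
x = rng.normal(0, 1, n)
y = 2.0 * x + rng.normal(0, 1, n)

# Laplace mechanism: the data provider adds calibrated Laplace noise to x
# before release. The noise scale b is public (an assumption here; under
# differential privacy the noise distribution is typically known).
b = 1.0
x_priv = x + rng.laplace(0, b, n)

# A naive OLS slope computed on the noisy release is attenuated toward zero.
naive = np.cov(x_priv, y)[0, 1] / np.var(x_priv)

# Moment-based correction: a Laplace(0, b) variable has variance 2*b^2,
# so subtract that known noise variance from the observed variance of x.
noise_var = 2 * b**2
corrected = np.cov(x_priv, y)[0, 1] / (np.var(x_priv) - noise_var)

print(f"naive slope: {naive:.2f}, corrected slope: {corrected:.2f}")
```

With the noise variance (2) twice the signal variance (1), the naive slope shrinks to roughly a third of the truth, while the corrected estimate recovers it, at the cost of larger standard errors.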