A High-level Introduction to Differential Privacy

As stated in “Differential Privacy: A Survey of Results,” differential privacy can be ensured for any query sequence by running a differentially private algorithm with Laplacian noise on each query (a runnable sketch of this mechanism appears at the end of the next section). The survey also notes that sensitivity and privacy loss, the two parameters that govern how much noise is added, are independent of the database and its size; a larger database therefore yields better accuracy for a differentially private algorithm on common queries. Further discussion of this point, along with details on how to get the most out of DP on insensitive queries and the mathematical proofs and theorems that yield differential privacy, can be found in the survey.

This section can be summarized with this theorem:

Taken from “Differential Privacy: A Survey of Results”

How is this being used?

The U.S. Census Bureau implemented DP in their OnTheMap application in 2008 to ensure privacy for residential population data. You can read more details in the post “Protecting the Confidentiality of America’s Statistics: Adopting Modern Disclosure Avoidance Methods at the Census Bureau,” written by Dr. John M. Abowd.

Apple describes three use cases for DP in the article “Learning with Privacy at Scale” in the Apple Machine Learning Journal: discovering popular emojis, identifying resource-intensive websites accessed through Safari, and discovering the use of new words. A novel aspect of Apple’s implementation is their use of three differentially private algorithms: Private Count Mean Sketch, Private Hadamard Count Mean Sketch, and Private Sequence Fragment Puzzle.

Taken from the article “Learning with Privacy at Scale” in Apple’s Machine Learning Journal

Microsoft applied DP to mask the locations of individuals in their geolocation databases; the approach involves randomly removing, adding, and shuffling individual data points. A novel aspect of their implementation is PrivTree, which, given the original data and a few other parameters (the scale of the Laplacian noise to be used, a threshold used to decide whether a node should be split, etc.), implements a differentially private algorithm and outputs noisy data for almost any kind of location data.

Pictures taken from Microsoft’s blog post “Project PrivTree: Blurring your ‘where’ for location privacy”

Uber uses differential privacy as part of its data analysis pipeline and other development workflows. A novel aspect of their implementation is Elastic Sensitivity, a technique for computing the sensitivity of a query that met Uber’s demanding performance and scalability requirements.

Taken from Uber Security’s Medium article “Uber Releases Open Source Project for Differential Privacy”

Google implemented DP as part of their Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR) technology, which lets data analysts study clients’ data without needing to look at individual data points. A novel aspect of their implementation is that DP guarantees client privacy at two important steps of RAPPOR’s execution: the permanent randomized response ensures privacy through the generated noise, while the instantaneous randomized response protects against an attacker who tries to exploit the permanent randomized response over repeated reports.

Taken from the paper “RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response” by Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova
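All of the systems above ultimately rest on a small number of core mechanisms. To make the first one concrete, here is a minimal Python sketch of the Laplace mechanism described at the top of this post. It is illustrative rather than production code: the function name `laplace_mechanism` and the example counting query are my own, and a real deployment would also need to track the privacy budget spent across queries.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release true_answer plus Laplace noise of scale sensitivity / epsilon.

    Calibrating the noise scale to sensitivity / epsilon is what yields
    epsilon-differential privacy for a single numeric query.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many records satisfy P?") has sensitivity 1:
# adding or removing one person's record changes the count by at most 1.
true_count = 42
print(laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```

Note the trade-off the parameters encode: a smaller epsilon means stronger privacy but a larger noise scale, which is exactly why accuracy improves as the database (and hence the true count) grows relative to the noise.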
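The second core mechanism is the classic randomized-response technique that RAPPOR’s permanent and instantaneous responses build on. The sketch below is not Google’s implementation; it shows only the basic one-bit version, with helper names of my own choosing, but it captures the key idea: each client randomizes its own report, yet the analyst can still recover an accurate aggregate.

```python
import random

def randomized_response(true_bit, p=0.75):
    """Report the true bit with probability p, its flip otherwise.

    The reported bit satisfies ln(p / (1 - p))-differential privacy
    (ln 3 for p = 0.75).
    """
    return true_bit if random.random() < p else 1 - true_bit

def estimate_true_rate(reports, p=0.75):
    """Unbiased estimate of the true fraction of 1s from noisy reports.

    E[report] = (2p - 1) * true_rate + (1 - p), so invert that map.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Simulate 10,000 clients, 30% of whom truly have the sensitive attribute.
truths = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
reports = [randomized_response(bit) for bit in truths]
print(estimate_true_rate(reports))  # typically within ~0.02 of 0.3
```

Every individual report is deniable (a 1 may well be a flipped 0), yet with 10,000 clients the aggregate estimate usually lands within a couple of percentage points of the true rate.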
Additional Resources

This post only scratched the surface of differential privacy. There are plenty of great resources if you’d like to explore the inner mechanics and applications of DP further, some of which include:

- A compendium of publications on DP available at Microsoft Research’s website here.
- Georgian Impact’s Medium article “A Brief Introduction to Differential Privacy,” along with its simulation of the Laplacian Noisy Counting mechanism.
- “A Short Tutorial on Differential Privacy” by Borja Balle.
- Scientific American’s article “Privacy by the Numbers: A New Approach to Safeguarding Data.”
- Frank McSherry’s blog posts on differential privacy (some of which include code for demonstration purposes), such as this post.
- Lecture notes and course materials for Penn State’s course Algorithmic Challenges in Data Privacy.
- Lecture slides for “Differential Privacy in the Wild: A Tutorial on Current Practices and Open Challenges.”

If you’d like to look into how DP can be applied to machine learning, I recommend Nicolas Papernot and Ian Goodfellow’s blog post “Privacy and machine learning: two unexpected allies?” and the paper “Deep Learning with Differential Privacy” by Martín Abadi and colleagues. If you are interested in the intersection of machine learning and privacy more generally, check out the NeurIPS 2018 Privacy Preserving Machine Learning workshop.

Thank you for reading!

Originally published at demystifymachinelearning.wordpress.com on November 20, 2018.
