Data Privacy and Anonymization Techniques

Simple Techniques to Anonymize Data

A simple way to protect personal data when you use it for predictive modeling or to extract insights is to scrub it. Scrubbing means removing personally identifiable information (PII) such as names, addresses, and dates of birth. However, an attacker can cross-reference the scrubbed dataset with public data, or with other databases they can access, to fill in the “missing gaps.” The classic example came from Latanya Sweeney, then a graduate student at MIT. She identified an individual by cross-referencing scrubbed health records, which still contained ZIP code, birth date, and sex, with voter-registration records.

Tokenization is another common technique for anonymizing sensitive data. It replaces PII such as a name with a token, for example a numerical stand-in for that name. However, the token still points back to the original data. Anyone with access to the token-to-value mapping can reverse it. If the same value always maps to the same token, records can also still be linked across datasets.

Sophisticated Techniques to Anonymize Data

Two more sophisticated techniques that help guard against de-anonymization are differential privacy and k-anonymity. With k-anonymity, quasi-identifiers such as ZIP code, birth date, and sex are generalized or suppressed until every record shares them with at least k − 1 other records.

Differential privacy uses mathematical mechanisms to add carefully calibrated random noise, either to the data itself or to the results of queries over it. As a result, the output reveals almost nothing about whether any one person's record is included. A query still returns results that are, with high probability, close to what the same query would return on the original dataset. An analogy is disguising a toy panda with a horse head. The disguise is just enough that you no longer recognize the toy as a panda. When queried, the system still returns the count of toys in the group the disguised panda belongs to, without singling out any individual panda toy.

Apple, for example, began using differential privacy in iOS 10 to uncover patterns in user behavior and activity without identifying individual users. This lets Apple learn aggregate trends from usage data, such as typing, browsing, and health-app activity, while maintaining your privacy.
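The following minimal Python sketch makes the linkage risk of scrubbing concrete. All records, names, and helper functions (`scrub`, `link`) are made up for illustration. As in the Sweeney case, the scrub removes names and addresses but leaves ZIP code, birth date, and sex behind. Any scrubbed record whose combination of those fields matches exactly one voter is re-identified.

```python
# Hypothetical, made-up records used only to illustrate a linkage attack.
HEALTH_RECORDS = [
    {"name": "Bob Example", "address": "1 Elm St", "zip": "02138",
     "dob": "1960-04-12", "sex": "M", "diagnosis": "hypertension"},
    {"name": "Alice Example", "address": "9 Oak Ave", "zip": "02139",
     "dob": "1980-01-15", "sex": "F", "diagnosis": "asthma"},
]

# Hypothetical public voter roll: names alongside the same quasi-identifiers.
VOTER_ROLL = [
    {"name": "Alice Example", "zip": "02139", "dob": "1980-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "dob": "1960-04-12", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "dob": "1990-03-02", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def scrub(records, pii_fields):
    """Return copies of the records with the listed direct identifiers removed."""
    return [{k: v for k, v in r.items() if k not in pii_fields} for r in records]


def link(scrubbed, public, keys=QUASI_IDENTIFIERS):
    """Re-identify scrubbed records that match exactly one person in `public`."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for record in scrubbed:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


scrubbed = scrub(HEALTH_RECORDS, {"name", "address"})
reidentified = link(scrubbed, VOTER_ROLL)
print(reidentified)  # [('Bob Example', 'hypertension'), ('Alice Example', 'asthma')]
```

Both diagnoses are tied back to names even though the names were removed. This is why scrubbing alone is rarely enough when quasi-identifiers remain.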
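For the tokenization technique, the sketch below shows why a token remains a reference to the original data. `TokenVault` is a hypothetical in-memory class, not any product's API. It issues random 12-digit numeric tokens using Python's `secrets` module and keeps a two-way mapping. Anyone holding the vault can reverse a token, and because a repeated value reuses its token, tokenized records can still be joined.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault; a real system would secure and persist it."""

    def __init__(self) -> None:
        self._token_for = {}  # original value -> token
        self._value_for = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        """Return the existing token for `value`, or issue a new random one."""
        if value not in self._token_for:
            token = f"{secrets.randbelow(10**12):012d}"
            while token in self._value_for:  # regenerate on the rare collision
                token = f"{secrets.randbelow(10**12):012d}"
            self._token_for[value] = token
            self._value_for[token] = value
        return self._token_for[value]

    def detokenize(self, token: str) -> str:
        """Map a token back to its original value (the re-identification path)."""
        return self._value_for[token]


vault = TokenVault()
token = vault.tokenize("Jane Doe")
print(token)                                # a random 12-digit string
print(vault.tokenize("Jane Doe") == token)  # True: same name, same token
print(vault.detokenize(token))              # Jane Doe
```

Using random tokens rather than, say, a hash of the name means the token reveals nothing by itself. The protection therefore rests entirely on keeping the vault secure.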
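Here is a rough illustration of k-anonymity. A table is k-anonymous when every combination of quasi-identifiers (here ZIP code, birth date, and sex) is shared by at least k records. The toy sketch below uses made-up records and hypothetical helper names. It measures k, then shows how generalizing values (truncating ZIP codes and keeping only the birth year) raises it.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "dob", "sex")

# Hypothetical, made-up records for illustration only.
RECORDS = [
    {"zip": "02138", "dob": "1960-04-12", "sex": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1960-09-03", "sex": "M", "diagnosis": "flu"},
    {"zip": "02141", "dob": "1980-01-15", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02142", "dob": "1980-06-30", "sex": "F", "diagnosis": "diabetes"},
]


def k_anonymity(records, quasi_ids=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())


def generalize(record):
    """Coarsen quasi-identifiers: keep a 3-digit ZIP prefix and the birth year only."""
    return {**record, "zip": record["zip"][:3] + "**", "dob": record["dob"][:4]}


print(k_anonymity(RECORDS))                           # 1: every record is unique
print(k_anonymity([generalize(r) for r in RECORDS]))  # 2
```

Real anonymization tools search for generalizations that balance a target k against how useful the data remains. This sketch only measures the result of one fixed choice.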
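One standard way to add the kind of noise differential privacy relies on is the Laplace mechanism. Here it is applied to the toy-panda counting analogy. This is a minimal sketch, not Apple's implementation. It answers a count query by adding noise drawn from a Laplace distribution with scale sensitivity/ε. A count has sensitivity 1 because adding or removing one toy changes it by at most 1. The function names, the toy data, and the choice ε = 0.5 are illustrative assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(items, predicate, epsilon: float, rng: random.Random) -> float:
    """Return an epsilon-differentially private count of items matching `predicate`."""
    true_count = sum(1 for item in items if predicate(item))
    sensitivity = 1.0  # one item more or less changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def is_panda(toy: str) -> bool:
    return toy == "panda"


# Toy data: 40 pandas and 60 horses (made up for illustration).
toys = ["panda"] * 40 + ["horse"] * 60
rng = random.Random(42)

print(private_count(toys, is_panda, epsilon=0.5, rng=rng))  # a noisy value near 40

# Averaged over many queries, the noise cancels out toward the true count.
estimates = [private_count(toys, is_panda, epsilon=0.5, rng=rng) for _ in range(2000)]
print(sum(estimates) / len(estimates))
```

A smaller ε means a larger noise scale and stronger privacy. Every repeated query also spends more of the privacy budget. For that reason, the averaging step above is only a sanity check on the mechanism, not something a real system would allow without limits.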