GDPR Implications for Data Science

Michael Brooks, Feb 27

The General Data Protection Regulation (“GDPR”), which came into effect on May 25, 2018, sets a new global standard for individual privacy and data protection.

California enacted similar legislation with the California Consumer Privacy Act of 2018, which specifies the data rights of California residents and goes into effect on January 1, 2020.

Colorado also enacted HB 18-1128, effective September 1, 2018, which provides specific definitions of personally identifiable information (PII) for Colorado residents and includes new requirements for breach notification, retention, and security to protect the privacy of Colorado residents.

Collectively, these regulations represent the most important developments in data privacy in decades and are a harbinger of increased emphasis on individual data rights.

According to the PricewaterhouseCoopers GDPR Preparedness Pulse Survey, 68 percent of U.S.-based companies expect to spend $1M to $10M to meet GDPR requirements.

Another 9 percent expect to spend more than $10M.

These far-reaching regulations apply to any company, anywhere in the world, that provides products and services to covered individuals (e.g., data subjects) residing in a particular jurisdiction.

In an increasingly connected society, many organizations must comply with a myriad of independently developed state, federal and international regulations.

Failure to do so may result in substantial penalties that damage the organization’s brand and finances.

As a result, chief information officers, data science leaders, data scientists and other executives must be prepared to deal with new challenges related to regulatory compliance, transparency, and accountability.

Regulatory Compliance

Under GDPR, processing personal data related to individuals residing in European Union countries is generally prohibited, unless it is expressly allowed by law, or the individual has consented to the processing.

The individual must at least be notified about who is processing their data, what kind of data will be processed, where processing is occurring, and how it will be used.

The consent request must be in clear language and in an easily accessible form that explains the purpose for data processing in an unambiguous manner that is distinguishable from other topics.

Consent must be as easy for individuals to withdraw as it is for them to grant.

Finally, GDPR’s “data erasure” provisions entitle individuals to have the organization erase their personal data, cease further dissemination of the data, and potentially have third parties halt processing of the data.

Increased Transparency

The organization must be prepared to inform the individual about the use of the data for automated decision-making and about the possible risks of data transfers made in the absence of an adequacy decision or other appropriate safeguards.

These regulations establish limits on how and where data can be transferred, processed, and stored.

Such regulations include increased requirements to implement and maintain effective security measures related to the creation, retention, and disposal of paper and electronic documents containing personally identifiable information.

Finally, notifications are now mandatory where a data breach is likely to “result in a risk for the rights and freedoms of individuals”.

Under GDPR, this must be done within 72 hours of first having become aware of the breach.
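The 72-hour window is simple arithmetic, but it is easy to get wrong when timestamps cross time zones or month boundaries. The following is a minimal sketch, not a compliance tool: the function and constant names are hypothetical, and it assumes the organization records the moment it became aware of the breach as a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone

# The 72-hour window described above; the constant name is our own.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the latest notification time for a breach discovered at aware_at."""
    if aware_at.tzinfo is None:
        # Naive timestamps are ambiguous across offices and time zones.
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now) / timedelta(hours=1)


# Hypothetical example: the team becomes aware of a breach on Feb 27, 2019.
aware = datetime(2019, 2, 27, 9, 30, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())  # 2019-03-02T09:30:00+00:00
```

Rejecting naive timestamps is a deliberate choice in this sketch: the clock starts when the organization becomes aware of the breach, so the recorded time should be unambiguous.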

Increased Accountability

Data science efforts can be significantly affected by reduced data sets and by other limitations, such as which external sources teams can use given the individual’s location, whom they can share data with, and the need for more effective data management practices.

When high-priority data pertaining to individuals is involved, data scientists will need to document what data is used, their algorithms and assumptions, how the algorithm may impact individuals, and how they managed bias.

Such practices will be complicated by the need to exclude specific individuals from data science projects where the individual has not provided express consent or where no legal basis for using the data exists.
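One way to picture this exclusion step is as a filter applied before any modeling begins. The sketch below is illustrative only: `SubjectRecord`, its fields, and the helper functions are hypothetical names, and deciding whether a legal basis actually exists is a legal question that the code merely records.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubjectRecord:
    subject_id: str
    consented: bool             # express consent is on file
    legal_basis: Optional[str]  # e.g. "contract"; None if no other basis is documented


def usable_for_analysis(record: SubjectRecord) -> bool:
    """A record is usable if consent or another documented legal basis exists."""
    return record.consented or record.legal_basis is not None


def split_by_eligibility(
    records: List[SubjectRecord],
) -> Tuple[List[SubjectRecord], List[SubjectRecord]]:
    """Separate records into (included, excluded), keeping the excluded list for audit."""
    included = [r for r in records if usable_for_analysis(r)]
    excluded = [r for r in records if not usable_for_analysis(r)]
    return included, excluded


records = [
    SubjectRecord("a1", consented=True, legal_basis=None),
    SubjectRecord("b2", consented=False, legal_basis="contract"),
    SubjectRecord("c3", consented=False, legal_basis=None),
]
included, excluded = split_by_eligibility(records)
print([r.subject_id for r in included])  # ['a1', 'b2']
print([r.subject_id for r in excluded])  # ['c3']
```

Returning the excluded records alongside the included ones makes it easier to document which individuals were left out of a project and why.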

Data Science Implications

Three things are clear.

First, these laws are complicated and imperfect.

They are unlikely to provide as much protection as individuals hope and will cost far more to implement and maintain than many people expect.

Second, the wave of data protection laws will likely continue, and these laws will need better coordination.

Most affected companies work across state and national borders, so complying with diverse and at times conflicting regulations will require dedicated resources, processes, and new technologies.

Finally, many impacted companies, individuals, and government agencies are not prepared for these laws.

While many organizations have implemented data management programs that address the collection, aggregation, and storage of data, few are prepared to manage compliance preferences at the individual level.

Getting Started

To address these new requirements and prepare for future complexity, proactive organizations should take the following steps.


Establish the Data Protection Officer (DPO) — Many organizations have established the role of Data Governance Officer to oversee the coordination of their data assets.

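GDPR requires certain organizations, including public authorities and those whose core activities involve large-scale monitoring of individuals or large-scale processing of sensitive data, to designate a DPO to oversee compliance.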


Educate leadership — The use of individual data touches many parts of the organization including sales, service delivery, claims processing, human resources, operations, and other areas.

It is important to know what information can be collected, how it can be used, and what steps to take in the event issues arise.


Minimize data collection — The traditional philosophy of capturing, retaining, and often selling as much information as possible could increase the complexity of compliance.

Organizations should re-examine their data collection processes to capture the minimum amount of PII that is required for their data science initiatives.


Assess current business processes — These new requirements can have significant impact on a variety of policies and practices including communications with individuals, opting out, data capture, data retention, data sharing, and data usage.

Usage and management of PII should always be supported by a clear business purpose.


Review 3rd party contracts — Once data is shared with third parties, it is difficult to know and control how that data is used.

Third-party contracts should be reviewed for compliance and security measures to confirm that such data is not being used in a manner that is inconsistent with the individual’s intention.
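The data minimization step above can be sketched in code. This is a rough illustration under stated assumptions: the allow-list, field names, and helper functions are hypothetical, and it uses a keyed hash (HMAC-SHA256 from Python's standard library) to replace the direct identifier. Pseudonymized data of this kind is generally still treated as personal data under GDPR, so this reduces exposure rather than removing obligations.

```python
import hashlib
import hmac

# Hypothetical allow-list: only the fields the analysis actually needs.
ALLOWED_FIELDS = {"age_band", "region", "plan_type"}


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256 hex digest)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, secret_key: bytes) -> dict:
    """Keep only allow-listed fields and swap the customer ID for a pseudonym."""
    reduced = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    reduced["subject_key"] = pseudonymize(record["customer_id"], secret_key)
    return reduced


raw = {
    "customer_id": "C-1001",
    "email": "jane@example.com",
    "age_band": "35-44",
    "region": "CO",
    "plan_type": "premium",
}
# The key below is a placeholder; a real key would be stored and rotated securely.
reduced = minimize(raw, secret_key=b"demo-key-not-for-production")
print(sorted(reduced))  # ['age_band', 'plan_type', 'region', 'subject_key']
```

An allow-list is used instead of a block-list so that newly added source fields are dropped by default until someone documents a business purpose for them.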

It is important that organizations and their data science teams understand the implications of these requirements.

Organizations will need to document and demonstrate the consent of individuals and compliance with applicable regulations.

Data scientists will need to educate themselves on these requirements, partner with business colleagues, and establish more consistent policies, procedures, and practices.

Data scientists must be prepared to deal with limitations on data collection, retention, data profiling, bias, and how that data is being used to inform decisions that may impact the individual.

Together, these constituencies can create a more trusted environment that delivers insight-driven service without compromising customer trust or regulatory compliance.


1. General Data Protection Regulation, Regulation (EU) 2016/679 of the European Parliament and of the Council, April 27, 2016.

2. California Consumer Privacy Act of 2018, AB-375, June 28, 2018.

3. Colorado HB 18-1128, May 29, 2018.

4. GDPR Preparedness Pulse Survey, PricewaterhouseCoopers, December 2016.

Originally published at https://medium.com on February 27, 2019.

