What does the requirement for data privacy mean for data scientists, business analysts and IT?


Corporate compliance with an increasing number of industry regulations intended to protect personally identifiable information (PII) has made data privacy a frequent and public discussion. An inherent challenge to data privacy is, as Tamara Dull explained, “data, in and of itself, has no country, respects no law, and travels freely across borders. In the digital age, there are no geographical borders. And yet, most governments have attempted to put restrictions on how their citizens’ data is used. When we hear about foreign issues, we treat them like they’re strange and far away, ignoring the fact that those issues can very quickly come home to roost.” One such example is the pending European General Data Protection Regulation (GDPR).

Data does not naturally obey geographical borders. And with the pervasiveness of cloud computing, social networks, mobile devices and the Internet of Things, everything is connected, in motion and data-infused. All of that increases the risk of unauthorized access to sensitive data. This is why data privacy cannot be an afterthought.

Unfortunately, many organizations lack a clearly defined data privacy policy that details exactly what data is to be collected and how that data should be used. The era of big data also seems to encourage an excessive, almost obsessive, collection of data without consideration of its potential uses and privacy implications.

What do data privacy requirements mean for data scientists, business analysts and IT?

For starters, not everyone within the organization may know what sensitive data the enterprise has and/or where it's located. Part of IT’s responsibility is to make sure all data assets are properly cataloged and then share this information with the entire enterprise. The organization may also lack a well-defined way to identify which data is sensitive and therefore subject to protection. Executive management must take responsibility for providing data governance guidelines for identifying sensitive data. Then IT must implement data management processes to protect and control the access, and level of access, to sensitive data.

Business analysts, such as an analyst at a bank who reviews a customer’s loan application, require authorized access to pertinent data. But the business analyst doesn’t need to see sensitive data values such as social security numbers (or other national or tax identification numbers), bank account numbers, credit card numbers, or possibly even contact information such as phone numbers and email addresses. There really is no reason for these data values to be displayed as plain text on the business analyst’s computer screen.
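To make the idea of not displaying plain-text values concrete, here is a minimal Python sketch of display masking. The function name `mask_value` and the choice to leave the last four characters visible are assumptions made for illustration; they are not a rule taken from any regulation or product.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` letters/digits, keeping separators.

    Hypothetical helper: the "last four visible" convention is an assumption.
    """
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    masked = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            masked.append(mask_char)  # hide this character
            to_mask -= 1
        else:
            masked.append(ch)  # keep separators and the visible tail
    return "".join(masked)


print(mask_value("123-45-6789"))
print(mask_value("4111 1111 1111 1111"))
```

With a helper like this, the analyst's screen would show something like `***-**-6789` instead of the full social security number, which is usually enough to confirm the right record is on screen.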

Data scientists often analyze data that includes sensitive customer information. But they can still gain analytical insights if the data is anonymized before analysis. Natural keys like credit card numbers, social security numbers and account numbers should be replaced with surrogate keys so that individuals can be identified and aggregated properly without exposing sensitive details.
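One possible way to produce such surrogate keys is a keyed hash, sketched below in Python. This is only one option (a lookup table of randomly assigned keys is another), and the names `SECRET_KEY`, `surrogate_key` and the sample transactions are all hypothetical. The same natural key always maps to the same surrogate, so aggregation still works. Strictly speaking this is pseudonymization, since whoever holds the secret could recompute the mapping.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; in practice it would be stored and rotated securely.
SECRET_KEY = b"replace-with-a-managed-secret"


def surrogate_key(natural_key: str, secret: bytes = SECRET_KEY) -> str:
    """Derive a stable surrogate key from a natural key with HMAC-SHA256."""
    digest = hmac.new(secret, natural_key.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Made-up sample data for illustration only.
transactions = [
    {"card_number": "4111111111111111", "amount": 25.0},
    {"card_number": "5500000000000004", "amount": 40.0},
    {"card_number": "4111111111111111", "amount": 10.0},
]

# Replace the natural key before the data reaches the analysis step.
anonymized = [
    {"customer_key": surrogate_key(t["card_number"]), "amount": t["amount"]}
    for t in transactions
]

# Aggregation per individual still works on the surrogate key.
totals = defaultdict(float)
for row in anonymized:
    totals[row["customer_key"]] += row["amount"]
```

Here `totals` has two entries, one per card holder, and no card number appears anywhere in `anonymized`.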

Data privacy requires role-based data masking and encryption techniques to minimize the exposure of sensitive data. This data masking should be done as close to data entry or as early in the data life cycle as possible to minimize any unnecessary exposure of sensitive data. In this way, data can be valuable to data scientists, business analysts, IT and those in other organizational roles while maintaining the level of privacy and protection necessary for both corporate compliance and personal privacy.
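A rough sketch of what role-based masking could look like in Python follows. The role names, field names and the `CLEAR_FIELDS_BY_ROLE` policy are invented for illustration; a real policy would come from the organization's data governance guidelines, and enforcement would typically live in the database or data platform rather than in application code.

```python
from typing import Dict, Set

# Hypothetical policy: the fields each role may see in plain text.
CLEAR_FIELDS_BY_ROLE: Dict[str, Set[str]] = {
    "business_analyst": {"name", "loan_amount"},
    "data_scientist": {"loan_amount"},
    "compliance_officer": {"name", "ssn", "account_number", "loan_amount"},
}


def view_for_role(record: dict, role: str) -> dict:
    """Return a copy of `record` with every non-permitted field masked.

    Unknown roles get no plain-text fields at all (deny by default).
    """
    allowed = CLEAR_FIELDS_BY_ROLE.get(role, set())
    return {
        field: (value if field in allowed else "****")
        for field, value in record.items()
    }


# Made-up record for illustration only.
record = {
    "name": "Pat Example",
    "ssn": "123-45-6789",
    "account_number": "000123456",
    "loan_amount": 25000,
}

analyst_view = view_for_role(record, "business_analyst")
scientist_view = view_for_role(record, "data_scientist")
```

Denying by default means a newly created role sees nothing sensitive until someone makes an explicit decision to grant it access.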

What say you?

Please share your perspective and experience regarding how the requirement for data privacy affects different roles at your organization by posting a comment below.



About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 20 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality. Jim is the host of the popular podcast OCDQ Radio, and is very active on Twitter, where you can follow him @ocdqblog.
