The European Union’s General Data Protection Regulation (GDPR) taking effect on 25 May 2018 pertains not only to organizations located within the EU; it applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.
If the GDPR acronym does not mean much to you, think of the one that does – HIPAA, FERPA, COPPA, CIPSEA, or any other that is relevant to your jurisdiction – this blog post is equally applicable to all of them.
The GDPR prohibits personal data processing revealing such individual characteristics as race or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, as well as the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. It also has special rules for data relating to criminal convictions or offenses and the processing of children’s personal data.
Whenever SAS users produce reports on demographic data, there is always a risk of inadvertently revealing personal data protected by law, especially when reports are generated automatically or interactively via dynamic data queries. Even for aggregate reports there is a high potential for such exposure.
Suppose you produce an aggregate cross-tabulation report on a small demographic group, representing a count distribution by students’ grade and race. It is highly probable that you can get the count of 1 for some cells in the report, which will unequivocally identify persons and thus disclose their education record (grade) by race. Even if the count is not equal to 1, but is equal to some other small number, there is still a risk of possible deducing or disaggregating of Personally Identifiable Information (PII) from surrounding data (other cells, row and column totals) or related reports on that small demographic group.
The following are the four selected SAS tools that allow you to take care of protecting personal data in SAS reports by suppressing counts in small demographic group reports.
This blog post explains the fundamental concepts of data suppression algorithms. It takes you behind the scenes of the iterative process of complementary data suppression and walks you through SAS code implementing a primary and secondary complementary suppression algorithm. The suppression code uses BASE SAS – DATA STEPs, SAS macros, PROC FORMAT, PROC MEANS, and PROC REPORT.
This SAS Global Forum 2018 paper solidifies and expands on the above blog post. It walks you through the intricate logic of an enhanced complementary suppression process, and demonstrates SAS coding techniques to implement and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization responsible for compliance with privacy protection.
In SAS Visual Analytics you can create derived data items that are aggregated measures. SAS Visual Analytics 8.2 on SAS Viya introduces a new Type for the aggregated measures derived data items called Data Suppression. Here is an excerpt from the documentation on the Data Suppression type:
“Obscures aggregated data if individual values could easily be inferred. Data suppression replaces all values for the measure on which it is based with asterisk characters (*) unless a value represents the aggregation of a specified minimum number of values. You specify the minimum in the Suppress data if count less than parameter. The values are hidden from view, but they are still present in the data query. The calculation of totals and subtotals is not affected.
Some additional values might be suppressed when a single value would be suppressed from a subgroup. In this case, an additional value is suppressed so that the suppressed value cannot be inferred from totals or subtotals.
A common use of suppressed data is to protect the identity of individuals in aggregated data when some crossings are sparse. For example, if your data contains testing scores for a school district by demographics, but one of the demographic categories is represented only by a single student, then data suppression hides the test score for that demographic category.
When you use suppressed data, be sure to follow these best practices:
- Never use the unsuppressed version of the data item in your report, even in filters and ranks. Consider hiding the unsuppressed version in the Data pane.
- Avoid using suppressed data in any object that is the source or target of a filter action. Filter actions can sometimes make it possible to infer the values of suppressed data.
- Avoid assigning hierarchies to objects that contain suppressed data. Expanding or drilling down on a hierarchy can make it possible to infer the values of suppressed data.”
This Data Suppression type functionality is significant as it represents the first such functionality embedded directly into a SAS product.
This blog post provides an example of using the above Data Suppression type aggregated measures derived data items in SAS Visual Analytics.
We need your feedback!
We want to hear from you. Is this blog post useful? How do you comply with GDPR (or other Privacy Law of your jurisdiction) in your organization? What SAS privacy protection features would you like to see in future SAS releases?