SAS tools for GDPR privacy compliant reporting

The European Union’s General Data Protection Regulation (GDPR), taking effect on 25 May 2018, pertains not only to organizations located within the EU; it applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.

If the GDPR acronym does not mean much to you, think of the one that does – HIPAA, FERPA, COPPA, CIPSEA, or any other that is relevant to your jurisdiction – this blog post is equally applicable to all of them.

The GDPR prohibits processing personal data that reveals such individual characteristics as racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, as well as the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health, and data concerning a natural person’s sex life or sexual orientation. It also has special rules for data relating to criminal convictions or offenses and for the processing of children’s personal data.

Whenever SAS users produce reports on demographic data, there is always a risk of inadvertently revealing personal data protected by law, especially when reports are generated automatically or interactively via dynamic data queries. Even for aggregate reports there is a high potential for such exposure.

Suppose you produce an aggregate cross-tabulation report on a small demographic group, representing a count distribution by students’ grade and race. It is highly probable that some cells in the report will have a count of 1, which unequivocally identifies a person and thus discloses their education record (grade) by race. Even if a count is not 1 but some other small number, there is still a risk that Personally Identifiable Information (PII) can be deduced or disaggregated from surrounding data (other cells, row and column totals) or from related reports on that small demographic group.
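
As a minimal sketch of the idea (not code from any of the tools listed in this post), the following Base SAS steps count the cells of a cross-tabulation and mask the small ones. The use of sashelp.class as a stand-in for student data, the threshold of 3, and the work data set names are assumptions made for illustration only.

```sas
/* Assumption: sashelp.class stands in for a small student group;   */
/* the threshold of 3 is illustrative, not a legal requirement.     */
%let min_count = 3;

/* Count students in each sex-by-age cell */
proc freq data=sashelp.class noprint;
   tables sex*age / out=work.cell_counts(drop=percent);
run;

/* Primary suppression: mask any cell count below the threshold */
data work.cell_report;
   set work.cell_counts;
   length shown $8;
   if count < &min_count then shown = '*';
   else shown = strip(put(count, 8.));
run;

/* Report only the masked display column, never the raw count */
proc print data=work.cell_report noobs;
   var sex age shown;
run;
```

Masking small cells this way is only the first step. If row or column totals are published alongside the table, a masked cell may still be recoverable by subtraction.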

The following four SAS tools let you protect personal data in SAS reports by suppressing counts for small demographic groups.

1. Automatic data suppression in SAS reports

This blog post explains the fundamental concepts of data suppression algorithms. It takes you behind the scenes of the iterative process of complementary data suppression and walks you through SAS code implementing a primary and secondary complementary suppression algorithm. The suppression code uses Base SAS – DATA steps, SAS macros, PROC FORMAT, PROC MEANS, and PROC REPORT.
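
To see why secondary (complementary) suppression is needed, consider a row whose total is 9 and whose visible cells add up to 8: the single hidden cell must be 1. The sketch below is not the code from that blog post. It is a simplified, single-pass illustration that checks rows only, whereas a full algorithm would iterate over rows and columns until no lone suppressed cell remains. The data set sashelp.class, the threshold of 2, and all work data set and variable names are assumptions.

```sas
/* Assumptions: sashelp.class is a stand-in for real data and the   */
/* threshold of 2 is illustrative. Only rows (sex) are checked.     */
%let min_count = 2;

/* Cell counts for the sex-by-age cross-tabulation */
proc freq data=sashelp.class noprint;
   tables sex*age / out=work.cells(drop=percent);
run;

/* Primary suppression: flag cells below the threshold */
data work.primary;
   set work.cells;
   suppressed = (count < &min_count);
run;

/* Number of primary-suppressed cells in each row */
proc sql;
   create table work.row_info as
   select sex, sum(suppressed) as n_supp
   from work.primary
   group by sex
   order by sex;
quit;

/* Within each row: suppressed cells first, then ascending counts */
proc sort data=work.primary;
   by sex descending suppressed count;
run;

/* Secondary suppression: a lone suppressed cell could be recovered */
/* as row total minus visible cells, so also hide the next smallest */
data work.secondary;
   merge work.primary work.row_info;
   by sex;
   if first.sex then pos = 0;
   pos + 1;
   if n_supp = 1 and pos = 2 then suppressed = 1;
   drop pos n_supp;
run;

proc print data=work.secondary noobs;
run;
```

Choosing the smallest remaining cell as the complementary one keeps the loss of published information low, which is a common design choice but not the only possible one.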

2. Implementing Privacy Protection-Compliant SAS® Aggregate Reports

This SAS Global Forum 2018 paper solidifies and expands on the above blog post. It walks you through the intricate logic of an enhanced complementary suppression process, and demonstrates SAS coding techniques to implement and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization responsible for compliance with privacy protection.

3. SAS® Visual Analytics 8.2 / Create Derived Items

In SAS Visual Analytics, you can create derived data items that are aggregated measures. SAS Visual Analytics 8.2 on SAS Viya introduces a new type of aggregated-measure derived data item called Data Suppression. Here is an excerpt from the documentation on the Data Suppression type:

“Obscures aggregated data if individual values could easily be inferred. Data suppression replaces all values for the measure on which it is based with asterisk characters (*) unless a value represents the aggregation of a specified minimum number of values. You specify the minimum in the Suppress data if count less than parameter. The values are hidden from view, but they are still present in the data query. The calculation of totals and subtotals is not affected.

Some additional values might be suppressed when a single value would be suppressed from a subgroup. In this case, an additional value is suppressed so that the suppressed value cannot be inferred from totals or subtotals.

A common use of suppressed data is to protect the identity of individuals in aggregated data when some crossings are sparse. For example, if your data contains testing scores for a school district by demographics, but one of the demographic categories is represented only by a single student, then data suppression hides the test score for that demographic category.

When you use suppressed data, be sure to follow these best practices:

  • Never use the unsuppressed version of the data item in your report, even in filters and ranks. Consider hiding the unsuppressed version in the Data pane.
  • Avoid using suppressed data in any object that is the source or target of a filter action. Filter actions can sometimes make it possible to infer the values of suppressed data.
  • Avoid assigning hierarchies to objects that contain suppressed data. Expanding or drilling down on a hierarchy can make it possible to infer the values of suppressed data.”

This Data Suppression type functionality is significant as it represents the first such functionality embedded directly into a SAS product.

4. Is it sensitive? Mask it with data suppression

This blog post provides an example of using the Data Suppression type of aggregated-measure derived data items, described above, in SAS Visual Analytics.

We need your feedback!

We want to hear from you. Is this blog post useful? How do you comply with GDPR (or other privacy law of your jurisdiction) in your organization? What SAS privacy protection features would you like to see in future SAS releases?

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries.

16 Comments

  1. The Consumer Financial Protection Bureau used techniques to remove PII from data using entity extraction. These days, you can use a modeling or a rules-based approach for that. Anyone interested in other modeling techniques that can be applied to comment/complaint freeform data which was scrubbed for PII can check out my paper on assessing consumer complaints using text analytics and machine learning. Thanks Leonid - I appreciate the encouragement to share this out.

  2. This is useful, lots of folks have questions these days on how to handle GDPR. I think SAS text analytics could address issues as well. Entities could identify and mask PII in freeform data fields for starters.

  3. I once had a customer who wanted to hide the plain text of a given character column. This macro does the trick.

    /*=============================================================*/
    /* Macro ENCODE                                                */
    /* This macro encodes a character column of a given dataset    */
    /* and writes a result dataset.                                */
    /*-------------------------------------------------------------*/
    /* Parameters: indata = library.SourceTable                    */
    /*             outdata= library.TargetTable                    */
    /*             col    = Column to encrypt                      */
    /*             method = encryption algorithm, valid options:   */
    /*                      SAS 9.4 (sas001,sas002,sas003,sas004)  */
    /*                      SAS 9.3 (sas001,sas002)                */
    /*                      (sas001 is a weak encryption)          */
    /*             dropCol= Should the non-encrypted column be     */
    /*                      deleted from the target table: YES/NO  */
    /*-------------------------------------------------------------*/
    /* Author: Karl-Heinz Saxer, SAS Institute AG, Switzerland     */
    /*-------------------------------------------------------------*/
     
    %macro encode(indata  =,
                  outdata =,
                  col     =,
                  method  =sas002,
                  dropCol =NO);
     
      /* write no NOTES and no generated macro-code to LOG */
      options nonotes nomprint;
     
      /* write column value to encrypt into macro-variables*/
      /* count the number of records */
      data _null_;
        set &indata end=last;
        /* cats() builds the macro variable name without blanks */
        call symputx(cats("enc_&col", _n_), &col);
        if last then call symputx("final", _n_);
      run;
     
      /* encrypt the column */
      filename dummy temp;
      %do i=1 %to &final;
         proc pwencode in     = "&&enc_&col&i"
                       method = &method
                       out    = dummy;
         run;
         %let outcol&i=&_pwencode;
      %end;
      filename dummy clear;
     
      /* write target dataset */
      data &outdata;
         set &indata;
         &col          = strip(symget(cats("enc_&col", _n_)));
         /* drop the 8-character method tag such as {SAS002} */
         &col._encrypt = substr(strip(symget(cats("outcol", _n_))), 9);
         %if %upcase(&dropCol) = YES %then %do;
           drop &col;
         %end;
      run;
     
      /* write NOTES to LOG */
      options notes;
      %put NOTE: Dataset "%upcase(&outdata)" has been created with encrypted column "%upcase(&col._encrypt)".;
      %if %upcase(&dropCol) = YES %then %do;
        %put NOTE: Column "%upcase(&Col)" was deleted because of the parameter setting of DropCol="%upcase(&dropCol)".;
      %end;
    %mend encode;
     
    /* call the macro----------------------------------------------*/
    %encode(indata  = sashelp.class,
            outdata = work.result_sas001,
            col     = name,
            method  = sas001,
            dropCol = no);
     
    /* print the result table--------------------------------------*/
    title "sas001 - encoding algorithm";
    proc print data=work.result_sas001;
    run;
     
    /* call the macro----------------------------------------------*/
    %encode(indata  = sashelp.class,
            outdata = work.result_sas002,
            col     = name,
            method  = sas002,
            dropCol = no);
     
    /* print the result table--------------------------------------*/
    title "sas002 - encoding algorithm";
    proc print data=work.result_sas002;
    run;

  4. Peter Lancashire:

    I think the decryption function (or its possibility) is a security hole. Why not keep a copy of the encrypted value locally? When the external partner has a query and sends the encrypted value back, just look it up in the copy you have. On uniqueness: just make an MD5 hash or similar and encrypt it - it is almost guaranteed to be unique.

  5. An encryption and decryption system is really required, as suggested above. The SAS tools described here, and their parameters, will be important for working with SAS.

  6. SAS R&D folks should come up with an encryption function with a single parameter, i.e. the actual value to be encrypted (character or numeric, including date values). The function would convert the input parameter to an encrypted value. A companion decryption function that returns the actual value from an encrypted value (produced by the SAS encryption function) would probably be useful too. I sometimes have to send reports to external clients with some column values encrypted. Each distinct column value should have a distinct encrypted value. I often receive feedback from these clients on some data within the report, with the report sent back to me, and I then need to access the actual value instead of the encrypted one.

    Is there an easy solution to this problem in SAS?
