How to store data about your data in your data

9

Although the NSA and news media have given metadata a bad name in the popular press, the fact remains that information about the nature of your data is extremely valuable.

For example, I posted an article yesterday about data cardinality. Cardinality measures the uniqueness of values in a variable. Cardinality is metadata: it's an attribute of the data variable that isn't the actual data content. If you know the cardinality of a variable, then you can decide what types of analyses might be appropriate. That's very useful.

I provided a method to calculate cardinality. The method (a SAS macro program) creates a data set with the cardinality information, plus an optional report. But what if you could infuse that cardinality information back into the original source data set, so that this metadata was present and visible the next time that you needed it? The information would be part of the data set, and you wouldn't need to keep track of another data table "on the side" or recalculate the values for each operation in your process.

With SAS 9.4, you can achieve this using extended attributes in SAS data sets. Extended attributes can be added, modified, and removed by PROC DATASETS and a series of XATTR statements.

The following program uses the %getcardinality macro program to create a cardinality data set, which includes values for nlevels and pct_unique. Then it generates a series of attributes and values (using the XATTR ADD VAR statement) to add these back into the original source data. (Well, in my case I made a copy of the data because I didn't want to modify the SASHELP version.)

/* See https://blogs.sas.com/content/sasdummy/about-cardinality/ */
%getcardinality(SASHELP.CARS, WORK.CARDS, 0);
 
/* Assemble XATTR statements to add in nlevels and pct_unique */
/* These were calculated in the %getcardinality macro         */
proc sql;
/* will generate statements like:                               */
/* XATTR ADD VAR Invoice (nlevels=425 pct_unique=0.9929906542); */
select 
  cat('XATTR ADD VAR ', 
         Name,' (nlevels=',
         nlevels,' pct_unique=',pct_unique,');') 
          into :xattr separated by ' '
    from work.cards;
quit;
 
proc datasets lib=work nolist;
  /* make a copy of the SASHELP.CARS data */
  /* so I don't modify the original */
  copy in=sashelp out=work;
  select cars;
 
  /* Now get ready to modify WORK.CARS */
  modify cars;
 
  XATTR ADD DS STATUS="Added cardinality on &sysdate.";
  /* Fold in the XATTR statements */
  &xattr.;
 
  /* Report on the contents including extended attributes */
  /* save just the attributes to a data set */
  ods output ExtendedAttributesVar = work.savedAttrs;
  contents data=cars;
quit;

At the end of the program, the PROC DATASETS CONTENTS statement produces a report and data set that shows the extended attributes, which are now "baked into" the SAS data set that they describe. Here's the report from PROC DATASETS -- note the new "Extended Attributes" sections.


Perhaps more useful, I used the ODS OUTPUT statement to save a copy of the ExtendedAttributesVar table, which contains the cardinality information that can inform my analysis.


Of course, storing cardinality is just one of many possible applications for extended attributes. You could extend my example here and "precalculate" any univariate statistic that you might want to know later: mean, median, sum, and so on. Or you could store audit details at the data set level with XATTR ADD DS. You can even store a copy of the SAS program that you used to create the data -- Diane Olson provides an example of this in her paper about extended attributes.

Important note: Only the BASE engine supports extended attributes. You can also access them from the META engine when the underlying metadata library is a BASE library. And SAS releases prior to 9.4 won't be able to access a data set once it contains extended attributes.

What other uses can you imagine for extended attributes? Post back in the comments!

Share

About Author

Chris Hemedinger

Senior Manager, SAS Online Communities

+Chris Hemedinger is the manager of SAS Online Communities. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.  He also hosts the SAS Tech Talk webcasts each year from SAS Global Forum, connecting viewers with smart people from SAS R&D and the impressive work that they do.

9 Comments

  1. Thanks Chris,
    This is an intriguing feature. Curious, do these extended attributes show up when you run PROC CONTENTS? Are they by chance stored in a dictionary table as well?

    So now there are at least 3 different types of SAS metadata: metadata stored in the sas dataset itself; metadata stored in dictionary tables, and BI metadata. Great to have all of this available!

    • Chris Hemedinger
      Chris Hemedinger on

      Quentin,

      Yes, they show up in PROC CONTENTS (which is really the same as PROC DATASET and the CONTENTS statement, essentially).

      And there is a SASHELP.VXATTR for your querying convenience. Example:

      proc sql; 
      select name, xattr,xtype,xvalue from sashelp.vxattr 
        where libname="WORK" and memname="CARS";
      quit;

      Partial output:

                                                Extended
                                                Attribute  Extended Attribute
      Column Name     Extended Attribute Name   Type       Value
      --------------------------------------------------------------------------
                      STATUS                    char       Added cardinality on 
                                                           18OCT13              
      Make            pct_unique                num        0.0887850467         
      Make            nlevels                   num        38                   
      Model           pct_unique                num        0.9929906542         
      Model           nlevels                   num        425                  
      Type            nlevels                   num        6                    
      Type            pct_unique                num        0.0140186916         
      Origin          nlevels                   num        3                    
      Origin          pct_unique                num        0.0070093458         
      DriveTrain      nlevels                   num        3                    
      DriveTrain      pct_unique                num        0.0070093458         
      MSRP            pct_unique                num        0.9579439252         
      MSRP            nlevels                   num        410                  
      Invoice         nlevels                   num        425                  
      Invoice         pct_unique                num        0.9929906542         
      

  2. Awesome post (as usual).

    For "xtra" fun (terrible pun), once you've exported the extended attributes into a data set, you can create some macros of them.

    data _null_; set work.savedAttrs;
    call symput(ExtendedAttribute, AttributeCharValue);
    run;

    (Though it might be better to run symput on the xvattr data set you mentioned, but using an if statement for xtype.)

    • You actually don't need the IF statement that I alluded to. If you use the data set savedAttrs, then the extended attribute values are split across two variables - one for char and one for num. In that instance, you would need to call symput appropriately.

      But using the sashelp.vxattr view is far easier:

      data _null_; set sashelp.vxattr;
      call symput(xattr, xvalue);
      run;
      %put _USER_;

      • Chris Hemedinger
        Chris Hemedinger on

        Thanks for the extra tips Jared! I'm glad that you find the post (and the extended attributes feature) useful!

  3. Thanks Chris,

    great to read about this feature! (though it’s no longer brand new at the time of my reading)

    I’ve been wondering whether it would be possible to customize EG to display the extended attribute content in the tooltip that pops up in datagrid view when the mouse cursor is placed over the variable name? (e.g. http://blogs.sas.com/content/sasdummy/files/2013/09/valuelabel.png)

    The fields that are normally displayed in the tooltip are variable Type, Length and Label. The inclusion of (a selection of) extended attributes would add some serious value to this tooltip!

  4. I'm super late to the game on this, as I've not considered adding custom attributes in the past. Also, we've only upgraded to SAS 9.4 recently, so the xattr statement wouldn't have helped me before anyway. But now that I have 9.4 and I had a desire to customize some metadata for some datasets, here I am on yet another of your spectacularly informative blog postings. Wow! I'm so glad the developers decided to add this feature. Thanks, Chris, for keeping us SAS users well informed, even years down the road.

Leave A Reply

Back to Top