SAS variable labels are unique features of SAS data tables (aka data sets) that allow SAS users to enhance reading and interpretation of tables and reports.
Whether you use SAS data table as a data source in any of the reporting procedures or interactive interface such as SAS Visual Analytics, you will benefit from pre-assigning meaningful labels during the data preparation process. Besides being more efficient, such an early label assignment secures consistency of the data elements descriptions (labels) across different developers.
The most direct way of creating column labels is by explicitly assigning them to the data variables. You can do it during the data table creation in a DATA step using either LABEL statement or ATTRIB statement. Alternatively, you can do it after your data table is already created by using PROC DATASETS’ MODIFY statement with the LABEL= option.
However, in many situations there are ways of automating this tedious and voluminous process of column labels creation. Let’s look into one of them that I found useful for bulk column labeling, especially in quick turnaround scenarios like developing an analytics proof of concept (PoC). Plus, we are going to explore SAS coding technique using _DATA_and_LAST_special data sets.
Deriving variable labels from variable names
This method is suitable when variable names are well-formed, for example CUSTOMER_ADDRESS, FIRST_NAME, LAST_NAME, COMPANY_NAME, PLACE_OF_BIRTH, etc. Kudos to data designer!
We can transform these names into labels by replacing underscores with space characters and converting words from upper case to proper case. These are the labels we will get: Customer Address, First Name, Last Name, Company Name, Place Of Birth.
Let’s say our original data table is DEMO:
data DEMO; input CUSTOMER_CITY $ 1-15 FIRST_NAME $ 16-26 LAST_NAME $27-37 COMPANY_NAME $38-50 COUNTRY_OF_BIRTH $51-65; datalines; Washington Peter Birn Citibank USA Denver Lisa Roth IBM UK Cary Antony Bessen SAS Spain ;
Then the following macro will create variable labels out of the variable names as described above:
%macro ilabel (dataset); %local lbref dsname vname vlabel nvars; %if %index(&dataset,.) %then %do; /* 2-level dataset name */ %let lbref = %scan(&dataset,1,'.'); %let dsname = %scan(&dataset,2,'.'); %end; %else %do; /* 1-level dataset name */ %let lbref = WORK; %let dsname = &dataset; %end; /* get variable names */ proc contents data=&dataset out=_data_(keep=name) noprint; run; /* create name/label pairs */ data _null_; set _last_ end=eof nobs=n; call symput('vname'!!strip(put(_n_,best.)),name); lbl = propcase(translate(name,' ','_')); call symput('vlabel'!!strip(put(_n_,best.)),trim(lbl)); if eof then call symputx('nvars',n); run; /* modify variable labels */ proc datasets lib=&lbref nolist; modify &dsname; label %do i=1 %to &nvars; &&vname&i = "&&vlabel&i" %end; ; quit; %mend ilabel;
You can invoke this macro by either one line of code:
Here are how our DEMO table looks before and after %ilabel macro modifies/assigns the labels based on the column names:
Macro code highlights
In this macro, we:
- Define local macro variables to make sure their names will not interfere with possible namesakes in the calling program.
- Determine libref and one-level data set name for the input data set.
- Create a table containing variable names in the input data set using PROC CONTENTS.
- Use DATA _NULL_ step to read through the variable names, and derive labels as
lbl = propcase(translate(name,' ','_'));
Here, transalate() function replaces underscores with blanks, then propcase() function converts every word in its argument to proper case (upper case for the first character and lower case for the remaining characters). We also create macro variables for each name/label pair (vname1, vlabel1, vname2, vlabel2, …) and macro variable nvars representing the number of such pairs.
- Use PROC DATASETS with MODIFY and LABEL statements to assign generated column labels to the source data set.
If some of the labels assigned by this macro are not what you need you may run another PROC DATASETS to individually adjust (re-assign) them according to your wishes or specification. But when you need to label data set columns on a large scale (many tables with dozens or hundreds of columns) this can be a good first draft that can save you time and efforts.
_DATA_ and _LAST_ special data sets
You might notice that I used _data_ data set name in the out= option of the PROC CONTENTS. This is not an explicit data set name; it is a keyword, a special data set that allows SAS to assign one of the available data set names dynamically. The created output data set will have a name that looks something like DATA1 or DATA2, etc. Try running this code:
data _data_; x=1; run;
and look in the SAS LOG at what data set is created. I got:
NOTE: The data set WORK.DATA1 has 1 observations and 1 variables.
Special data set name _data_ tells SAS to create a data set in the WORK library from a list of names DATA1, DATA2, … according to the DATAn naming convention. These names (as well as WORK library) are unique for a given SAS session. The first time you use _data_ within a SAS session it will create data set named WORK.DATA1, the second time you use _data_ it will create WORK.DATA2, and so on.
Consequently, I used special data set name _last_ in the SET statement of the DATA step following the PROC CONTENTS. Again, here _last_ is a keyword, not a data set name; it is a special data set that refers to the name of the last created data set during your SAS session. That causes SAS to use the latest data set created prior to the _last_ reference.
Special data sets _data_ and _last_ are reserved names (or SAS keywords) along with special data set _null_ that is used in the DATA _NULL_ statement and causes SAS to execute the DATA step without creating a data set. (By the way, using DATA _NULL_ can increase your code efficiency when you use the DATA step for custom report writing or creating macro variables or other processing for which the output data set is not needed as it does not consume computer resources for writing and storing the output data set.)
If I were using an explicit table name in this macro instead, and your calling program accidentally were using the same table name, then the macro would overwrite your table which would wreak havoc to your program. Using _data_ and _last_ special data sets protect your SAS program from a possibility of inadvertently overwriting your other data set with the same name by executing the %ilabel macro. It is similar to using %LOCAL for macro variable names for protecting from possible overwrites of your %GLOBAL macro variables with the same names.
A WORD OF CAUTION: Remember, that the _data_ keyword creates table names that are unique only within a SAS session, so it works perfectly for the WORK data library which itself is a unique instance for a SAS session. While it is syntactically correct to use special data set notation _data_ for creating permanent data sets such as libref._data_ (including SASUSER._data_), I have to warn you against using it as it will not guarantee the name uniqueness in the permanent data library, and you may end up overwriting data sets that already exist there.
Do you find this post useful? How do you handle the task of assigning variable labels on a mass scale? Do you use _data_ and _last_ special data sets in your SAS coding? Please share in the comments section below.