Using the DATASETS procedure, we can easily modify SAS variable attributes such as name, format, informat and label:
proc datasets library=libref; modify table_name; format var_name date9.; informat var_name mmddyy10.; label var_name = 'New label'; rename var_name = var_new_name; quit;
We cannot, however, modify fixed variable attributes such as variable type and length.
Actually, there is a way: Changing variable type and variable length in SAS datasets
Notice that we apply variable attributes modifications to one table at a time; that table name is specified in the modify statement of the proc datasets. There might be multiple modify statements within a single proc datasets to apply modifications to multiple data tables.
But what if we need to change variable attributes in all tables of a data library where that variable is present? Say, there is a data model change that needs to be applied consistently to all the data tables in that model.
Business case
Let’s consider a business case for a SAS programmer working for a pharmaceutical company or a contract research organization (CRO) managing data for a clinical trials project. Imagine that you need to apply some variable attributes modifications to all SDTM (Study Data Tabulation Model) and/or ADaM (Analysis Data Model) datasets to bring them in compliance with the CDISC (Clinical Data Interchange Standards Consortium) requirements. Obviously, “manually” going through dozens or hundreds of data tables in different domains to identify whether or not they contain your variable of interest is not an efficient solution.
Better data management practice
This variable attributes update process can be fully automated using the data-driven SAS macro loops approach as described in my earlier blog post. First, we need to programmatically loop through all the tables in a specified library to find the tables that contain the variable to be modified. Then, for each iteration of that data loop we use SAS macro looping to generate multiple modify statements in proc datasets with subsequent sub-statements (format, informat, label, rename).
SASHELP library
SAS provides a multitude of datasets and views in the SASHELP library. While many of them can be used in coding examples, demos, and testing, there are some that provide invaluable information about your SAS environment. Take a look at the views in the SASHELP library whose names start with the letter “V”: Vcolumn, Vlibname, Vmacro, Voption, Vtable, Vtitle, and others.
These SAS views contain a wealth of information about your SAS environment, and reflect any new changes to your SAS environment dynamically. For example, if you run the following code to create a macro variable:
%let aaa = a;
and then open the Vmacro view, you will see that macro variable listed right there at the very top.
For our purpose of looping through all the tables in a library containing a particular variable, we are going to use the Vcolumn view, as it has all the information we need – libname (Library Name), memname (Member Name), name (Column Name):
Code implementation
Having that SASHELP.VCOLUMN table view available we can easily implement our code for modifying variable attributes in all the tables of a data library. Here is the SAS code example:
libname sdtm 'c:\'; data sdtm.a; x=1; label x='Old x-label'; run; data sdtm.b; y=1; label y='Old y-label'; format y mmddyy8. run; data sdtm.c; y=2; label y='Old y-label'; format y date9.; run; %macro change_var_attr ( lib=, /* libref, required parameter */ var=, /* variable name, required parameter */ new_label='_', /* new label, in quotes, optional parameter */ new_format=_, /* new format, optional parameter */ new_informat=_, /* new informat, optional parameter */ new_name=_ /* new variable name, optional parameter */ ); /* get all tables in a library &lib containing variable &var */ data target_tables (keep=memname); set sashelp.vcolumn (keep=libname memname name); where (upcase(libname) eq upcase("&lib")) and (upcase(name) eq upcase("&var")); run; proc datasets library=&lib; %let dsid = %sysfunc(open(target_tables)); %syscall set(dsid); /* no leading ampersand for dsid with %syscall */ %do %while(%sysfunc(fetch(&dsid)) eq 0); modify &memname; %if &new_label ne '_' %then %str(label &var = &new_label;); %if &new_format ne _ %then %str(format &var &new_format;); %if &new_informat ne _ %then %str(informat &var &new_informat;); %if &new_name ne _ %then %str(rename &var = &new_name;); %end; %let rc = %sysfunc(close(&dsid)); run; quit; %mend change_var_attr; %change_var_attr(lib=sdtm, var=y, new_label='New label, commas allowed', new_format=yymmddd10., new_name=z)
The centerpiece of the code is the change_var_attr macro. The macro has two required parameters, lib and var, and four optional parameters, new_label, new_format, new_informat, and new_name.
To get all the tables in a library &lib containing the variable &var, we query and subset the SAS view SASHELP.VCOLUMN and create a target_tables SAS dataset.
Then in PROC DATASETS, we macro-loop through the target_tables and generate as many modify statements as there are observations in the target_tables dataset, one for each table to be modified. Each modify statement is followed by one or several optional statements – label, format, informat, and rename.
%syscall set (dsid); statement invokes a CALL SET routine that automatically sets the value of macro variable memname when an observation with memname variable is fetched (read) from the target_tables dataset.
There may be other programming techniques to solve the problem of modifying variable attributes across all data sets of a data library. You are welcome to suggest your solution in the comments section below.
Bonus
For those who were patient enough to read to the end… If you need to apply modifications to a variable in tables across multiple data libraries, you can still do it with a single invocation of the change_var_attr macro. In order to do that you would need to define a new data library logically concatenating all the relevant libraries. For example, the following libname statement assigns the libref combo as a logical concatenation of both, sdtm and adam libraries:
libname combo (sdtm adam);
16 Comments
There is an error here :
There should be no "=" in the format line. If there is You get a syntax error ): .
Great catch Wlodzimierz! You are absolutely correct regarding the format line, there should be no "=" between variable name and format name. The same holds true for the informat line, so I corrected both. Thank you.
Very nice article, Leonid!
Thank you, Kirk! I am glad you liked it.
Years ago I did a SUGI paper using the dictionary tables. I learned that using DICTIONARY.COLUMNS in PROC SQL is faster than using a DATA step with SASHELP.VCOLUMN. You could speed up your code by replacing your DATA step with PROC SQL. Also the values of LIBNAME and NAME in the dictionary/sashelp tables are already in uppercase. Using the upcase function there is unneeded. The open function is a nice bit.
Paper is at ftp://ftp.sas.com/techsup/download/misc/flatfile.pdf if anyone is interested.
Michelle, thank you for your comment and paper reference. My rationale is this. When we are dealing with "small" data as the dictionary/sashelp tables, using SAS data step vs PROC SQL is more a matter of personal preference (code readability and clarity) rather than a matter of speed optimization. Who cares whether a data step runs 1.5 seconds vs 1.4 seconds for PROC SQL! Optimizing this might be a classical case of solving the wrong problem (see Are you solving the wrong problem?) Regarding the upcase() function, I prefer a little redundancy in my code for the sake of clarity. Besides, who knows if in the future SAS releases those uppercase values in the dictionary/sashelp tables are going to stay.
At SAS Global Forum I will be presenting a paper on the use of some of the DATA step functions, such as the ones used here. I will add a citation for this blog.
One of the macros presented executes against all the files in a folder, including sub-folders. In this macro, included here, the objective is to import all CSV files, but that is the least important aspect of the macro.
%macro RecursiveImportData(folder=,levelcounter=0);
%* written by Phuong Dinh of Cornerstone Research Inc.;
%local _rc _dsid _i _s_ext _filename _ref;
%let _ref=filrf&levelcounter;
%if %sysfunc(fileexist(&folder)) = 0 %then %do;
%put ERROR: Folder does not exist - &folder;
%return;
%end;
%let _rc = %qsysfunc(filename(_ref, &folder));
%if &_rc=0 %then %do;
%let _dsid = %qsysfunc(dopen(&_ref));
%if &_dsid ne 0 %then %do;
%do _i=1 %to %qsysfunc(dnum(&_dsid));
%let _filename = %qsysfunc(dread(&_dsid, &_i));
%let _s_ext = %scan(&_filename, -1, .);
%if %upcase(&_s_ext)=CSV %then %do;
%put NOTE: Importing &folder\&_filename;
proc import file="&folder\&_filename"
out=temp dbms=csv replace;
getnames=yes;
run;
proc append base=bigdata
data=temp force;
run;
%end;
%else %if &_s_ext= &_filename %then %do;
%put This is a folder: &folder\&_filename;
%recursiveImportData(folder=&folder\&_filename,
levelcounter=%eval(&levelcounter+1))
%end;
%end;
%end;
%let _dsid=%qsysfunc(dclose(&_dsid));
%end;
%let _rc = %qsysfunc(filename(_ref));
%mend recursiveImportData;
Art, thank you for sharing your code here. What is the reason you use %qsysfunc macro function in most cases here as opposed to %sysfunc ?
At SAS Global Forum i will be presenting a paper on the use of some of the DATA step functions, such as the ones used here. I will add a citation for this blog.
Thank you, Art. I am looking forward to reading your paper.
Hello,
I'm also a big fan of the use of CALL SET routine and SCL FETCH function for the creation and use of macro variables directly obtained from a dataset. I have made a communication around this technique at a Phuse congress in 2010. For me this technique is really efficient and we can use it in many cases.
Regards,
Patrice.
Patrice, thank you for your comment. Would you mind sharing a link to your presentation at PhUSE?
Regarding the bonus: from what I understand from the documentation, concatenating libnames across multiple data libraries will work just fine as long as there are not datasets with the same name in more than one library? The documentation on which rules apply to which datasets seems well documented, if not that intuitive; on my part I would find it much more difficult to remember what applies when using the concatenation; I think individual calls to the macro would be slower, but more likely to do what I intended? Especially if there were datasets with the same names? Is there an easy way to have SAS tell me if there are duplicate dataset names in a concatenated library?
Patrick, thank you for your comment. I agree, if there are datasets with the same name in both libraries, it makes perfect sense to call this macro for each library individually. It won't significantly slow down the processing as the total number of tables processed stays the same. The "bonus" goal was to illustrate the library concatenation technique. Another way to approach processing multiple libraries with one macro call would be to modify the macro to allow several libref words in its first parameter value, then parse it and loop through all of them sequentially.
However, in general, it is not a good data management practice to have project-related same-named datasets in different libraries, even though operating system allows for such naming.
Very helpful technique - thank you for publishing. The combo libname statement is not one I had seen before.
Thank you, Louise, for your comment. I am glad you learnt something new.