There are many different ways to do something; my approach is all about breaking a problem into smaller problems. I try as much as possible to stay away from developing something that can only serve a purpose in one problem. I achieve this by keeping things modular. Each tool should work independently to do one well-defined task. In that way, when you find solutions to the smaller problems, you will also have the tools to solve other big problems later.
Thinking about a problem abstractly can help you to break it into smaller problems. For example, what if we wanted to find the variables that are common to two data sets? At a basic level, this is just comparing two sets of elements and seeing which elements appear in both sets. So right there we’ve broken the problem into two smaller problems:
- How do we list out the variables of a dataset?
- How do we compare two sets to find the intersection?
Creating tools to solve these smaller problems will give you ideas for other tools you can develop that will all work together to broaden your tool set.
In this post I will continue with this idea of lists/sets. A lot of my macros work with lists and rely on functions that interact with lists. The real value of working with lists in this way may not be immediately clear but as I write more posts, it will come.
Lists and sets are similar concepts: In the traditional sense, a list is a sequence of elements that have some ordering whereas a set is a bunch of unique elements. Here I will create functions for a flexible version of list, basing some of the ideas on the theory behind sets.
First we need to make a macro that will return a list of variables in a specified data set. Looking at example 5 at the bottom of this page - http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/z3514sysfunc.htm#a000384587- will give you a good idea on how to do this. Here is what I came up with:
/* Macro that returns the list of variables from a specified dataset. PARAMETERS: &ds - the dataset EXAMPLES: %varlist(sashelp.class); %varlist(doesnotexist); */ %MACRO varlist(ds); %LOCAL nvars nobs varlist i; %LET dsid = %SYSFUNC(open(&ds)); %IF &dsid %THEN %DO; %LET nvars = %SYSFUNC(attrn(&dsid,NVARS)); %IF &nvars < 1 %THEN %PUT WARNING: %dset has no variables.; %ELSE %DO; %DO i = 1 %TO &nvars; %LET varlist = &varlist %SYSFUNC(VARNAME(&dsid, &i)); %END; %LET rc = %SYSFUNC(close(&dsid)); &varlist %RETURN; %END; %END; %ELSE %PUT ERROR: Open for data set &dset failed - %SYSFUNC(sysmsg()); %MEND;
This opens a data set (%SYSFUNC(OPEN())), counts the number of variables %SYSFUNC(ATTRN(_,NVARS)), iterates through them and gets their names (%SYSFUNC(VARNAME)). (A warning is printed if there are no variables in the data set, and an error is given if the data set could not be opened.)
The line “&varlist” is actually returning the contents of the macro variable “varlist” to the caller. This idea is important. Another way to do it is to use a system of reading/writing to global variables, but I am not a fan of that approach - global variables can be inadvertently overwritten resulting in things going wrong.
In this case, I wanted the macro to stop after returning the value because the subsequent error message was not needed, so I explicitly used the %RETURN function (). Normally if I am returning a value to the caller, I make it the last line in the macro and therefore do not need the %RETURN function. I think of these tools as functions - you pass one something and it returns something. Two different inputs can have the same output but the same input cannot yield two different outputs.
Because no data steps or SQL are used in the implementation, this macro can be called from virtually anywhere. This will come in handy when we call it from within data steps later. You can test it by running it on one of the sample SAS data sets that come with Base SAS (e.g. sashelp.class - %PUT %varlist(sashelp.class);).
[*Note that it returns the variables in a list, where each element is specified by a space. (This is the convention I use.)]
Now, let’s start with a simple function that will count the number of elements in a list. This requires just one pass through the list:
/* Macro that returns the number of elements in a given list. EXAMPLES: %lengthList(element element2 element3 element4); %lengthList(%STR(element element2 element3 element4)); %lengthList(%STR(element,element2,element3,element4), delimiter=%STR(,)); %lengthList(); */ %MACRO lengthList(list, delimiter=%STR( )); %LOCAL return i; %LET i=1; %DO %WHILE(%LENGTH(%QSCAN(&list, &i, &delimiter)) GT 0); %LET i = %EVAL(&i + 1); %END; %EVAL(&i - 1) %MEND;
I added a “delimiter”, an optional parameter, just to show that you can make it more flexible. In most of my other examples I will stick to assuming a space separated list. That being said, I also have functions that allow the user to format lists, adding/removing delimiters as they please. This way they have just as much power by using tools in combination as they would have had, had I offered the options on each one.
Let’s try another easy one - one that tells you if a specified element exists in a list:
/* Macro that simulates the IN function used in data steps. That is, it checks to see if &string is an element of &list. NOTE: It is not enough that &string be contained in an element of &list. Only if &string IS AN ELEMENT of &list, then true is returned. EXAMPLE: %inn(%STR(el1 el2 el3), el2); - will return 1 (true) %inn(%STR(el1 el2 el3), el4); - will return 0 (false) */ %MACRO inn(list, string); %IF %LENGTH(&list)<=0 %THEN %LET list=%STR( ); %IF (%SYSFUNC(FINDW(%STR(&list), &string))) %THEN 1; %ELSE 0; %MEND inn
Note here that %inn compares the elements as is. It does not control for case. You have to make the elements the same case beforehand if you want the result to be case insensitive. A helpful additional parameter to this function would be something that tells the macro whether or not to be case sensitive. I invite you to refactor to add in this feature.
Also note that a value of 1 is taken to be true and 0 as false. This macro can be called from within conditionals and SAS will interpret the returned result as such.
Next let’s try a function that finds the intersection between two lists. This will involve parsing both lists, comparing elements between them as we go.
/* Macro that returns the intersection of two lists. EXAMPLE: %intersectList(%STR(element1 element2 element3), %STR(element2 element4)); %intersectList(%STR(element1 element2 element3), %STR(element4 element5)); */ %MACRO intersectList(list1, list2); %LOCAL return i; %LET return = %STR( ); %LET i = 1; %DO %WHILE (%LENGTH(%QSCAN(&list1, &i, %STR( ))) GT 0); %IF (%INN(%STR(&list2), %QSCAN(&list1, &i, %STR( )))) %THEN %LET return = &return %SYSFUNC(STRIP(%QSCAN(&list1, &i, %STR( )))); %LET i=%EVAL(&i + 1); %END; &return %MEND;
It goes through each element in the first list using the do loop and calling the helper %inn function to check if it exists in the second.
I invite you to refactor this macro so that it checks for duplicated elements in each of the lists and notifies you appropriately. What happens if both lists have the same duplicated element?
We now have everything that we need to address the problem we started with - how to find the variables that are common to two data sets. Let’s try it on those sample SAS data sets again, this time on sashelp.class and sashelp.heart. We know we want to call %intersectList() and pass it the lists of variables from each dataset using %varlist(). Here’s what that would look like:
%PUT %intersectList(%varlist(sashelp.class), %varlist(sashelp.heart));
See how the calls to %varlist() are nested inside of a call to %intersectList()? This all works because of our function approach. Each call to %varlist is returning a list, which is then being passed as arguments when calling %intersectList(). Running this prints “Sex Height Weight”; these variables exist in both data sets.
Now you can begin to see what’s possible with this function-based approach. Try playing around with the macros in different situations and combinations. For example, try using %varlist() inside a KEEP function in a data step to keep only the variables that exist in some other data set. (You can find the tools we've covered at DontSASme.com.
In my next post, I will continue with a few more list functions.
Let me know what your results are and how you would solve this problem.