Tiny macros break up big problems

6

There are many different ways to do something; my approach is all about breaking a problem into smaller problems. I try as much as possible to stay away from developing something that can only serve a purpose in one problem. I achieve this by keeping things modular. Each tool should work independently to do one well-defined task. In that way, when you find solutions to the smaller problems, you will also have the tools to solve other big problems later.

Thinking about a problem abstractly can help you to break it into smaller problems. For example, what if we wanted to find the variables that are common to two data sets? At a basic level, this is just comparing two sets of elements and seeing which elements appear in both sets. So right there we’ve broken the problem into two smaller problems:

  1. How do we list out the variables of a dataset?
  2. How do we compare two sets to find the intersection?

Creating tools to solve these smaller problems will give you ideas for other tools you can develop that will all work together to broaden your tool set.

In this post I will continue with this idea of lists/sets. A lot of my macros work with lists and rely on functions that interact with lists. The real value of working with lists in this way may not be immediately clear but as I write more posts, it will come.

Lists and sets are similar concepts: In the traditional sense, a list is a sequence of elements that have some ordering whereas a set is a bunch of unique elements. Here I will create functions for a flexible version of list, basing some of the ideas on the theory behind sets.

First we need to make a macro that will return a list of variables in a specified data set. Looking at example 5 at the bottom of this page - http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/z3514sysfunc.htm#a000384587- will give you a good idea on how to do this. Here is what I came up with:

/*     Macro that returns the list of variables from a specified dataset.

   PARAMETERS:
       &ds - the dataset

   EXAMPLES:
       %varlist(sashelp.class);
       %varlist(doesnotexist);
*/

%MACRO varlist(ds);
   %LOCAL nvars nobs varlist i;

   %LET dsid = %SYSFUNC(open(&ds));
   %IF &dsid %THEN %DO;
       %LET nvars  = %SYSFUNC(attrn(&dsid,NVARS));
       %IF &nvars < 1 %THEN %PUT WARNING: %dset has no variables.;
       %ELSE %DO;
           %DO i = 1 %TO &nvars;
               %LET varlist = &varlist %SYSFUNC(VARNAME(&dsid, &i));
           %END;
       %LET rc = %SYSFUNC(close(&dsid));
       &varlist
       %RETURN;
       %END;
   %END;
   %ELSE %PUT ERROR: Open for data set &dset failed - %SYSFUNC(sysmsg());
%MEND;

This opens a data set (%SYSFUNC(OPEN())), counts the number of variables %SYSFUNC(ATTRN(_,NVARS)), iterates through them and gets their names (%SYSFUNC(VARNAME)). (A warning is printed if there are no variables in the data set, and an error is given if the data set could not be opened.)

The line “&varlist” is actually returning the contents of the macro variable “varlist” to the caller. This idea is important. Another way to do it is to use a system of reading/writing to global variables, but I am not a fan of that approach - global variables can be inadvertently overwritten resulting in things going wrong.

In this case, I wanted the macro to stop after returning the value because the subsequent error message was not needed, so I explicitly used the %RETURN function (). Normally if I am returning a value to the caller, I make it the last line in the macro and therefore do not need the %RETURN function. I think of these tools as functions - you pass one something and it returns something. Two different inputs can have the same output but the same input cannot yield two different outputs.

Because no data steps or SQL are used in the implementation, this macro can be called from virtually anywhere. This will come in handy when we call it from within data steps later. You can test it by running it on one of the sample SAS data sets that come with Base SAS (e.g. sashelp.class - %PUT %varlist(sashelp.class);).

[*Note that it returns the variables in a list, where each element is specified by a space. (This is the convention I use.)]

Now, let’s start with a simple function that will count the number of elements in a list. This requires just one pass through the list:

/*
  Macro that returns the number of elements in a given list.

  EXAMPLES:
      %lengthList(element element2 element3 element4);
      %lengthList(%STR(element element2 element3 element4));
      %lengthList(%STR(element,element2,element3,element4), delimiter=%STR(,));
      %lengthList();
*/
%MACRO lengthList(list, delimiter=%STR( ));
  %LOCAL return i;
  %LET i=1;
  %DO %WHILE(%LENGTH(%QSCAN(&list, &i, &delimiter)) GT 0);
      %LET i = %EVAL(&i + 1);
  %END;
  %EVAL(&i - 1)
%MEND;

I added a “delimiter”, an optional parameter, just to show that you can make it more flexible. In most of my other examples I will stick to assuming a space separated list. That being said, I also have functions that allow the user to format lists, adding/removing delimiters as they please. This way they have just as much power by using tools in combination as they would have had, had I offered the options on each one.

Let’s try another easy one - one that tells you if a specified element exists in a list:

/*
   Macro that simulates the IN function used in data steps.
   That is, it checks to see if &string is an element of &list.

   NOTE:
       It is not enough that &string be contained in an element of
   &list.  Only if &string IS AN ELEMENT of &list, then true
   is returned.

   EXAMPLE:
       %inn(%STR(el1 el2 el3), el2);
           - will return 1 (true)
       %inn(%STR(el1 el2 el3), el4);
           - will return 0 (false)
*/
%MACRO inn(list, string);
   %IF %LENGTH(&list)<=0 %THEN %LET list=%STR( );
   %IF (%SYSFUNC(FINDW(%STR(&list), &string))) %THEN 1;
   %ELSE 0;
%MEND inn

Note here that %inn compares the elements as is. It does not control for case. You have to make the elements the same case beforehand if you want the result to be case insensitive. A helpful additional parameter to this function would be something that tells the macro whether or not to be case sensitive. I invite you to refactor to add in this feature.

Also note that a value of 1 is taken to be true and 0 as false. This macro can be called from within conditionals and SAS will interpret the returned result as such.

Next let’s try a function that finds the intersection between two lists. This will involve parsing both lists, comparing elements between them as we go.

/*
   Macro that returns the intersection of two lists.

   EXAMPLE:
      %intersectList(%STR(element1 element2 element3), %STR(element2 element4));
      %intersectList(%STR(element1 element2 element3), %STR(element4 element5));
*/
%MACRO intersectList(list1, list2);
   %LOCAL return i;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list1, &i, %STR( ))) GT 0);
       %IF (%INN(%STR(&list2), %QSCAN(&list1, &i, %STR( )))) %THEN
           %LET return = &return %SYSFUNC(STRIP(%QSCAN(&list1, &i, %STR( ))));
       %LET i=%EVAL(&i + 1);
   %END;
   &return
%MEND;

It goes through each element in the first list using the do loop and calling the helper %inn function to check if it exists in the second.

I invite you to refactor this macro so that it checks for duplicated elements in each of the lists and notifies you appropriately. What happens if both lists have the same duplicated element?

We now have everything that we need to address the problem we started with - how to find the variables that are common to two data sets. Let’s try it on those sample SAS data sets again, this time on sashelp.class and sashelp.heart. We know we want to call %intersectList() and pass it the lists of variables from each dataset using %varlist(). Here’s what that would look like:

%PUT %intersectList(%varlist(sashelp.class), %varlist(sashelp.heart));

See how the calls to %varlist() are nested inside of a call to %intersectList()? This all works because of our function approach. Each call to %varlist is returning a list, which is then being passed as arguments when calling %intersectList(). Running this prints “Sex Height Weight”; these variables exist in both data sets.

Now you can begin to see what’s possible with this function-based approach. Try playing around with the macros in different situations and combinations. For example, try using %varlist() inside a KEEP function in a data step to keep only the variables that exist in some other data set. (You can find the tools we've covered at DontSASme.com.

In my next post, I will continue with a few more list functions.

Let me know what your results are and how you would solve this problem.

Share

About Author

Praveen Sawh

Programmer, Database Architect

Praveen Sawh, Programmer, Database Architect, is a programmer at the University of Toronto. He is in charge of getting big data from different sources to play nice and work together so that meaningful real-world questions can be answered. His treatment of choice is a healthy dose of Base SAS and a few open source solutions.

6 Comments

    • Waynette Tubbs
      Waynette Tubbs on

      Hey Paul,

      Are you talking about example four? If so, you might find the missing values by using the slider at the bottom of that image. Let me know if this doesn't help.

      THank you, Waynette Tubbs

    • Waynette Tubbs
      Waynette Tubbs on

      Hey Paul,

      I'm sorry for the mix up. You are exactly right. 🙂 I had a copy/paste error. It is fixed in the post now.

      THank you,
      Waynette

  1. Pingback: Keys to the kingdom—using metadata to build a database - SAS Users Groups

  2. I have one quick question if you're still following this thread. Why would you prefer to build a %lengthList function rather than use a %sysfunc call to COUNTW?

    Thanks,
    Bob

  3. Hi Bob,

    Good question. I am not familiar with the implementation of COUNTW but if it does the same thing (it sounds like it does) then it can certainly be used. I am just used to making my own functions because I like to be able to have control over implementation and add features as I wish. But in the case of something as simple as finding the length of a list, there's probably not a lot I'd want to add beyond what COUNTW already does.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top