Earlier this week I described a common programming pattern in the SAS macro language. The pattern sets up a loop for processing each distinct value of a classification variable. The program uses the PROC SQL SELECT INTO feature to populate SAS macro variables. The effect: you can roll your own BY processing in SAS.
I posted the example and received many useful comments, including news about new features in SAS 9.3 that allow us to improve this pattern. Instead of two SELECT statements (one to gather a count, and a second to populate a range of variables), we can now achieve the same with just one SELECT statement (and thus, just one pass through the data):
/* SAS 9.3 approach */ /* Create macro vars with values and a total count of distinct values */ proc sql noprint; select distinct TYPE into :varVal1- from SASHELP.CARS; %let varCount = &SQLOBS.; quit; |
With the new SAS 9.3 syntax, we can now specify an open-ended range for macro variable names. SAS will allocate as many macro variables as are needed to store each value. The values will be TRIMMED of whitespace by default, but you can control this by specifying NOTRIM before the FROM keyword.
Since we don't need to know the count ahead of time, we can then safely use &SQLOBS after the SELECT to determine the upper bound of our index.
You can also specify leading zeros in the macro variable range, which can help with sorting the ordinals later. Here's a revised example:
proc sql noprint; /* using leading zero */ select distinct MAKE into :varVal01- from SASHELP.CARS; %let varCount = &SQLOBS.; quit; |
The accompanying image shows the resulting macro variables in the SAS Macro Variable Viewer, neatly collated in numerical order. (Remember that if you do add the leading zeros, you may need to use the Zw.d format when building up the macro variable names within a loop.)
9 Comments
Very nice! I know I'll be using this alot. Thanks for sharing it.
I think an additional small but valuable improvement, for a future SAS version, could be to follow a similar pattern to that used in SAS/IntrNet and SAS Stored Processes where the macro 'array' has a 0 indexed item containing the count of members (i.e. varVal0). It would save the extra statement with SQLOBS which I know I'll always be doing.
Yes, that seems nice but it really is a kludge. With a background in C I know I would trip over this as element 0 of an array is the first element. I actually found a bug in SAS a while ago where starting at zero failed in ODS MATCH_ALL. Please keep the array pure.
A far better improvement would be to give the macro language a genuine array type. Then you would have a function or property to return the number of elements. A great deal of messing about with the macro language is just to work around this lack. Why not make the macro facilities match the table-oriented paradigm of the dataset?
Peter,
An array purist, eh? That's what happens when your pedigree is C programming and not Visual Basic. Of course you can get the 0-based behavior in this SAS macro example:
The SAS macro language is more akin to a preprocessor than a compiled programming language, since its main use is to generate text which will eventually be interpreted into a SAS program. The SAS language continues to evolve, and many of the traditional tasks for which we rely on SAS macro can sometimes be better expressed in new constructs, such as defining your own DATA step functions with FCMP.
Well I also vote for better structures in the macro lang - or moere call exec() functionality in the datastep.
I propose freeing the hash object from the datastep so it can be available in in a macro too.
As option it should be allowed to use either the DDV or the PDV when finding etc.
Dave,
As experimental in SAS 9.3m2, you can use the DOSUBL function as a pretty effective substitute for CALL EXECUTE. I plan a related blog post soon. You can read about DOSUBL in this SAS Global Forum paper.
Pingback: Implement BY processing for your entire SAS program - The SAS Dummy
just put it into a hash object....
The hash object is a very powerful tool, but of course it's memory intensive. For very large data it may not be feasible. It's good to have a variety of techniques in your "SAS programming quiver" to pull out as needed.
For those wanting to learn more about hash objects, Michele Burlew has a new book on the topic. She also has a free webinar in the SAS Talks series to introduce you to the hash object with some examples.
yes - good links.
macro variables are also memory intensive....