It’s early March and one of these days, the snow will melt here in Minnesota, the ice will disappear from the lakes, and spring will arrive. On my list of things to do this spring is a third edition of SAS Macro Programming Made Easy. It’s time to update the book for Version 9.4 and improve the outputs for eBook formatting. Perhaps it’s time to refresh your SAS programming skills. Here are two quick tips.
Macro Variables: In SAS 9.3, when you want to create a series of macro variables with the INTO clause on the PROC SQL SELECT statement, you do not have to specify an exact upper bound on the list of macro variables. SAS creates only the number of macro variables it needs to when it executes the INTO clause. The only requirement is that you make sure the upper bound you specify is big enough. Try the following code.
Data set FRUITS has three observations. The SELECT clause specifies that PROC SQL can create up to 50 macro variables when it processes FRUITS.
data fruits; length fruit $ 20; input fruit @@; datalines; apple pear watermelon ;;; proc sql noprint; select fruit into :fruit1-:fruit50 from fruits; quit; %put &fruit1 &fruit2 &fruit3 &fruit4 &fruit5;
The SAS log for the %PUT statement shows that SAS did not create macro variables FRUIT4 and FRUIT5.
WARNING: Apparent symbolic reference FRUIT4 not resolved. WARNING: Apparent symbolic reference FRUIT5 not resolved. 60 %put &fruit1 &fruit2 &fruit3 &fruit4 &fruit5; apple pear watermelon &fruit4 &fruit5
Hash Obects: Test whether you can save resources if you use a hash object to find the unique combinations in a group of variables. Although easy to code, you might be able to avoid a PROC SORT step or an indexing step if you use a hash object. While PROC FREQ can easily find combinations for you, your SAS session might not have enough memory when your classification variables have many levels.
Suppose you have a large data set with prescription records (RXRECS) and you want to find all the combinations of drug (RX) and pharmacy (PHARMACY). The following DATA step loads RXRECS into hash object P. It takes advantage of the SAS default action of loading only unique combinations of key items (RX and PHARMACY) into a hash object (P). No sorts or indexing are required before the DATA step executes. The OUTPUT method creates data set RXPHARMCOMBOS from P. Because of the ORDERED: “YES” argument tag, SAS writes observations to RXPHARMCOMBOS in order by RX and PHARMACY. The data items are the same as the key items, and SAS saves the data items as variables in RXPHARMCOMBOS.
data_null_; attrib rx length=$4 pharmacy length=$20; declare hash p(dataset: 'rxrecs',ordered: 'yes'); p.definekey('rx','pharmacy'); p.definedata('rx','pharmacy'); p.definedone(); call missing(rx,pharmacy); rc=p.output(dataset: 'rxpharmcombos'); run;
A limitation with using a hash object is that your SAS session may not have enough memory to load the unique combinations into the hash object. But it’s worth a try!
Michele Burlew is the author of 7 SAS books, including her most recent - SAS Hash Object Programming Made Easy.
2 Comments
Hi ... just curious about the HASH example.
One statement ... "While PROC FREQ can easily find combinations for you, your SAS session might not have enough memory when your classification variables have many levels."
Another statement ... "A limitation with using a hash object is that your SAS session may not have enough memory to load the unique combinations into the hash object."
So ... since the coding with PROC FREQ is so simple, is there any reason to think that PROC FREQ use any more memory than a HASH OBJECT to store the combinations?
Again, just curious. Thanks.
Mike - Definitely go with the technique that’s easiest and most efficient. I have had occasions where PROC FREQ told me there was insufficient memory to compute the combinations I needed in large claims files while a hash object breezed through the processing. I’m not sure what defines the constraints on PROC FREQ. If neither PROC FREQ or a hash object have sufficient memory, I go back to sorting or indexing my data set and following that with a DATA step that looks at the BY variables that define the combinations. Thanks for the question.