In a previous SAS Learning Post, I introduced my new SAS Press book with Paul Dorfman, Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study. The post discussed the five things you (probably) don’t know you can do with a hash table. The fifth was that techniques using the SAS hash object can often be more efficient than standard approaches to data management tasks, although the book itself does not include many performance comparisons. The post promised to extend the content available in our book through articles at communities.sas.com. The first of these, Hash Tables Can Do More Than You Think: Table Lookup; Data Management; Data Aggregation; and more, provides an overview of the various topics.
In this post, I would like to highlight four additional articles of interest:
The first article, Performance - Comparing SQL, MERGE and the Hash Object to Join/Merge SAS Tables, illustrates alternative techniques to perform a 1-to-many merge or join. Not surprisingly (to us), the hash object performs as well as, if not better than, either SQL or the DATA step MERGE statement.
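To make the hash side of that comparison concrete, here is a minimal sketch of a 1-to-many join done with the hash object. It is not taken from the article or the book: the data sets (teams, games) and variables (team_id, team_name, runs) are hypothetical, and the example assumes the "one" side is small enough to fit in memory.

```sas
/* Hypothetical "one" side: one row per team */
data teams;
  input team_id team_name :$12.;
  datalines;
1 Anteaters
2 Badgers
;
run;

/* Hypothetical "many" side: several rows per team */
data games;
  input game_id team_id runs;
  datalines;
101 1 4
102 1 7
103 2 3
104 3 5
;
run;

data joined;
  if 0 then set teams;              /* adds team_name to the PDV at compile time */
  if _n_ = 1 then do;
    declare hash t(dataset: "teams"); /* load the "one" side into memory once */
    t.defineKey("team_id");
    t.defineData("team_name");
    t.defineDone();
  end;
  set games;                        /* read the "many" side sequentially */
  if t.find() = 0;                  /* keep matching rows only (inner join) */
run;
```

Neither input needs to be sorted, which is one reason this approach can compare well with MERGE. Game 104 is dropped here because team 3 has no row in teams.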
The second article discusses how debugging programs that use the hash object can be a challenge at times. This article, SAS Hash Object Debugging Tips, provides an overview of some of the techniques we found valuable in developing and debugging the examples included in the book.
A common data management task is to split up a data set based on the value of some variable. Program 6.12 in Chapter 6 of the book illustrates a single DATA step that creates a separate data set for each value of the variable, assuming the data is sorted. The third article, Splitting a SAS data set based on the value of a variable, extends that example by providing two alternative techniques to split up the data: one that uses the SAS DOSUBL function, and one that uses the Hash of Hash approach discussed in Chapter 9. Make sure to check out the comments, which include some tips on how to define all the variables in a data set to the hash object, as well as a way to define the data elements using the DATASET argument tag without loading any data rows.
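As a rough illustration of the sorted-data case, the sketch below splits a data set in a single DATA step by writing each BY group out with the hash OUTPUT method. It is not Program 6.12 or code from the article: the data set scores and its variables are hypothetical, and the values of team are assumed to be valid SAS names. The (obs=0) data set option combined with all: "y" is one common way to define every variable to the hash object without loading rows; that it matches the tips in the article's comments is an assumption.

```sas
/* Hypothetical input, already sorted by team */
data scores;
  input team :$8. player :$8. runs;
  datalines;
Ants Ann 3
Ants Abe 1
Bees Bo 2
;
run;

data _null_;
  if _n_ = 1 then do;
    /* obs=0: take the variable definitions from scores, load no rows */
    declare hash h(dataset: "scores(obs=0)", multidata: "y");
    h.defineKey("team");
    h.defineData(all: "y");         /* every variable in scores is a data item */
    h.defineDone();
  end;
  /* one DATA step iteration per BY group */
  do until (last.team);
    set scores;
    by team;
    h.add();
  end;
  /* write the group to its own data set, e.g. scores_Ants */
  h.output(dataset: cats("scores_", team));
  h.clear();                        /* empty the table for the next group */
run;
```

With the sample rows above, this creates scores_Ants (two rows) and scores_Bees (one row). The DOSUBL and Hash of Hash techniques in the article remove the requirement that the input be sorted.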
And for the last article, Paul and I would like to thank Allan Bowe for re-packaging the programs that create the sample data for our book. When Allan asked if he could take the programs available on the book page and package them so they are easier to access and use, we, of course, said ABSOLUTELY! Check out the GitHub site for access to the data, and check out his communities article Bizarro Ball Make a Hash of it.