Four more tips about hash tables


In a previous SAS Learning Post, I introduced my new SAS Press book with Paul Dorfman, Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study. That post discussed five things you (probably) don’t know you can do with a hash table. The fifth was that techniques using the SAS hash object can often be more efficient than standard approaches to data management tasks, although the book itself does not include many performance comparisons. The post also promised to extend the content available in our book with additional articles. The article Hash Tables Can Do More Than You Think: Table Lookup; Data Management; Data Aggregation; and more provides an overview of the various topics.

In this post, I would like to highlight four additional articles of interest:

The first article, Performance - Comparing SQL, MERGE and the Hash Object to Join/Merge SAS Tables, illustrates alternative techniques to perform a 1-to-many merge or join. Not surprisingly (to us), the hash object performs as well as, if not better than, either SQL or the DATA step MERGE statement.
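For readers who have not seen the hash version of such a join, here is a minimal sketch. It is not the code from the article, and the data sets (teams, players) and variables (team_id, team_name, player) are made up for illustration. The "one" side is loaded into a hash table, and the "many" side is read sequentially with a lookup for each row.

```sas
/* Hypothetical sample data: one row per team (the "one" side) */
data teams;
   input team_id team_name :$12.;
   datalines;
1 Aardvarks
2 Badgers
;

/* Hypothetical sample data: many rows per team (the "many" side) */
data players;
   input team_id player :$12.;
   datalines;
1 Adams
1 Baker
2 Clark
3 Davis
;

data joined;
   /* Define team_name in the PDV at compile time; no rows are read */
   if 0 then set teams;
   if _n_ = 1 then do;
      /* Load the "one" side into memory once */
      declare hash h(dataset: 'teams');
      h.defineKey('team_id');
      h.defineData('team_name');
      h.defineDone();
   end;
   set players;
   /* find() returns 0 on a match and copies team_name into the PDV */
   if h.find() = 0 then output;
run;
```

Neither input needs to be sorted, which is one reason this approach can compete with MERGE. As written, the step behaves like an inner join: a row with no match in the hash table (team_id 3 here) is not written to the output.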

Debugging programs that use the hash object can be a challenge at times. The second article, SAS Hash Object Debugging Tips, provides an overview of some of the techniques we found valuable in developing and debugging the examples included in the book.

A common data management task is to split up a data set based on the value of some variable. Program 6.12 in Chapter 6 of the book illustrates a single DATA step that creates a separate data set for each value of the variable, assuming the data is sorted. The third article, Splitting a SAS data set based on the value of a variable, extends that example by providing two alternative techniques to split up the data: one that uses the SAS DOSUBL function, and one that uses the Hash of Hash approach discussed in Chapter 9. Make sure to check out the comments, which include some tips on how to define all the variables in a data set to the hash object, as well as a way to define the data elements using the DATASET argument tag without loading any data rows.
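As a rough illustration of the sorted-data case, the sketch below splits a small data set into one data set per key value. It is not Program 6.12 or the code from the article; the data set players and its variables are hypothetical. It also shows one common way to apply the two tips from the comments: the OBS=0 data set option on the DATASET argument tag defines the structure without loading rows, and ALL: 'YES' on DEFINEDATA makes every variable in that data set a data element.

```sas
/* Hypothetical sample data, already sorted by team_id */
data players;
   input team_id player :$12.;
   datalines;
1 Adams
1 Baker
2 Clark
3 Davis
;

data _null_;
   if _n_ = 1 then do;
      /* obs=0: use the data set only as a template, load no rows */
      declare hash h(dataset: 'players(obs=0)', multidata: 'y');
      h.defineKey('team_id');
      /* all: 'yes' defines every variable in players as data */
      h.defineData(all: 'yes');
      h.defineDone();
   end;
   /* Read one BY group per DATA step iteration into the hash table */
   do until (last.team_id);
      set players;
      by team_id;
      h.add();
   end;
   /* Write the group to its own data set (team_1, team_2, ...) */
   h.output(dataset: cats('team_', team_id));
   /* Empty the table before the next group is read */
   h.clear();
run;
```

The step ends on its own when the SET statement reads past the last row. For unsorted input this pattern does not apply, which is where alternatives such as the Hash of Hash approach come in.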

And for the last article, Paul and I would like to thank Allan Bowe for re-packaging the programs that create the sample data for our book. When Allan asked if he could take the programs available on the book page and package them so they are easier to access and use, we, of course, said ABSOLUTELY! Check out the GitHub site for access to the data, and check out his communities article, Bizarro Ball Make a Hash of it.


About Author

Don Henderson

Owner and Principal of Henderson Consulting Services

Don Henderson is the Owner and Principal of Henderson Consulting Services, a SAS Affiliate Partner. Don has used SAS software since 1975, designing and developing business applications with a focus on data warehouse, business intelligence, and analytic applications. Don was one of the primary architects in the initial development and release of SAS/IntrNet software in 1996, and he was one of the original developers for the SAS/IntrNet Application Dispatcher. Don is the author of SAS Server Pages: Generating Dynamic Content, Building Web Applications with SAS/IntrNet: A Guide to the Application Dispatcher, and Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study. Don has presented numerous papers at SUGI and regional SAS user group meetings and continues to be a great supporter of SAS and its products.
