Michele Burlew’s new book, SAS Hash Object Programming Made Easy, takes users through the newest look-up technique from SAS, one that still has many shaking their heads. “Not many DATA step programmers have used hash objects much,” Burlew says. “They seem scary because they operate a little differently from the SAS language people are used to.”
But Michele is a big believer now. “Once you get them, you can do so much more,” she told me. “They enhance your programming; they can make the job easier; they can be extremely efficient. Hash objects significantly expand your vocabulary within the SAS language.”
I wondered why people have been so hesitant to use hash objects. Michele described some common misconceptions:
- Hash objects are too difficult to figure out. The highest hurdle in adding hash objects to your SAS language vocabulary might be the strange-looking syntax. Probably the easiest starting point is to use a hash object as a simple lookup tool, the way you might use arrays or user-defined formats. Adapt a simple example so that it processes your own data, then compare the results of your new hash object solution to the output produced by a technique you already understand well. You do not have to be a techno-wizard, nor do your applications have to be complex, to use hash objects!
- A programmer proficient in other DATA step and PROC step tools will not benefit from learning how to use hash objects. As a SAS programmer for more than 30 years, I definitely believed this statement when I first encountered hash objects. However, I followed my own advice in the previous bullet and started with simple lookup tasks. Then, when I needed various cuts of medical claims data sets containing 150 million records, I was hooked. While hash objects aren’t always the best solution, in many situations my hash object code was shorter, easier to maintain, and faster to execute than my usual combination of DATA steps and PROC steps.
- Hash objects make sense only when working with big data sets. Since the claim was that hash objects would improve the efficiency of programs, it seemed to me that this would matter only when I worked with huge data sets. However, as I gained experience with hash objects, I found reasons to use them in applications of all sizes. For example, if you need to combine multiple data sets into one and different keys link the various data sets, the task could require multiple sorts and merges. Instead, in just a single DATA step, you might be able to combine all of the data sets using hash objects. None of the data sets needs to be large to make this DATA step an efficient solution that is easy to follow.
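
To make the simple-lookup and multiple-key ideas concrete, here is a minimal sketch of a single DATA step that combines three data sets linked by two different keys. It is not taken from Michele's book. The data set and variable names (work.claims, work.patients, work.providers, patient_id, provider_id, and so on) are hypothetical, so substitute your own.

```sas
/* Hypothetical inputs (assumptions for illustration only):          */
/*   work.claims    : patient_id, provider_id, amount                */
/*   work.patients  : patient_id, patient_name                       */
/*   work.providers : provider_id, provider_name                     */
data work.claims_full;
   /* Never executes; it only defines the lookup variables in the    */
   /* program data vector so the hash objects can use them.          */
   if 0 then set work.patients work.providers;

   /* Load each lookup table into memory once, each with its own key */
   if _n_ = 1 then do;
      declare hash pat(dataset: 'work.patients');
      pat.defineKey('patient_id');
      pat.defineData('patient_name');
      pat.defineDone();

      declare hash prov(dataset: 'work.providers');
      prov.defineKey('provider_id');
      prov.defineData('provider_name');
      prov.defineDone();
   end;

   /* Read the claims in their existing order; no sorting is needed  */
   set work.claims;

   /* FIND returns 0 on a match and fills in the data variable.      */
   /* On a miss, clear the value retained from the previous row.     */
   if pat.find() ne 0 then call missing(patient_name);
   if prov.find() ne 0 then call missing(provider_name);
run;
```

The equivalent sort-and-merge approach would sort work.claims by patient_id, merge, re-sort by provider_id, and merge again. As a check, you can compare work.claims_full with the output of that familiar approach by using PROC COMPARE, which follows the advice in the first bullet.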