3 misconceptions about SAS hash objects, and why you should start using them now

Michele Burlew’s new book, SAS Hash Object Programming Made Easy, takes users through a relatively new lookup technique in SAS, one that has many still shaking their heads. “Not many DATA step programmers have used hash objects much,” Burlew says. “They seem scary because they operate a little differently from the SAS language people are used to.”

But Michele is a big believer now. “Once you get them, you can do so much more,” she told me. “They enhance your programming; they can make the job easier; they can be extremely efficient. Hash objects significantly expand your vocabulary within the SAS language.”

I wondered why people have been so hesitant to use hash objects. Michele described some common misconceptions:

- Hash objects are too difficult to figure out. The highest hurdle in adding hash objects to your SAS language vocabulary might be the strange-looking syntax. Probably the easiest starting point is to use a hash object as a simple lookup tool, the way you already use arrays and user-defined formats. Adapt a simple example that processes your own data, and compare the results of your new hash object solution to the output produced by a technique you already understand well. You do not have to be a techno-wizard, and your applications do not have to be complex, to use hash objects!
- A programmer proficient in other DATA step and PROC step tools will not benefit from learning how to use hash objects. As a SAS programmer for more than 30 years, I definitely believed this statement when I first encountered hash objects. However, I followed my own advice in the previous bullet and started with simple lookup tasks. Then, when I needed various cuts of 150-million-record medical claims data sets, I was hooked. While hash objects aren’t always the best solution, in many situations my hash object code was shorter, easier to maintain, and executed faster than my usual combination of DATA steps and PROC steps.
- Hash objects make sense only when working with big data sets. Since the claim was that hash objects would improve the efficiency of programs, it seemed to me that this would matter only when I worked with huge data sets. However, as I gained experience with hash objects, I found reasons to use them in applications of all sizes. For example, if you need to combine multiple data sets into one data set and different keys link the various data sets, the task could require multiple sorts and merges. Instead, in just a single DATA step, you might be able to combine all of the data sets using hash objects. None of the data sets needs to be large for this DATA step to be an efficient solution that is easy to follow.
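As a rough illustration of the single-DATA-step approach in the last bullet, here is a minimal sketch (not an example from the book) that uses two hash objects as simple lookup tables to combine three data sets linked by different keys, with no PROC SORT and no MERGE. All data set and variable names (work.claims, work.patients, work.providers, patient_id, provider_id, state, specialty) are hypothetical, and the tiny inline data only stands in for real tables.

```sas
/* Hypothetical sample data: all data set and variable names are illustrative */
data work.patients;
  input patient_id $ state $;
  datalines;
P1 NC
P2 VA
;
run;

data work.providers;
  input provider_id $ specialty :$12.;
  datalines;
V1 Cardiology
V2 Pediatrics
;
run;

data work.claims;
  input claim_id $ patient_id $ provider_id $ amount;
  datalines;
C1 P2 V1 250
C2 P1 V2 90
C3 P1 V9 40
;
run;

/* One DATA step, two lookups on different keys, no sorting or merging */
data work.claims_enriched;
  /* Never executes at run time; only adds the lookup variables to the PDV */
  if 0 then set work.patients work.providers;

  /* Load each lookup table into memory once, on the first iteration */
  if _n_ = 1 then do;
    declare hash pat(dataset: 'work.patients');
    pat.defineKey('patient_id');
    pat.defineData('state');
    pat.defineDone();

    declare hash prov(dataset: 'work.providers');
    prov.defineKey('provider_id');
    prov.defineData('specialty');
    prov.defineDone();
  end;

  set work.claims;

  /* FIND() returns 0 on a match and copies the data variable into the PDV.
     On a miss, clear the value so it is not carried over from an earlier row. */
  if pat.find() ne 0 then call missing(state);
  if prov.find() ne 0 then call missing(specialty);
run;
```

Because each lookup table is held in memory and searched by key as the claims are read, none of the three data sets has to be sorted, and a lookup on yet another key is just one more DECLARE HASH block. The trade-off is that every table loaded into a hash object must fit in memory.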

Start expanding your SAS vocabulary with SAS Hash Object Programming Made Easy. Preview a free chapter here.

2 Comments

Michele Burlew on July 31, 2014 2:23 pm

Michael,

Thanks for your suggestion. I think you are describing Examples 1.1 and 1.2. The intention of the WHERE= option in 1.2 is to show that it's possible to use the option to select what you load into a hash object. Maybe in the next edition, I will change 1.1 to be more like 1.2 and better clarify the distinction between the two examples.

Thanks again,
Michele Burlew
Michael Bonanomi on July 30, 2014 5:05 am

Hi Aimee
I ran the two examples in the book excerpt, the one without a hash object and the one with a hash object.
They produce the same results. But they are not comparable, because the hash example selects only the observations from employees with emppaylevel=:"A", while the other example selects all observations. So I did a second run, leaving out the WHERE= data set option:
declare hash e(dataset: 'employees');
The result is the same. So I propose that a future edition of the book either leave out the WHERE= data set option or add it to the first, non-hash example as well. That way, the examples would be comparable.
Kind regards, Michael
