This week's SAS tip is from Michele Burlew and her latest book, "SAS Hash Object Programming Made Easy." Michele is the author of several extremely helpful SAS books. Visit her author page to read free chapters and find additional bonus content.
The following excerpt is from SAS Press author Michele Burlew and her book "SAS Hash Object Programming Made Easy." Copyright © 2012, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)
Understanding how SAS stores hash objects in memory
A big advantage of working with hash objects is that SAS dynamically allocates memory as it needs it. You do not have to determine the size of your hash object before you can use it. For example, you can reuse the same code that defines your hash object even if you have many more observations to load into it the next time you run it.
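As a minimal sketch of this point (not an example from the book), the DATA step below loads a hash object from a data set without stating anywhere how many items it will hold. The data set WORK.CODES and its variables are hypothetical names chosen for illustration.

```sas
/* Hypothetical lookup table; the names are illustrative only. */
data work.codes;
  input code $ description $;
  datalines;
A01 Alpha
B02 Bravo
C03 Charlie
;
run;

data _null_;
  /* Define the host variables at compile time; this SET never executes. */
  if 0 then set work.codes;

  /* No size is declared: SAS allocates memory as items are loaded. */
  declare hash h(dataset: 'work.codes');
  h.defineKey('code');
  h.defineData('description');
  h.defineDone();

  /* NUM_ITEMS reports how many items were loaded. */
  n = h.num_items;
  put 'Items loaded: ' n;
  stop;
run;
```

If WORK.CODES later holds many more observations, the same DATA step should run unchanged, provided enough memory is available.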
This flexibility contrasts with working with an array of SAS variables. When you define an array with the ARRAY statement, you must specify the number of elements in your array. If the number of elements changes the next time you use your DATA step, you must update your ARRAY statement, or possibly maintain additional code, such as macro programs, that updates it for you.
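For comparison, here is a small illustrative ARRAY statement, assuming three hypothetical variables SCORE1 through SCORE3. The dimension is written into the code, so it has to be edited whenever the number of elements changes.

```sas
data _null_;
  /* The dimension (3) and the variable list are fixed in the code.   */
  /* A fourth score would require editing both to {4} and score1-score4. */
  array score{3} score1-score3 (80 92 75);

  do i = 1 to dim(score);
    put score{i}=;
  end;
run;
```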
The amount of memory your SAS session has available determines how big your hash object can be. The hash objects in this chapter's examples take a trivial amount of space. On the other hand, you may not have enough memory to create a hash object from a data set that has millions of observations and hundreds of data items.
You can make a rough estimate of the amount of space your hash object might take by multiplying the number of observations by the observation length, or more precisely by using the ITEM_SIZE attribute. Chapter 6 describes the ITEM_SIZE attribute. However, even if your hash object fits into memory, other processing that you’re doing within the DATA step can affect memory usage.
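The estimate described above might be sketched as follows. This is an illustration rather than the book's Chapter 6 example: WORK.SAMPLE is a made-up data set, and the product of the ITEM_SIZE and NUM_ITEMS attributes is only an approximation, since it does not account for the hash object's own overhead.

```sas
/* Hypothetical data set with 1,000 observations. */
data work.sample;
  do id = 1 to 1000;
    value = id * 2;
    output;
  end;
run;

data _null_;
  if 0 then set work.sample;

  declare hash h(dataset: 'work.sample');
  h.defineKey('id');
  h.defineData('value');
  h.defineDone();

  /* ITEM_SIZE is the size in bytes of one item;   */
  /* NUM_ITEMS is the number of items now loaded.  */
  item_bytes   = h.item_size;
  items        = h.num_items;
  approx_bytes = item_bytes * items;

  put item_bytes= items= approx_bytes=;
  stop;
run;
```

The value of ITEM_SIZE can differ between SAS versions and operating environments, so treat the result as a rough guide only.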
As you code your DATA step that creates a hash object, consider how you will reference that hash object. Consider both memory usage and programming complexity when determining whether you should define a hash object.
Reducing the number of observations and restricting the data items loaded into the hash object to only those that the program needs are two ways to conserve memory. While it may seem counterintuitive, it can be more efficient to load your larger data set into the hash object, especially if it is your lookup data set. Reading your smaller data set sequentially and looking up information in a large hash object is likely to run faster than reading your larger data set sequentially and looking up information for each of its observations in a small hash object.
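One way to apply both restrictions is sketched below, under the assumption of a hypothetical data set WORK.CUSTOMERS. A WHERE= data set option in the DATASET argument tag limits the observations that are loaded, and the DEFINEKEY and DEFINEDATA calls name only the variables the program needs.

```sas
/* Hypothetical customer data; the names are illustrative only. */
data work.customers;
  length state $2 name $20;
  input id state $ name $;
  datalines;
1 NC Avery
2 VA Blake
3 NC Casey
;
run;

data _null_;
  if 0 then set work.customers;

  /* WHERE= loads only the NC observations into the hash object. */
  declare hash h(dataset: 'work.customers(where=(state="NC"))');
  h.defineKey('id');
  /* Only NAME is stored as data; STATE is not kept in the hash object. */
  h.defineData('name');
  h.defineDone();

  n = h.num_items;
  put 'Items loaded: ' n;
  stop;
run;
```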
Another way to conserve memory if your DATA step is complex and you have multiple hash objects is to delete hash objects after they are no longer needed. You can also empty out a hash object and refill it. Chapter 6 shows how these actions can be performed.
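As a brief illustration (not the Chapter 6 code itself), the CLEAR method empties a hash object while keeping its definition, and the DELETE method removes the object and frees its memory. The variable names here are hypothetical.

```sas
data _null_;
  length id 8 name $20;

  declare hash h();
  h.defineKey('id');
  h.defineData('name');
  h.defineDone();
  call missing(id, name);

  rc = h.add(key: 1, data: 'Avery');
  rc = h.add(key: 2, data: 'Blake');
  before = h.num_items;

  /* CLEAR removes all items; the hash object can then be refilled. */
  rc = h.clear();
  after_clear = h.num_items;

  rc = h.add(key: 3, data: 'Casey');
  after_refill = h.num_items;

  /* DELETE frees the hash object; do not reference H after this point. */
  rc = h.delete();

  put before= after_clear= after_refill=;
run;
```

The log should show `before=2 after_clear=0 after_refill=1`.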