SAS author's tip: Creating a data set from a hash object

This week's SAS tip is from Michele Burlew and her new book SAS Hash Object Programming Made Easy. Michele is the author of several revered user-friendly books. Be sure to take a look at the free chapter from her latest after reading this week's tip.

The following excerpt is from SAS Press author Michele Burlew and her book "SAS Hash Object Programming Made Easy." Copyright © 2012, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)

Creating a Data Set from a Hash Object

A DATA step’s purpose is to process data and, often, to create a new data set. You start the DATA step with a DATA statement that names the data set that you want to create.

You can also create data sets from the contents of a hash object in a DATA step. You do this by applying the OUTPUT method and naming the data set that you want to create from the hash object on the OUTPUT method’s DATASET argument tag. You do not name the data set on the DATA statement.
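As a minimal sketch of this idea, the step below loads SASHELP.CLASS into a hash object and writes it back out with the OUTPUT method. The hash object name H, the output name WORK.CLASS_COPY, and the choice of SASHELP.CLASS as input are assumptions made for illustration; they are not taken from the book.

```sas
/* Sketch only: H and WORK.CLASS_COPY are illustrative names. */
data _null_;                       /* no data set is named on the DATA statement */
  /* Add the host variables to the PDV without reading any observations */
  if 0 then set sashelp.class;

  declare hash h(dataset: 'sashelp.class', ordered: 'yes');
  h.defineKey('Name');
  h.defineData('Name', 'Sex', 'Age');
  h.defineDone();

  /* Write the entire contents of the hash object to WORK.CLASS_COPY */
  rc = h.output(dataset: 'work.class_copy');
  stop;                            /* the SET statement never executes, so end the step explicitly */
run;
```

Because the data set is named only on the DATASET argument tag, DATA _NULL_ is enough on the DATA statement.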

An advantage of creating data sets from hash objects with the OUTPUT method is that you can name your data sets during execution of your DATA steps. SAS language statements that call the OUTPUT method within a DATA step can name your data sets, and you can create as many data sets as your data and code require. Unlike data sets named on the DATA statement, the data sets that the OUTPUT method creates do not have to be named explicitly in your code.
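One way to picture run-time naming is to build the data set name in a character variable and pass that variable on the DATASET argument tag. The sketch below assumes SASHELP.CLASS as input; the variable names SEXVAL and DSNAME and the WORK.CLASS_F and WORK.CLASS_M output names are hypothetical.

```sas
/* Sketch only: data set names are assembled while the step executes. */
data _null_;
  if 0 then set sashelp.class;
  length dsname $80;

  declare hash h(dataset: 'sashelp.class');
  h.defineKey('Name');
  h.defineData('Name', 'Sex', 'Age');
  h.defineDone();

  /* One OUTPUT call per value; each call names a different data set */
  do sexval = 'F', 'M';
    /* e.g. work.class_F(where=(Sex="F")) */
    dsname = cats('work.class_', sexval, '(where=(Sex="', sexval, '"))');
    rc = h.output(dataset: dsname);
  end;
  stop;
run;
```

The list of values is hard-coded here to keep the sketch short; in practice the names could just as well come from the data being processed.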

The syntax of the OUTPUT method follows. The OUTPUT method requires that you specify at least one DATASET argument tag.
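The syntax display is not reproduced in this copy of the excerpt. As documented by SAS, the general form is roughly as follows, where angle brackets mark optional parts and "object" stands for the name of your hash object:

```sas
rc = object.OUTPUT(DATASET: 'dataset-1 <(datasetoption)>'
                   <, ... DATASET: 'dataset-n <(datasetoption)>'>);
```

This is syntax notation rather than a runnable statement.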

When the OUTPUT method executes, SAS outputs the entire contents of the hash object to the data set. The OUTPUT method is unrelated to the OUTPUT statement, which must execute once for every observation that you want SAS to add to a data set. Instead, the statement that calls the OUTPUT method typically executes just once during execution of your DATA step. All the information that you want written to your data set must already exist in your hash object before the DATA step executes the OUTPUT method.
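To illustrate the "fill the hash object first, write once at the end" pattern, the sketch below counts observations per value of Sex in SASHELP.CLASS and calls the OUTPUT method a single time after the input is exhausted. The variable N and the output name WORK.SEX_COUNTS are assumptions made for this example.

```sas
/* Sketch only: accumulate counts in a hash object, then write them once. */
data _null_;
  length n 8;                      /* count stored in the hash object */

  declare hash h(ordered: 'yes');
  h.defineKey('Sex');
  h.defineData('Sex', 'n');
  h.defineDone();

  /* Read every observation and update the hash object as we go */
  do until (done);
    set sashelp.class(keep=Sex) end=done;
    if h.find() ne 0 then n = 0;   /* first time this Sex value is seen */
    n + 1;
    h.replace();                   /* store the updated count */
  end;

  /* A single call writes all accumulated items */
  rc = h.output(dataset: 'work.sex_counts');
  stop;
run;
```

No OUTPUT statement appears anywhere in the step; the data set is written entirely by the one method call.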

Additionally, you can apply data set options to the data sets specified on the DATASET argument tags. For example, starting with SAS 9.3, you can control which variables SAS writes to a data set by adding either the KEEP= or DROP= data set options. Also, with the WHERE= data set option, you can control which observations the OUTPUT method writes to the data set it creates.
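A short sketch of data set options on the DATASET argument tag follows. It assumes SAS 9.3 or later for KEEP=, uses SASHELP.CLASS as input, and writes to a hypothetical data set named WORK.TEENS.

```sas
/* Sketch only: KEEP= limits the variables, WHERE= limits the observations. */
data _null_;
  if 0 then set sashelp.class;

  declare hash h(dataset: 'sashelp.class', ordered: 'yes');
  h.defineKey('Name');
  h.defineData('Name', 'Sex', 'Age');
  h.defineDone();

  /* Write only Name and Age, and only for students aged 13 or older */
  rc = h.output(dataset: 'work.teens(keep=Name Age where=(Age >= 13))');
  stop;
run;
```

The variable used in WHERE= is deliberately included in the KEEP= list so that the condition refers to a variable that is written to the data set.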

You can create more than one data set from one hash object with one application of the OUTPUT method. With the use of data set options, you can save one group of variables in one data set and save a second group of variables in a second data set as the following statement illustrates.
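The statement from the book is not reproduced in this copy of the excerpt. A sketch of the same idea, using SASHELP.CLASS and the hypothetical output names WORK.CLASS_AGE and WORK.CLASS_SEX, might look like this:

```sas
/* Sketch only: one OUTPUT call, two DATASET argument tags. */
data _null_;
  if 0 then set sashelp.class;

  declare hash h(dataset: 'sashelp.class', ordered: 'yes');
  h.defineKey('Name');
  h.defineData('Name', 'Sex', 'Age');
  h.defineDone();

  /* Each DATASET tag names one data set and its own group of variables */
  rc = h.output(dataset: 'work.class_age(keep=Name Age)',
                dataset: 'work.class_sex(keep=Name Sex)');
  stop;
run;
```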

The OUTPUT method does not have any options that can prevent the replacement of an existing data set whose name you have specified on the DATASET argument tag. Further, SAS does not issue any warning messages that inform you that the data set named in the OUTPUT method call already exists.
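Since the method itself offers no protection, one possible guard is to test for the data set with the EXIST function before calling OUTPUT. This is a sketch of one approach, not something the excerpt prescribes; WORK.CLASS_COPY is a hypothetical name.

```sas
/* Sketch only: skip the OUTPUT call if the target data set already exists. */
data _null_;
  if 0 then set sashelp.class;

  declare hash h(dataset: 'sashelp.class');
  h.defineKey('Name');
  h.defineData('Name', 'Sex', 'Age');
  h.defineDone();

  if exist('work.class_copy') then
    put 'NOTE: WORK.CLASS_COPY already exists; OUTPUT method not called.';
  else
    rc = h.output(dataset: 'work.class_copy');
  stop;
run;
```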

SAS generates an error for the OUTPUT method if you specify the same data set name on both the DATA statement and the DATASET argument tag of the OUTPUT method. The DATA statement takes precedence over the OUTPUT method. Therefore, the OUTPUT method does not replace or contribute observations to the data set named on the DATA statement because it cannot open the data set.

Similarly, SAS generates an error for the OUTPUT method if your DATA step reads a data set that you also name on the DATASET argument tag of the OUTPUT method. SAS does not replace the data set when the OUTPUT method executes because the OUTPUT method cannot close a data set while SAS executes a SET or MERGE statement on a data set with the same name.
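One way to sidestep this conflict is to write the hash object to a different name and swap the data sets after the DATA step ends. The sketch below is an assumption about how that could be done, not a technique from the excerpt; WORK.CLASS and WORK.CLASS_NEW are hypothetical names.

```sas
/* Hypothetical setup: a data set that the step both reads and wants to replace */
data work.class;
  set sashelp.class;
run;

data _null_;
  if 0 then set work.class;        /* WORK.CLASS is open for input in this step */

  declare hash h(dataset: 'work.class(where=(Age >= 13))');
  h.defineKey('Name');
  h.defineData(all: 'yes');
  h.defineDone();

  /* Write to a new name instead of the data set being read */
  rc = h.output(dataset: 'work.class_new');
  stop;
run;

/* After the step has ended, replace the original with the new data set */
proc datasets library=work nolist;
  delete class;
run;
  change class_new=class;
quit;
```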

tags: michele burlew, sas, sas author's tip, SAS Hash Object, SAS Hash Object Programming

One Comment

  1. Cory
    Posted June 4, 2014 at 5:02 pm

    I did not receive an error when using the same dataset name in a set statement as well as the hash output method. This cleared up my issue though and instead of initiating variables with an if _N_ = 0 then set dsname; I am simply using if _N_ then do; length var1 8.; end;

    I've made it generic, but you hopefully get the point. The method did not error out and in my looping you could see it working correctly and even stating the new number of records in the dataset, but the next iteration of the loop had the original number, as if the dataset had been locked and not able to be overwritten.