Cleaning up after yourself: Deleting data sets


"Always clean up after yourself."

My mother taught me this, and I apply it to SAS programming as regularly as I apply it at home. For SAS programming, I reinterpret Mom's saying as the following rule:

Always delete temporary files and data sets when you are finished using them.

How to Detect and Delete Temporary Data Sets in the SAS/IML Language

In a previous blog post about data set maintenance tips, I showed how you can use the DATASETS function in SAS/IML to determine which data sets exist in a library. I also showed how to use the DELETE subroutine to delete data sets from within the SAS/IML language.

You can use these two routines to detect and delete any temporary data sets that you create during a SAS/IML session. The idea is to call the DATASETS function at the beginning of your program to record which data sets are in the WORK library, and then call it again at the end of your program to detect which data sets are new. Those are the ones to delete, because they were created by your program.

The following DATA step creates four SAS data sets, each with one observation:

/** created outside of your SAS/IML
    program. Don't delete these. **/
data a b c d;
run;

The following statements start PROC IML and assign the result of the DATASETS function to ds0. The vector ds0 contains four elements, which are the names of the data sets that exist in the WORK library.

proc iml;
ds0 = datasets("work"); /** = {A,B,C,D} **/

Suppose that in the course of your SAS/IML program you create several data sets in WORK, as shown in the following statements:

x=1; r = {1,2,3}; t = {1,2};
create Q var {x}; append;
create A123 var {x}; append;
create results var {r}; append;
create timing var {t}; append;
close Q A123 results timing;

It is good programming practice to delete all data sets in WORK that were not there when your SAS/IML session began. To do this, call the DATASETS function a second time and use the SETDIF function to find the new data sets:

ds = datasets("work");
dsNew = setdif(ds, ds0);
print dsNew[format=$7.];

The SETDIF function returns a vector that contains the elements of ds that are not found in ds0. You can use the DELETE subroutine to delete the data sets that are specified in the dsNew vector.
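If none of the new data sets are needed afterward, the cleanup might look like the following minimal sketch. It assumes SAS/IML 9.22 or later (so that DELETE accepts a vector of names) and that the data sets live in WORK. The guard against an empty dsNew is an addition for illustration, not part of the original program.

```sas
/* Sketch: delete every data set that was created during this session.
   Assumes SAS/IML 9.22 or later, where DELETE accepts a vector of names. */
if ncol(dsNew) > 0 then          /* SETDIF returns an empty matrix if nothing is new */
   call delete("work", dsNew);
```

Do not run this step if you intend to keep some of the new data sets, because it removes all of them.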

Preserving a Few Data Sets

But what can you do if you want to preserve one or more SAS data sets? For example, what if you intend to plot or otherwise analyze the RESULTS and TIMING data sets? No problem! Just define a vector that contains the names of the data sets that you want to keep, and use the SETDIF function again to determine the data sets to remove:

keep = {"Results", "Timing"};
dsDel = setdif(upcase(dsNew), upcase(keep));
print dsDel[format=$7.];
call delete(dsDel); /** SAS/IML 9.22 **/

Note: Beginning with SAS/IML 9.22, you can delete all of the data sets with a single call to the DELETE subroutine. Prior to 9.22, you must call the DELETE subroutine in a loop to delete multiple data sets.
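For releases prior to 9.22, the loop mentioned in the note could be sketched as follows. This assumes that dsDel is the row vector returned by SETDIF, as in the statements above. The loop index i is introduced here only for illustration.

```sas
/* Sketch for releases prior to SAS/IML 9.22:
   delete the data sets one at a time.
   If dsDel is empty, ncol(dsDel) is 0 and the loop body never runs. */
do i = 1 to ncol(dsDel);
   call delete("work", dsDel[i]);
end;
```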

Notice that you can perform case-insensitive comparisons in the SAS/IML language by using the UPCASE function. Alternatively, you can specify the strings in the keep vector as all uppercase, because the DATASETS function always returns uppercase values.
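The uppercase alternative might look like this short sketch, which relies on the fact stated above that the DATASETS function returns uppercase names:

```sas
/* Sketch: specify the names to keep in uppercase,
   so that no UPCASE calls are needed in the comparison. */
keep = {"RESULTS", "TIMING"};
dsDel = setdif(dsNew, keep);
print dsDel[format=$7.];
```

Calling UPCASE is the more defensive choice when the keep vector is typed by hand or passed in from other code.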

The inquisitive reader might ask why I don't just specify a vector of the temporary data sets and omit the techniques in this post. That's fine for short programs, but for longer programs that call many SAS/IML modules (that were potentially written by other people!) it can be difficult to keep track of all the temporary data sets in a program. Furthermore, if you modify and enhance the program in the future, you could forget to update the vector. The techniques in this post are robust to program modifications.
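One way to make the technique reusable across long programs is to wrap it in a module. The following is only a sketch: the module name DeleteNewDataSets is hypothetical and does not appear in the original program. It assumes SAS/IML 9.22 or later and that both ds0 and keep are nonempty.

```sas
proc iml;
/* Hypothetical helper module (name chosen for illustration):
   delete the data sets in WORK that are neither in ds0 nor in keep.
   Assumes SAS/IML 9.22 or later and nonempty ds0 and keep. */
start DeleteNewDataSets(ds0, keep);
   dsNew = setdif(datasets("work"), ds0);        /* created since the snapshot */
   if ncol(dsNew) = 0 then return;               /* nothing new; nothing to delete */
   dsDel = setdif(upcase(dsNew), upcase(keep));  /* case-insensitive comparison */
   if ncol(dsDel) > 0 then
      call delete("work", dsDel);
finish;

ds0 = datasets("work");     /* snapshot at the start of the program */
/* ... the body of the program creates data sets here ... */
run DeleteNewDataSets(ds0, {"Results", "Timing"});
```

Because the module recomputes the list of new data sets each time it runs, it does not need to be updated when the body of the program changes.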

So thanks, Mom, for your advice, which keeps my SAS libraries as clutter-free as my house.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
