SAS programmers often resort to using the X command to list the contents of file directories and to process the contents of ZIP files. In centralized SAS environments, the X command is unavailable to most programmers. NOXCMD is the default setting for these environments (disallowing shell commands), and SAS admins are reluctant to change it.
Update 28Nov2016: I updated this article to remove the text about gz (gzip) file support. Currently, the FILENAME ZIP method works only with ZIP files -- on Windows and Unix.
In this article, I'll share a SAS program that can retrieve the contents of a file directory (all of the file names), and then also report on the contents of every ZIP file within that directory -- without using any shell commands. The program uses two lesser-known tricks to retrieve the information:
- The FILENAME statement can be applied to a directory, and then the DOPEN, DNUM, DREAD, and DCLOSE functions can be used to retrieve information about that directory. (Check SAS Note 45805 for a better example of just this - click the Full Code tab.)
- The FILENAME ZIP method (added in SAS 9.4) can retrieve the names of the files within a ZIP archive. For more information, see my previous articles about the FILENAME ZIP access method.
I wrote the program as a SAS macro so that it's easy to reuse. I also tried to be liberal with the comments, to provide a view into my thinking and perhaps point out some opportunities for improvement.
    %macro listzipcontents (targdir=, outlist=);
      filename targdir "&targdir";

      /* Gather all ZIP files in a given folder                */
      /* Searches just one folder, not subfolders              */
      /* for a fancier example see                             */
      /* http://support.sas.com/kb/45/805.html (Full Code tab) */
      data _zipfiles;
        length fid 8;
        fid=dopen('targdir');
        if fid=0 then stop;
        memcount=dnum(fid);

        /* Save just the names ending in ZIP*/
        do i=1 to memcount;
          memname=dread(fid,i);
          /* combo of reverse and =: to match ending string */
          /* Looking for *.zip files */
          if (reverse(lowcase(trim(memname))) =: 'piz.') then
            output;
        end;

        rc=dclose(fid);
      run;

      filename targdir clear;

      /* get the memnames into macro vars */
      proc sql noprint;
        select memname into: zname1- from _zipfiles;
        %let zipcount=&sqlobs;
      quit;

      /* for all ZIP files, gather the members */
      %do i = 1 %to &zipcount;
        %put &targdir/&&zname&i;
        filename targzip ZIP "&targdir/&&zname&i";

        data _contents&i.(keep=zip memname);
          length zip $200 memname $200;
          zip="&targdir/&&zname&i";
          fid=dopen("targzip");
          if fid=0 then stop;
          memcount=dnum(fid);

          do i=1 to memcount;
            memname=dread(fid,i);
            /* save only full file names, not directory names */
            if (first(reverse(trim(memname))) ^='/') then
              output;
          end;

          rc=dclose(fid);
        run;

        filename targzip clear;
      %end;

      /* Combine the member names into a single data set */
      /* the colon notation matches all files with "_contents" prefix */
      data &outlist.;
        set _contents:;
      run;

      /* cleanup temp files */
      proc datasets lib=work nodetails nolist;
        delete _contents:;
        delete _zipfiles;
      run;
    %mend;
Use the macro like this:
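Here is a minimal sketch of a call. The folder path and the output data set name are hypothetical placeholders, so substitute a directory that your SAS session can read. The PROC PRINT step is only there to show the result.

```sas
/* Hypothetical folder and output data set; replace with your own */
%listzipcontents(targdir=/u/myuser/projects/data, outlist=work.allfiles);

/* The output has two columns: ZIP (path of the archive) */
/* and MEMNAME (name of a file inside that archive)      */
proc print data=work.allfiles;
run;
```

Because the macro builds each path as `&targdir/&&zname&i`, pass the folder without a trailing slash.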
Experience has taught me that savvy SAS programmers will scrutinize my example code and offer improvements. For example, they might notice my creative use of the REVERSE function and the "=:" operator to simulate an "ends with" comparison function -- and then suggest something better. If I don't receive at least a few suggestions for improvements, I'll know that no one has read the post. I hope I'm not disappointed!
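One candidate for that "something better", offered as a sketch rather than a definitive fix, is a regular expression anchored at the end of the string with the PRXMATCH function. The DATA step below and its sample file names are hypothetical; it simply evaluates both tests side by side so that you can compare them in the log.

```sas
/* Compare two "ends with .zip" tests on a few made-up names */
data _null_;
  length memname $200;
  do memname = 'archive.zip', 'REPORT.ZIP', 'notes.txt';
    /* Approach used in the macro: reverse the trimmed name, */
    /* then compare only its leading characters with =:      */
    ends_rev = (reverse(lowcase(trim(memname))) =: 'piz.');

    /* Alternative: case-insensitive regex anchored at the end; */
    /* \s* absorbs the trailing blanks of the padded variable   */
    ends_prx = (prxmatch('/\.zip\s*$/i', memname) > 0);

    put memname= ends_rev= ends_prx=;
  end;
run;
```

The regular expression states the intent ("ends with .zip, ignoring case") in one place, at the cost of being less familiar to programmers who don't use the PRX functions.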