It's time to share another tip about working with ZIP files in SAS. Since I first wrote about FILENAME ZIP to list and extract files from a ZIP archive, readers have been asking for more. Specifically, they want additional details about the files that are contained in a ZIP, including the original file datetime stamps, file size, and compressed size. Thanks to a feature that was quietly added in SAS 9.4 Maintenance 3 (9.4M3), you can use the FINFO function to retrieve these details. In this article, I share a SAS macro program that does the job.
Here's an abridged example of the output. If you need to create something like this without the use of external ZIP tools like 7-Zip or WinZip (which are often unavailable in controlled environments), read on.
You can download the full program from my public gist on GitHub: zipfiles_list_details.sas
ZIPpy details: a solution in three macros
Here's my basic approach to this problem:
- First, create a list of all of the ZIP files in a directory and all of the file "members" that are compressed within. I've already shared this technique in a previous article. Like an efficient (or lazy) programmer, I'm just reusing that work. That's macro routine #1 (%listZipContents).
- With this list in hand, iterate through each ZIP file member, "open" the file with FOPEN, and gather all of the available file attributes with FINFO. I've divided this into two macros for readability. %getZipMemberInfo (macro routine #2) retrieves all of the file details for a single member and stores them in a data set. %getZipDetails (macro routine #3) iterates through the list of ZIP file members, calls %getZipMemberInfo on each, and concatenates the results into a single output data set.
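The member list can be built by opening a ZIP fileref (one assigned without MEMBER=) as if it were a directory. Here's a minimal sketch of that idea. It is not the %listZipContents macro from the gist. To make it runnable anywhere, it first writes a tiny two-member archive to the WORK folder, then walks the members with DOPEN, DNUM, and DREAD. The archive name, the filerefs, and the output data set WORK.MEMBERS are illustrative assumptions.

```sas
/* Illustrative test archive in the WORK folder (name is an assumption) */
%let zipfile = %sysfunc(pathname(work))/demo_list.zip;

/* Writing to a ZIP member creates the archive if it doesn't exist */
filename mem1 zip "&zipfile." member="a.txt";
data _null_;
  file mem1;
  put "First member";
run;

/* A second fileref adds another member to the same archive */
filename mem2 zip "&zipfile." member="b.txt";
data _null_;
  file mem2;
  put "Second member";
run;

/* Without MEMBER=, the ZIP fileref can be opened like a directory */
filename inzip zip "&zipfile.";

data work.members;
  length memname $ 200;
  did = dopen("inzip");
  if did = 0 then stop;          /* archive could not be opened */
  do i = 1 to dnum(did);
    memname = dread(did, i);     /* one row per member in the archive */
    output;
  end;
  rc = dclose(did);
  keep memname;
run;

filename mem1 clear;
filename mem2 clear;
filename inzip clear;
```

In a real run you would point the directory fileref at each archive in the target folder rather than at this demo file.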
Here's a sample usage:
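```sas
/* Step 1: list the ZIP files in a folder and the members inside each */
%listZipContents(targdir=C:\Projects\ZIPPED_Examples, outlist=work.zipfiles);

/* Steps 2 and 3: gather FINFO details for every member into one data set */
%getZipDetails(inlist=work.zipfiles, outlist=work.zipdetails);
```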
I tried to add decent comments to my program so that interested coders can study and adapt as needed. Here's a snippet of code that uses the FINFO function, which is really the important part for retrieving these file details.
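```sas
/* Assumes an assignment like:
   FILENAME F ZIP "C:\ZIPPED_Examples\SudokuSolver_src.zip"
     member="src/AboutThisProject.txt";                          */
data work.memberinfo;
  length infoname $ 40;
  format filetime datetime20.;
  fid = fopen("f", "S");            /* open the ZIP member (fileref F) for sequential read */
  if fid then do;
    infonum = foptnum(fid);         /* number of attributes available for this file */
    do i = 1 to infonum;
      infoname = foptname(fid, i);
      select (infoname);
        when ('Filename')        filename = finfo(fid, infoname);
        when ('Member Name')     membername = finfo(fid, infoname);
        when ('Size')            filesize = input(finfo(fid, infoname), 15.);
        when ('Compressed Size') compressedsize = input(finfo(fid, infoname), 15.);
        when ('CRC-32')          crc32 = finfo(fid, infoname);
        when ('Date/Time')       filetime = input(finfo(fid, infoname), anydtdtm.);
        otherwise;                  /* skip attributes we don't need */
      end;
    end;
    /* avoid dividing by zero for empty members */
    if filesize > 0 then compressedratio = compressedsize / filesize;
    output;
    rc = fclose(fid);
  end;
  keep filename membername filesize compressedsize compressedratio crc32 filetime;
run;
```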
The FINFO function in SAS provides access to file attributes and their values for a given file that you've accessed using the FOPEN function. The available file attributes can differ according to the type of file (FILENAME access method) that is used. ZIP files, as you can guess, have some attributes that are specific to them: "Compressed Size", "CRC-32", and others. This code checks for all of the available attributes and keeps those that we need for our detailed output. (And see the use of the SELECT/WHEN statement? So much more readable than a bunch of IF/THEN/ELSEs. Just include an OTHERWISE statement: if an attribute matches no WHEN clause and there's no OTHERWISE, the DATA step stops with an error.)
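If you want to see exactly which attributes your SAS release reports for a ZIP member before writing the WHEN clauses, a small discovery step like the following sketch can help. It writes a one-member archive to the WORK folder (the archive name, member name, and filerefs are illustrative assumptions), then loops over FOPTNUM and FOPTNAME and stores each attribute name with its FINFO value in WORK.ZIP_ATTRS. It needs SAS 9.4M3 or later, where FINFO reports these ZIP member details.

```sas
/* Illustrative one-member archive in the WORK folder */
%let zipfile = %sysfunc(pathname(work))/demo_info.zip;

filename newmem zip "&zipfile." member="hello.txt";
data _null_;
  file newmem;
  do i = 1 to 50;
    put "Repetitive text compresses well, line " i;
  end;
run;
filename newmem clear;

/* Reopen the member and record every attribute that FINFO can report */
filename onemem zip "&zipfile." member="hello.txt";

data work.zip_attrs;
  length attr $ 40 value $ 200;
  fid = fopen("onemem", "S");       /* sequential input mode */
  if fid = 0 then stop;             /* member could not be opened */
  do i = 1 to foptnum(fid);
    attr  = foptname(fid, i);       /* attribute name, e.g. Size */
    value = finfo(fid, strip(attr));
    output;
  end;
  rc = fclose(fid);
  keep attr value;
run;

filename onemem clear;

proc print data=work.zip_attrs noobs;
run;
```

The printed list shows the attribute names as FOPTNAME returns them, which are the strings that the WHEN clauses must match exactly.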
Look, I'm not going to claim that my approach to this problem is the most elegant or most efficient -- but it works. If it can be improved, then I'm sure I'll hear from a few of you experts out there. Bring it on!
For more about ZIP files in SAS
- How to create ZIP files with ODS PACKAGE ZIP (available since SAS 9.2)
- How to "unzip" and read ZIP files using FILENAME ZIP (SAS 9.4 and later)
- How to create and update ZIP files with FILENAME ZIP (SAS 9.4 and later)
- List the contents of ZIP files with FILENAME ZIP (SAS 9.4 and later)
I suspect it won't, but I suppose it can't hurt to ask: Does this work with gzip, i.e. in a UNIX-type environment?
Jim, that's coming soon! GZ support arrives in SAS 9.4 Maintenance 5 (9.4M5), mere weeks away. I'll have to update all of these blog posts then.
Chris! That's great news! Have I mentioned how intelligent and good looking you are, lately? Keep up the good work.
I am trying to read S3 files from AWS in a Base SAS program. Is that possible?
I found the PROC S3 procedure in the documentation, but it's not available in my version (SAS EG 7.11).
Do you know any other way?
I think PROC S3 is new in 9.4M5. If you don't have that, you might be able to do something with APIs and PROC HTTP.
Just an FYI. As XiaobinDC pointed out, there is a typo in the _zipfiles step where the closing paren for lowcase is misplaced. Took me a while to figure out why that step wasn't working.
Fixed that - thanks!
First of all, I would like to say that the idea of using FINFO to obtain file information from a ZIP archive member is really nice. However, because the implementation needs a sequential read per ZIP member, it is rather slow when the ZIP file contains many file members. Therefore, I have implemented the ZIP member FINFO so that it is only used if the ZIP archive contains a single file. Here is the link to the macro I created: https://github.com/paul-canals/toolbox/tree/master/custom/create_zip%20(standalone%20version).
Thanks & best regards,