In a previous post, I shared an example of using ODS PACKAGE to create ZIP files. But what if you need to read a ZIP file within your SAS program? In SAS 9.4, you can use the FILENAME ZIP access method to do the job.
In this example, let's pretend that I need to analyze data that a government agency published (maybe by using SAS!) into a ZIP file. I've selected an exciting data source (found via data.gov) about Large Truck Crash Causation.
First, I need to download the latest version of the data file. I'll use PROC HTTP to do that job:
/* detect proper delim for UNIX vs. Windows */
%let delim=%sysfunc(ifc(%eval(&sysscp. = WIN),\,/));

/* create a name for our downloaded ZIP */
%let ziploc = %sysfunc(getoption(work))&delim.datafile.zip;
filename download "&ziploc";

/* Download the ZIP file from the Internet */
proc http
 method='GET'
 url="http://ai.fmcsa.dot.gov/ltccs/Data/TEXT/Public/LTCCS_db_txt_public_01.zip"
 out=download;
run;
Next, I need to discover what files are within the ZIP file. I'll assign a fileref using the new FILENAME ZIP method. FILENAME ZIP is a directory-based access method, similar to the CATALOG access method or to using FILENAME to map to a folder. You can use functions such as DOPEN and DREAD to treat the ZIP file as if it's a file directory (since that's what it is, in concept).
/* Assign a fileref with the ZIP method */
filename inzip zip "&ziploc";

/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
 length memname $200;
 fid=dopen("inzip");
 /* DOPEN returns 0 when the archive cannot be opened */
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  memname=dread(fid,i);
  output;
 end;
 rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
Here's the report of files within the ZIP archive:
I've identified the HAZMAT.TXT file as the one that I want to analyze. I peeked at the first couple of records and was able to scratch out a simple DATA step to read the data. Notice how I don't need to explicitly extract the HAZMAT.TXT file -- I can simply reference it as a "member" of the INZIP fileref. The ZIP access method does the rest.
/* Import a text file directly from the ZIP */
data hazmat;
 infile inzip(hazmat.txt) firstobs=2 dsd dlm='09'x;
 input CaseID $10. VehicleNumber Material Reportable Waiver PSU PSUStrata RATWeight;
run;

title "Box plot of Vehicles # per incident";
ods graphics / height=200 width=450;
proc sgplot data=hazmat;
 hbox vehiclenumber;
 label VehicleNumber="# of vehicles";
 xaxis labelattrs=(size=12) valueattrs=(size=12);
run;
SAS reads my data file successfully, and yields this interesting box plot from the SGPLOT step:
(It looks like most "hazardous materials" accidents involved just 2 or 3 vehicles, except for one messy outlier that had nearly 30. Imagine the cleanup effort on that one!)
As an alternative, if I know exactly which file I need, I can assign a direct fileref by using the MEMBER= syntax:
filename inzip zip "&ziploc" member="hazmat.txt";

/* then my INFILE references the file directly, no parenthesized-member */
data hazmat;
 infile inzip firstobs=2 dsd dlm='09'x;
 /* ... */
The ZIP access method isn't just for reading. I can also use it to create and update ZIP files. For creating ZIP files, I prefer to use ODS PACKAGE. But it's very handy to be able to update ZIP files from a SAS program without using an external tool. For example, here's a program that deletes an extraneous file from an existing ZIP file:
/* Remove the PackageMetadata piece that ODS PACKAGE creates */
filename pkg ZIP "c:\projects\filenamezip\new.zip" member="PackageMetaData";
data _null_;
 if (fexist('pkg')) then
  rc = fdelete('pkg');
run;
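Writing works along the same lines as deleting. Here is a minimal sketch that adds a small text member to an archive. The archive name (notes.zip) and member name (readme.txt) are placeholders I made up for illustration, and I'm assuming the WORK folder is writable:

```sas
/* Hypothetical archive and member names; adjust for your site */
filename newzip zip "%sysfunc(getoption(work))/notes.zip" member="readme.txt";

/* Each PUT statement writes one line into the readme.txt member */
data _null_;
   file newzip;
   put "Created by a SAS program.";
   put "Written with the FILENAME ZIP access method.";
run;

/* Release the fileref when finished */
filename newzip clear;
```

If the archive does not exist yet, this should create it. If it does exist, the member is added to it.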
Note: Like ODS PACKAGE, the FILENAME ZIP method does not support encrypted (password-protected) ZIP archives.
Download the complete SAS 9.4 program: filenameZipHttpExample.sas
Thanks to the growing size of data files, ZIP files are created and consumed by SAS users everywhere. Between ODS PACKAGE and FILENAME ZIP, you can teach your SAS programs to build and read the files without having to rely on external tools. The more that you can use native SAS methods for this work, the more portable your SAS programs will be.
See also
Using FILENAME ZIP to unzip and read SAS data files in SAS
Reading and writing GZIP files in SAS
45 Comments
Beautiful!
Thanks for providing example how to get files within zip file listed!
I can use this immediately.
But what if your external files are password protected? How can you read them from SAS directly?
Thanks much
For password-protected ZIP files, you'll still need to use external tools like WinZip, 7-Zip, or gzip. This SAS Global Forum paper shows how that can work.
I have the same question, but for now, 2022. We're using SAS Viya and the X command isn't available. Any other options for a password-protected zip file these days? (Great articles Chris, by the way, I've used them a number of times!)
Great to hear! Any chance you're using SAS Viya 2021.1.6 or later? If so, PROC PYTHON might provide a way to use a Python package to accomplish the task in your process.
Hi Chris
I have a little problem: I receive a ZIP archive that has the following structure:
ZIP name
Nielsen
dir001 with file aaa
dir002 with file aaa
dir003 with file aaa
dir004 with file aaa
dir005 with file aaa
dir006 with file aaa
Each subfolder contains a file with the same name in all subdirectories, but that name changes from one delivery of the ZIP file to the next, and the number of subfolders also varies.
How can I read all of the files in the ZIP archive? (The program runs automatically, without human intervention.)
If I understand you, it sounds like you have repeating file names within the same archive. In the archive they are in a folder structure, so they can be treated distinctly. But when you use a SAS program to process, all of the files end up in a single folder? You would need to rename the file as you extract it, perhaps based on the folder name, to keep the name unique in the folder. You could use the SAS RENAME function for this.
Hi Chris,
I met a similar situation. I couldn't find how to navigate a directory structure stored inside a zip file.
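One way to approach the folder question is to list every member with its full path and derive a unique local name from it. The sketch below is only illustrative: the archive path is a placeholder, and it assumes that DREAD returns member names with their folder path (such as dir001/aaa) and that folder entries end with a slash.

```sas
/* Hypothetical archive path; replace with your own */
filename inzip zip "/data/incoming/nielsen.zip";

data members(keep=memname isFolder localname);
   length memname localname $200 isFolder 8;
   fid = dopen("inzip");
   if fid = 0 then stop;
   do i = 1 to dnum(fid);
      memname = dread(fid, i);
      /* Assumption: folder entries are listed with a trailing slash */
      isFolder = (first(reverse(trim(memname))) = '/');
      /* Flatten the path so that same-named files stay distinct */
      localname = translate(trim(memname), '_', '/');
      output;
   end;
   rc = dclose(fid);
run;

proc print data=members noobs;
run;
```

Each row where isFolder is 0 should then be addressable by its full member name, and localname gives a collision-free name to use when copying the file out of the archive.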
Chris... Is there a way to read in a SAS dataset that's been zipped?
Marc, yes, I think so. First, you would have to use the FILENAME ZIP method to copy the zipped data file from a ZIP archive. Then, you would assign a library to the location where you just copied that data file, and access the data from there.
You can do something like this to reach the file inside the ZIP, then copy it to a target folder:
Then use something like the binaryFileCopy macro (which I shared in this post) to copy the file and access as data:
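The code from the original reply is not reproduced here. As a rough sketch of both steps, the program below uses a plain DATA step byte copy in place of the binaryFileCopy macro. The archive path and the member name (mydata.sas7bdat) are placeholders for illustration.

```sas
/* Hypothetical archive and member names */
filename inzip zip "/data/archive/project.zip" member="mydata.sas7bdat";
filename out "%sysfunc(getoption(work))/mydata.sas7bdat";

/* Copy the member byte-for-byte, 256 bytes at a time */
data _null_;
   infile inzip lrecl=256 recfm=F length=length eof=eof unbuf;
   file out lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;

filename inzip clear;
filename out clear;

/* The copy now sits in the WORK folder, so WORK can read it */
proc contents data=work.mydata;
run;
```

Copying into the WORK folder avoids a separate LIBNAME statement. To keep the data set elsewhere, point the OUT fileref at another folder and assign a library to that folder instead.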
Hi Chris,
Thanks for this it's very useful!
What if there are excel files in the zipped folder, how would one go about reading them into SAS?
Kind Regards
You can discover the member name using the example I provided here. Once you know the member name, you can assign a fileref to the Excel file you want to read. You can't PROC IMPORT the Excel file directly from the ZIP file, so you'll need to copy it out first. Here's an example that copies an XLSX file to the SAS Work location, and then runs PROC IMPORT on the result.
Hi Chris
Thanks for this, it was exactly the solution I was looking for to support an analytic project with zipped source files. Just out of interest, does the ZIP engine work with the newer ZIPX file format?
Cheers
Darren
Darren, I'm going to say that most likely: No, the FILENAME ZIP method would not support ZIPX (which I had to look up). That's a proprietary set of extensions created by WinZip.
Thanks Chris.
Thanks Chris, is there an equivalent for gzip (.gz) files?
José,
gz is usually for a single file, while ZIP bundles up a collection of files and compresses them. There isn't a FILENAME method for gz, but ZIP and GZIP are mostly compatible. That is, you should be able to read a GZIPped file with the FILENAME ZIP method and vice versa. As some others have pointed out, password-protected files are not supported -- so that's one feature difference.
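Later releases changed this picture: SAS 9.4 Maintenance 5 added a GZIP keyword to the FILENAME ZIP method for reading and writing .gz files. A minimal sketch, assuming that release or later and a placeholder file name:

```sas
/* Hypothetical file name; the GZIP keyword needs SAS 9.4M5 or later */
filename gzin zip "/data/incoming/sales.csv.gz" gzip;

/* Echo the first five lines to the log to confirm the file is readable */
data _null_;
   infile gzin obs=5;
   input;
   put _infile_;
run;

filename gzin clear;
```

A .gz file holds a single compressed file, so no MEMBER= option applies here.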
My data is encrypted and requires a password.
Can I add to the parameters? It is a simple fixed length file, once I give a pw, unzip it, and read it in!
Thanks
Do you mean that the ZIP file is encrypted with a password? Then no, that is not supported by FILENAME ZIP.
If the ZIP file contains a SAS data set that is protected with a data set password, you can specify that in syntax when you read the data set. First, you must extract the data to a folder, then assign a library. This example shows you how.
I have a folder that I want to zip by using SAS, and I also want to set a password on the ZIP file from SAS.
Unfortunately, none of the SAS language methods (ODS PACKAGE, FILENAME ZIP) support passwords. You'll have to use the "old school" method: use X command or SYSTASK to call 7zip or gzip commands to compress with a password.
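As a rough illustration of the "old school" approach, the sketch below shells out to the 7-Zip command line. It assumes that the XCMD system option is enabled, that the 7z executable is on the system path, and that the folder and archive paths (placeholders here) exist on the machine where SAS runs.

```sas
/* Placeholder password; do not keep real passwords in program code */
%let pw = ChangeMe;

/* 7z: "a" adds files, -tzip selects ZIP format, -p sets the password */
x "7z a -tzip -p&pw. /secure/out/archive.zip /secure/in/myfolder";

/* SYSRC holds the return code from the last operating system command */
%put NOTE: 7z finished with return code &sysrc.;
```

In a production job, the password would more sensibly come from a protected source than from a %LET statement.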
Hi Chris,
Thanks for the post. It's very helpful.
Glad to hear it! Thanks!
Beautiful! Thanks Chris. But this only works in SAS 9.4. Any way to read ZIP files in 9.3?
Eric, there is the unsupported SASZIPAM method. Examples are around the internet, such as in this conference paper.
Good to know it. Thanks!
Chris,
This is very helpful. Thanks so much. I am migrating a bunch of files from SDD using the desktop connection, and I am finding that when I extract the files from the zip that they don't retain their true original creation date. Is there a way to get around this? We are migrating thousands of files and need to zip them to move them.
Thanks.
How would you deal with a zip file inside the zip without having to extract the second archive?
To be clear, I have xxx.zip, to which I can assign filename xx ZIP "xxx.zip"; but its content is yyy.zip. How can I use a FILENAME statement to access the content of yyy.zip, which contains a CSV file that I want to read into a data set?
There is NO getting around that extraction. At least while using SAS, you'll have to extract the embedded ZIP and then use FILENAME ZIP to access that result, then extract the CSV within that. It's possible that other tools hide this complexity, but any process that needs to get to that CSV file will need to read/extract the entire "nesting doll" of ZIPs.
Hi Chris,
This is very helpful. Is there a way to move a single PDF from a ZIP file to a local folder?
Thanks,
Yes - I have an example in this blog post using an XLSX file. You can do the same with a PDF file if you know the file name.
Thanks for this Chris. What if there are csv files in the zipped folder, how would one go about reading them into SAS?
I've got an example that comes close to that in this blog post.
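For a plain CSV member, the same member syntax used for HAZMAT.TXT should apply. This is a sketch only: the archive path, member name, and column names are invented for illustration, and it assumes a header row followed by comma-delimited values.

```sas
/* Hypothetical archive, member, and column names */
filename inzip zip "/data/incoming/export.zip";

data orders;
   /* DSD handles comma delimiters and quoted values; skip the header row */
   infile inzip(orders.csv) dsd firstobs=2 truncover;
   length order_id $12 region $20 amount 8;
   input order_id region amount;
run;

filename inzip clear;
```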
Hi, Chris,
I'm using SAS EG 7.15 HF2 with a UNIX (AIX) back end. I can't get the ZIP method to find out the contents of a zip archive. The DOPEN consistently returns a zero. I've tried multiple zip files. Is the ZIP method only for those whose SAS executes on a Windows machine?
Jim
Code used:
*------------------------------------------------------------------------------------------------*;
** File and library allocations. **;
FILENAME FileIn ZIP "&Zip_Dir.";
&NoMacs %Error_Check (MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);
LIBNAME SASout "&SAS_Lib";
&NoMacs %Error_Check (MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);
*------------------------------------------------------------------------------------------------*;
** Program logic. **;
DATA SASout.&SAS_Out;
    DROP _:;
    LENGTH mem_name $200.;
    LENGTH _File_ID 8.;
    LENGTH _mem_count 8.;
    LENGTH _i 8.;
    _File_ID = DOPEN("FileIn");
    IF MISSING(_File_ID) OR
       _File_ID = 0 THEN
        DO;
            CALL SYMPUT('SYSCC','4');
            PUTLOG "WARNING- ";
            PUTLOG "WARNING- ******************************************************************** ";
            PUTLOG "WARNING: Unable to open file. " _N_= _File_ID=;
            PUTLOG "WARNING- ******************************************************************** ";
            PUTLOG "WARNING- ";
            STOP;
        END;
    _mem_count = DNUM(_File_ID);
    IF NOT MISSING(_mem_count) THEN
        DO _i = 1 TO _mem_count;
            mem_name = DREAD(_File_ID, _i);
            OUTPUT;
        END;
    ELSE
        DO;
            CALL SYMPUT('SYSCC','4');
            PUTLOG "WARNING- ";
            PUTLOG "WARNING- ******************************************************************** ";
            PUTLOG "WARNING: Unable to process members. " _N_= _File_ID= _mem_count=;
            PUTLOG "WARNING- ******************************************************************** ";
            PUTLOG "WARNING- ";
            STOP;
        END;
    _RC = DCLOSE(_File_ID);
RUN;
&NoMacs %Error_Check (MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);
*------------------------------------------------------------------------------------------------*;
No Jim, it works with UNIX platforms too. Requires SAS 9.4 or later. Does &ZIP_Dir in your program refer to a ZIP file or to a folder that contains ZIP files? The FILENAME ZIP method requires a filename, not directory name.
Hi, Chris, thanks for your very speedy response!
Hmm. Well, I sort of figured that it was intended to work for UNIX, but no luck thus far.
In answer to your question, &Zip_Dir points to:
NOTE: The file being processed is:
/analytics/data_intelligence/Reporting/TU_FFR40TU_FFR40.zip
I'm running SAS 9.4:
SAS (r) Proprietary Software Release 9.4 TS1M4
I did notice one other thing. There's a note in my log as follows:
NOTE: List Handle Creation Failed.
My system errors are as follows (if they're salient):
NOTE: Checking for warnings and errors
SYSERR = 4
SYSCC = 4
SYSFILRC = 0
SYSLIBRC = 0
SQLRC = 0
SQLXRC = 0
ReturnCode = 0
Jim
Jim -- I'm stumped. I know this works with UNIX -- I've used in SAS University Edition (Linux-based) and in other Linux environments here. I suggest that you open a track with tech support. First, double-check that the file exists with the name you specified -- remember, case has to match on UNIX.
Hi, Chris,
I guess the "NOTE: List Handle Creation Failed" message didn't help any? That was the only diagnostic I got.
I've tried this with multiple files, and I cut and paste in the names to avoid the very problem you mention (lower case vs. upper case). I also created a zip archive on my Win 7 machine and brought it over both in "text" mode and binary to see if perhaps the mode of transport were the issue. It was not.
I'll check with tech support.
Thanks,
Jim
Oh, gad! I'm a complete idiot. I forgot to put a "/" between my macro variables when I concatenated the path with the file name. Duh. I are a very smart computer programmer. Amazing how much better it works when you get the path and file name put together properly.
However, somewhat in my defense, I'm going to go ahead and say "NOTE: List Handle Creation Failed" is a wee tad obscure in terms of a diagnostic message. "File not found" would have been a whole lot more helpful, especially for a SAS Dummy like me. :)
Jim
We've all been there, Jim -- glad it's working now!
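A small guard can catch this kind of path problem before DOPEN ever runs. The sketch below is illustrative only: the macro name (check_zip), the macro variable names, and the paths are all made up, and it simply tests the assembled path with FILEEXIST before assigning the fileref.

```sas
/* Hypothetical folder and file names, joined with an explicit "/" */
%let zip_path = /analytics/reports;
%let zip_file = archive.zip;
%let zip_full = &zip_path./&zip_file.;

%macro check_zip;
   /* FILEEXIST returns 1 when the external file is found */
   %if %sysfunc(fileexist(&zip_full.)) %then %do;
      filename filein zip "&zip_full.";
      %put NOTE: Assigned FILEIN to &zip_full.;
   %end;
   %else %do;
      %put ERROR: ZIP file not found: &zip_full.;
   %end;
%mend check_zip;

%check_zip
```

Writing the full path to the log in both branches makes a missing separator easy to spot.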
OK! It's all integrated and working:
a) Get member list, identifying which members are directories.
b) Sort so sub directories inside the archive are first in the member list.
c) DATA step to create all sub directories and then call execute a macro to read in the zipped data and write to UNIX directory
Very nice, very useful (when you don't forget to put a "/" between your path macro variable and your file macro variable).
Thanks, Chris!