I've written about how to use the FILENAME ZIP method to read and update ZIP files in your SAS programs. The ZIP method was added in SAS 9.4, and its advantage is that you can accomplish more in SAS without having to launch external utilities such as WinZip, gunzip, or 7-Zip.

Several readers replied with questions about how you can use the content of these ZIP files within your SAS program. The basic scenario is: "I've got some data files in my ZIP archive. I want to use SAS to unzip these and then use them as data within my SAS process. Can I do this?"

Yes, you can -- but it does require an extra step. Even though FILENAME ZIP can show you the contents and structure of your ZIP file, most SAS procedures cannot access the content directly while it's in the archive. So, the additional step is to copy the file to another location, effectively extracting it from the ZIP file.

As an example, I created a ZIP file with two files and a subfolder:

    data.zip
    |__ sas_tech_talks_15.xlsx
    |__ sas/
        |__ instanttitles.sas7bdat


This SAS program helps me to discover how FILENAME ZIP sees the file:

    filename inzip ZIP "c:\projects\data.zip";

    /* Read the "members" (files) from the ZIP file */
    data contents(keep=memname isFolder);
      length memname $200 isFolder 8;
      fid=dopen("inzip");
      if fid=0 then
        stop;
      memcount=dnum(fid);
      do i=1 to memcount;
        memname=dread(fid,i);
        /* check for trailing / in folder name */
        isFolder = (first(reverse(trim(memname)))='/');
        output;
      end;
      rc=dclose(fid);
    run;

    /* create a report of the ZIP contents */
    title "Files in the ZIP file";
    proc print data=contents noobs N;
    run;

Output:

    Files in the ZIP file

    memname                       isFolder
    sas/                          1
    sas/instanttitles.sas7bdat    0
    sas_tech_talks_15.xlsx        0
    N = 3

With this information, I can now "copy" the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the "member" syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries.

    /* identify a temp folder in the WORK directory */
    filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;

    /* hat tip: "data _null_" on SAS-L */
    data _null_;
      /* using member syntax here */
      infile inzip(sas_tech_talks_15.xlsx)
        lrecl=256 recfm=F length=length eof=eof unbuf;
      file xl lrecl=256 recfm=N;
      input;
      put _infile_ $varying256. length;
      return;
    eof:
      stop;
    run;

    proc import datafile=xl dbms=xlsx out=confirmed replace;
      sheet=confirmed;
    run;

Sample output from my SAS log:

NOTE: The infile INZIP(sas_tech_talks_15.xlsx) is:
Filename=c:\projects\data.zip,
Member Name=sas_tech_talks_15.xlsx

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
Filename=C:\SAS Temporary Files\_TD396_\Prc2\sas_tech_talks_15.xlsx,
RECFM=N,LRECL=256,File Size (bytes)=0,
Create Time=11May2015:11:20:23

NOTE: A total of 55 records were read from the infile library INZIP.
NOTE: 55 records were read from the infile INZIP(sas_tech_talks_15.xlsx).
NOTE: DATA statement used (Total process time):
real time           0.00 seconds
cpu time            0.00 seconds


To use the SAS data set in the ZIP file, I need to copy it into a location that is assigned to a SAS library. In this example, I will again use the WORK location. Because my SAS data set is in a logical subfolder (named "sas") within the archive, I need to include that path as part of the member syntax on the INFILE statement.

    /* Copy a zipped data set into the WORK library */
    filename ds "%sysfunc(getoption(work))/instanttitles.sas7bdat" ;

    data _null_;
      /* reference the member name WITH folder path */
      infile inzip(sas/instanttitles.sas7bdat)
        lrecl=256 recfm=F length=length eof=eof unbuf;
      file ds lrecl=256 recfm=N;
      input;
      put _infile_ $varying256. length;
      return;
    eof:
      stop;
    run;

    proc contents data=work.instanttitles;
    run;

Partial output in my example:

    Files in the ZIP file

    The CONTENTS Procedure

    Data Set Name        WORK.INSTANTTITLES     Observations           1475
    Member Type          DATA                   Variables              6
    Engine               V9                     Indexes                0
    Created              01/29/2015 15:09:54    Observation Length     248
    Last Modified        01/29/2015 15:09:54    Deleted Observations   0
    Protection                                  Compressed             NO
    Data Set Type                               Sorted                 NO
    Label
    Data Representation  WINDOWS_64
    Encoding             wlatin1 Western (Windows)

Of course, all of this can be automated even further by writing SAS code that automatically iterates through the ZIP file member names and copies/imports each of the members as needed.

### About Author

Senior Manager, SAS Online Communities

Chris Hemedinger is the manager of SAS Online Communities. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies. He also hosts the SAS Tech Talk webcasts each year from SAS Global Forum, connecting viewers with smart people from SAS R&D and the impressive work that they do.

### 54 Comments

1. Andreas Menrath on

WARNING: binary file copy may cause trouble!

I just used your binary file copy snippet here:

    data _null_;
      /* using member syntax here */
      infile inzip(sas_tech_talks_15.xlsx)
        lrecl=256 recfm=F length=length eof=eof unbuf;
      file xl lrecl=256 recfm=N;
      input;
      put _infile_ $varying256. length;
      return;
    eof:
      stop;
    run;

For 99% of my files it worked fine. But unfortunately it does not make a 1:1 copy because it drops UTF byte order marks!
I played around with the code, but was not able to fix it. It looks like the UTF BOM is dropped before it is copied into the _infile_ variable :-(

2. Paige Miller on

The following is a SASLOG with an error when I try to duplicate your results in SAS 9.4 TS1M2. What is this error?

    131 filename inzip ZIP "c:\users\pmiller\documents\be_output\be_output.zip";
    132 filename xl "c:\users\pmiller\documents\be_output\may 2013 be monthly update.xml" ;
    133 /* hat tip: "data _null_" on SAS-L */
    134 data _null_;
    135 /* using member syntax here */
    136 infile inzip(May_2013_BE_Monthly_Update.xml) lrecl=256 recfm=F length=length eof=eof unbuf;
    137 file xl lrecl=256 recfm=N;
    138 input;
    139 put _infile_ $varying256. length;
    140 return;
    141 eof:
    142 stop;
    143 run;

    ERROR: Open failure for c:\users\pmiller\documents\be_output\be_output.zip during attempt to create a local file handle.
    NOTE: UNBUFFERED is the default with RECFM=N.
    NOTE: The file XL is:
    Filename=c:\users\pmiller\documents\be_output\may 2013 be monthly update.xml,
    RECFM=N,LRECL=256,File Size (bytes)=0,
    Last Modified=09Jun2015:15:47:58,
    Create Time=09Jun2015:15:47:58

    NOTE: The SAS System stopped processing this step because of errors.

• Chris Hemedinger on

I get that message if the file doesn't exist (c:\users\pmiller\documents\be_output\be_output.zip). Check to make sure that's the correct file name?

3. Hi Chris,

Thanks for the blog. I tried your method to read and unzip a gz extension file (myfile.gz), but with no success. This is what I tried:

    /*************************************************************************************/
    filename test zip "c:\mydirectory\myfile.gz" ;
    filename outtest "C:\Janicedirectory";
    data _null_ ;
    infile test recfm=N ;
    file outtest recfm=N;
    input byte $char1. ;
    put byte $char1. ;
    run;

• Chris Hemedinger on

Did you post a complete example? Try this for a start to see what's in the GZ file:

    filename inzip ZIP "c:\mydirectory\myfile.gz";

    /* Read the "members" (files) from the ZIP file */
    data contents(keep=memname isFolder);
      length memname $200 isFolder 8;
      fid=dopen("inzip");
      if fid=0 then
        stop;
      memcount=dnum(fid);
      do i=1 to memcount;
        memname=dread(fid,i);
        /* check for trailing / in folder name */
        isFolder = (first(reverse(trim(memname)))='/');
        output;
      end;
      rc=dclose(fid);
    run;

    /* create a report of the ZIP contents */
    title "Files in the GZ file";
    proc print data=contents noobs N;
    run;

• Actually , I have just a single compressed file «myfile.dat.gz». so I need to read and process the data in sas 9.4 from this compressed file. I try to read data from the zipped file by byte and output it to an external file , here is my code :

filename test zip "C:\mydirectory\myfile.dat.gz" member='myfile.dat.gz';
filename outtest "C:\Janicedirectory\myfile.dat";

data _null_ ;
Infile test lrecl= 256 recfm=N ;
File outtest lrecl=265 recfm=N ; /* output file*/
input ;
put _infile_ ;
run;

Still doesn't work ?

• Chris Hemedinger on

Almost there, I think. Try something like this (removing the .gz from the member= option):

    /* assuming file in archive is named "myfile.dat" */
    filename test zip "C:\mydirectory\myfile.dat.gz" member='myfile.dat';

• Kimberley Shirley on

Hi,

I have a similar issue to Janice. I have a zip file with a single .dat file inside of it that I need to extract and then copy across to another folder without actually reading the .dat file. Currently my code is as follows:

filename test zip "/sasdata/sourcedata/transfer.zip" member='transfer.dat';
filename outtest "/sasdata/target/testtransfer.dat";
data _null_ ;
Infile test lrecl= 256 recfm=N ;
File outtest lrecl=265 recfm=N ; /* output file*/
input ;
put _infile_ ;
run;

Unfortunately, each time I run this, I get the following errors:

ERROR: Out of space writing to file /sasdata/target/testtransfer.dat.
ERROR: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.

I have also noted that the transfer.dat file is only 9GB but the size of the folder is 90GB.

• Chris Hemedinger on

I don't know what's going on here -- there might be some additional logging options that you can enable for more diagnostics. I imagine that the compressed file would need to be extracted to a WORK location before it is copied to the final destination, so perhaps that's the area that's running low on space. I suggest working with SAS Tech Support on this.

• Hi Chris,

I had tried the above code for unzipping files, the 7 GB(around 4 million records) non zipped sas dataset gets expanded to more than 65 GB(around 50 million records) after unzipping and since we do not have disk space it shows i/o error and insufficient space so the expansion stops.

I had tested with 2.8 GB sas dataset but it unzips to the tee, without any issue?

• Chris Hemedinger on

I don't know why a zipped data set would contain fewer records than an unzipped data set. I think I'm missing something in your question.

• Before zipping the file size was around 7 GB.
After zipping the file size is around 500MB
But when I am trying to unzip this file , the file is growing in size and the entire disk is full because of this unzip activity.

• Chris Hemedinger on

When you unzip such a large file, you do need a certain amount of scratch space to allow for the file expansion while it's written to disk. I don't know what the formula should be, but I'd say that if you're unzipping the entire 7GB file you should have at least 10-15GB of available space.

• True. We need certain disk space for the file to be written.
I had like 90GB of disk space and still this unzipped file was growing upto all of 90GB.
eventually I had to stop the process and delete the ever growing file.

But files of size 5GB before zipping gets zipped to around 500MB.
When I am unzipping them I am able to unzip them to the tee without any issues and very quickly too with the same method.

So it seems like files above 5GB size follow some different way while zipping, such that it causes issues while unzipping.

• Chris Hemedinger on

I guess we might need to see some sample code that you're using for unzipping. Is it possible that you have the extraction process in a loop that gets run multiple times? I suggest posting the question to SAS Support Communities -- it's easier to supply a better answer there, and other experts can chime in.

4. Are we assuming that the contents of the zip file are as XLSX? What if the data is a CSV?

• Chris Hemedinger on

I used XLSX and a SAS7BDAT file as examples, but CSV would work the same way. Use FILENAME ZIP to "address" the file, DATA step to copy it out as file block, then another DATA step to read it. You might be able to combine those two DATA steps to read the file contents just once, but I don't think you'll be able to INFILE the item as series of text characters while it's in the ZIP archive.
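As a rough sketch of the CSV case described above: the same copy-then-read pattern from the post, applied to a CSV member. The member name `mydata.csv`, the fileref `csvout`, and the output data set `work.mydata` are hypothetical placeholders, and the ZIP path is the one used in the post.

```sas
/* Assumption: data.zip contains a member named mydata.csv (hypothetical) */
filename inzip ZIP "c:\projects\data.zip";
filename csvout "%sysfunc(getoption(work))/mydata.csv";

/* Step 1: copy the member out of the archive as a block of bytes */
data _null_;
  infile inzip(mydata.csv)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file csvout lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;

/* Step 2: read the extracted copy like any other CSV file */
proc import datafile=csvout dbms=csv out=work.mydata replace;
  getnames=yes;
run;
```

A hand-written DATA step with INFILE and INPUT could replace the PROC IMPORT step if you need control over variable types and lengths.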

5. Hi, I am not getting the second part to work where you read in a file. You wrote: "With this information, I can now 'copy' the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the 'member' syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries."

I am not sure, but are you manually opening the zip file to put it in the work folder? I have an assignment to work with and find metadata on my entire 2TB network share and have a count of multiple K zip files to open, and wanted to make sure this process could auto read and extract the data without manually opening the files. This code:

    filename inzip ZIP "D:\MyFileSystem\This_zipped_file.zip";

    /* Read the "members" (files) from the ZIP file */
    data contents(keep=memname isFolder);
      length memname $200 isFolder 8;
      fid=dopen("inzip");
      if fid=0 then
        stop;
      memcount=dnum(fid);
      do i=1 to memcount;
        memname=dread(fid,i);
        /* check for trailing / in folder name */
        isFolder = (first(reverse(trim(memname)))='/');
        output;
      end;
      rc=dclose(fid);
    run;

    /* create a report of the ZIP contents */
    title "Files in the ZIP file";
    proc print data=contents noobs N;
      where isFolder =0;
      VAR MEMNAME;
    run;

makes this output:

    Files in the ZIP file        83
    15:58 Tuesday, December 22, 2015

    memname
    CSV/CollectorList.csv
    CSV/Sheet_1.csv
    Excel/CollectorList.xls
    Excel/Sheet_1.xls
    READ ME.txt
    N = 5

and for example I have tried these lines of code and a few variations but it does not work...

    filename xl "D:\MyFileSystem\Excel\CollectorList.xls" ;

    /* hat tip: "data _null_" on SAS-L */
    data _null_;
      /* using member syntax here */
      infile inzip(Excel\CollectorList.xls)
      /*... the rest of your code here..*/

any clues?

• ...darn, I just tried my code again after manually extracting the folder and placing it in my path. It now works. Not what I was hoping it to do... :( Thank you for sharing. -KJ

• Chris Hemedinger on

Keith, my example is supposed to work without you having to manually unzip/extract the ZIP file -- everything should be handled by the SAS program using the FILENAME ZIP method. Is that what you have working? If not, let me know.

6. I'm having it hang up on the eof statement. I'm working with linux SAS. Suggestions would be appreciated.

    36   filename ds "%sysfunc(getoption(work))/mic_file.sas7bdat";
    37   data _null_;
    38   /*using member syntax here*/
    39   infile inzip(mic_file.sas7bdat) lrecl=256 recfm=F
    39 ! length=length eof=eof unbuf;
    40   file ds lrecl=256 recfm=N;
    41   input;
    42   put _infile_ $varying256. length;
    43   return;
    44   eof;
         ___
         180
    ERROR 180-322: Statement is not valid or it is used out of proper order.
    45   stop;
    46   run;

• Chris Hemedinger on

In this example, "eof" on line 44 is a label, a target to "goto". Use a colon instead of a semicolon after it.

7. I am back on this project after a few months on other things and now I am lost again with your sample code. I thought I had it figured out in Dec. In my case I have cloned our file systems with a robocopy command to a subdirectory placed at the root level; I used a few excludes to not get any race conditions etc. I then added *.zip files to my robocopy command. I then found a purge-empty-directory script to cut off the excess limbs. Now I have a trimmed-down directory with only a copy of the original zip files that I can use as I see fit. My next idea was to extract them all to a folder relative to their location, but instead of myfile.zip I would place all the data in a myfile.X folder, again relative to each *.zip file found. So I have written the SAS code to make the subdirs and move the files to their new home. Now I need to extract all my files. That is my problem, and the reason I found your post. Once I am done with my extract I have 7 or 8 SAS programs I can clone and point at this subdirectory and scan for metadata (xls, xlsx, mdb, accdb, dbf, sas7bdat, and sav) and produce a report as I am required. But as stated, I am stuck back on the extract phase. TIA for any pointers you can provide.
…I am just looking for a stripped down version of what you seem to be showing that does not bother with reading the meta data directly of the files found, but just extract them and if possible read from a dataset with one field like MyPathFile where the strings might be: E:\mypath1\mypath2\...\mypathN\MyFile.X\MyFile.zip
E:\mypath1\mypath2\...\mypathN\MyFile1.X\MyFile1.zip
E:\mypath1\mypath2\...\mypathN\MyFile3.X\MyFile3.zip
E:\mypath1\...\mypathN\MyFile.X\MyFile.zip
E:\MyFile.X\MyFile.zip
But I can easily split it in to path and file if needed.
-TIA -KJ

• Chris Hemedinger on

Keith,

For now, I'll have to leave this as an exercise for you -- or you can post your question and code-so-far to SAS Support Communities and perhaps someone else can help.

I can offer these tips though:

- use the dopen and dread functions to find the names/paths of the files inside the ZIP archive (as in my example in this post).

- using that information, create a fileref for each member you want to extract, then use a binary copy method (in DATA step) to copy the byte stream of an archive member to a destination fileref in your target folder. This step may be suited to a macro or to a DATA step loop or even a DOSUBL construct.
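The tips above can be sketched roughly as follows. This is only an illustration, not a tested utility: it assumes the `inzip` fileref and the `contents` data set from the first program in this post already exist, and the macro name `extract_member` and its parameters are hypothetical. It reuses the binary copy step from the post and drives it with CALL EXECUTE, one call per archive member.

```sas
/* Hypothetical helper: copy one ZIP member to a folder, byte for byte.
   Assumes member names contain no commas or spaces. */
%macro extract_member(zipref=, member=, outdir=);
  /* keep only the file name, dropping any folder path inside the archive */
  filename _out "&outdir/%scan(&member, -1, /)";

  data _null_;
    infile &zipref(&member)
      lrecl=256 recfm=F length=length eof=eof unbuf;
    file _out lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
  eof:
    stop;
  run;

  filename _out clear;
%mend extract_member;

/* Queue one macro call per file entry; folder-level entries are skipped */
data _null_;
  set contents;
  where isFolder = 0;
  length cmd $1000;
  cmd = cats('%nrstr(%extract_member)(zipref=inzip, member=', memname,
             ', outdir=', pathname('work'), ')');
  call execute(cmd);
run;
```

The `%nrstr` wrapper delays each macro call until the driving DATA step has finished, so the generated steps run one after another. Because the folder path is dropped, members with the same file name in different archive folders would overwrite each other in the target folder.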

I have another FILENAME ZIP blog post teed up to publish soon, so stay tuned for more related information.

8. Diwakar Mahanti on

Hi Chris,

This method is very useful especially when the zip file contains several member files that are not needed except one. Initially I used unix unzip in my SAS code but that unzipped and saved all unwanted files on the server. -Thanks a lot

9. Chris,

Excellent post, but, unless I am incorrect, this approach (and FCOPY()), re-writes the file, thus changing the stamp of the last modification datetime? Have I overlooked anything?

Thank you,

Kevin

PS A short search of the internet suggest that using an X statement would require a third party program, like 7-Zip, in the Windows OS (7 or less, not sure about the newer OS's).

• Chris Hemedinger on

Yes, you're correct. This process rewrites the file and so changes the file date/time information. And yes, X command and a 3rd party tool can be used together for more flexibility in creating/extracting ZIP files.

10. Hi Chris,

I hope you can help, or anyone reading this blog.

I am trying to convert first part of your code into macro but keep getting messages that variable memcount and fid cannot be evaluated. I am just starting using macros and need help with this one in order to process 100s of zip files.
What I am trying to do is to read zip file names from a file and run macro for every zip file to get its contents file. Then I would only extract csv files from the contents file and then use your code from the other blog (using function fcopy) to copy csv files into single directory. In this way I would have all csv files in a single directory.
If I run your code separately for a single file there is no problem. It works fine.

Below is the code that I'd like to convert into macro (copied form your blogs). I hope you can help.

Thanks

STEP 1 - Create CONTENTS file of all CSV files

    filename inzip ZIP "c:\projects\&datazip"; * Macro variable &datazip would be read from the file*;

    /* Read the "members" (files) from the ZIP file */
    data contents(keep=memname isFolder);
      length memname $200 isFolder 8;
      fid=dopen("inzip");
      if fid=0 then
        stop;
      memcount=dnum(fid);
      do i=1 to memcount;
        memname=dread(fid,i);
        /* check for trailing / in folder name */
        isFolder = (first(reverse(trim(memname)))='/');
        output;
      end;
      rc=dclose(fid);
    proc append data=CONTENTS base=&ALL_CSV_FILES; run; * THis line added by me*;
    run;

STEP 2 - Copy all csv files into _bcout directory.

    filename xl "%sysfunc(getoption(work))/&csvfile" ; * &csvfile is the file from ALL_CSV_FILES*;

    filename _bcin "%sysfunc(getoption(work))/&csvfile" recfm=n /* RECFM=N needed for a binary copy */;
    filename _bcout "C:\MyDir\&csvfile" recfm=n;

    data _null_;
      length msg $ 384;
      rc=fcopy('_bcin', '_bcout');
      if rc=0 then
        put 'Copied _bcin to _bcout.';
      else do;
        msg=sysmsg();
        put rc= msg=;
      end;
    run;

    filename _bcin clear;
    filename _bcout clear;

• Chris Hemedinger on

Congrats! Glad you got it working. Feel free to post your final version back here, or on SAS Support Communities!

11. Hi, Chris

Thanks for sharing this fancy blog; it works pretty well for .zip files. Now when I try to apply your code to .7z files, I run into trouble. Could you please show me how to unzip .7z files and import the unzipped files into SAS data sets? Thanks a lot.

Best regards,
Yajun

• Chris Hemedinger on

Yajun,

Unfortunately 7-zip files are not standard ZIP files, and are not supported. Nor are .gz files (gzip on UNIX). Both feature requests have been entered for the developers.

12. Hello Chris
I am using your program with sas eg7.11 for a zip file on the unix server and am getting contents of 0. I know the zip file should contain only one txt file. Does your code only work in 9.4?

Thanks

• Chris Hemedinger on

Yes, this requires SAS 9.4 for FILENAME ZIP. Your version of EG doesn't matter, in this case.

• Thank you Chris. I am trying to read a .txt file. I can run the first part and see that the zip file contains only RPT01546.txt. So I tried to run this code.

    filename xl "%sysfunc(getoption(work))/RPT01546.txt" ;

    /* hat tip: "data _null_" on SAS-L */
    data _null_;
      /* using member syntax here */
      infile inzip(RPT01546.txt)
        lrecl=512 recfm=F length=length eof=eof unbuf;
      file xl lrecl=512 recfm=N;
      input;
      put _infile_ $varying256. length;
      return;
    eof:
      stop;
    run;

Here is the error. What am I doing wrong?

    NOTE: The zip file RPT01546.txt doesn't exist.
    ERROR: Physical file does not exist, RPT01546.txt.
    NOTE: UNBUFFERED is the default with RECFM=N.
    NOTE: The file XL is:
    Filename=/plunx21/Global/saswork_Global/SAS_work438D000051F7_crlnxp070/SAS_work1E68000051F7_crlnxp070/RPT01546.txt,
    Owner Name=chihasti,Group Name=UUXStaff,
    Access Permission=-rw-rw-r--,
    Last Modified=10Feb2017:16:20:46

• Hello Chris, sorry, I just figured out the reason for the error: the file name itself did not have the txt at the end. Can I use proc import without a dlm? Thanks

• Chris Hemedinger on

Yes, but you might do better with just DATA step. A trick: grab a copy of the file and use SAS EG Import Data task to generate a working DATA step, then generalize that for use in your process.

• Chris Hemedinger on

Zubair, I don't think FILENAME ZIP can give you the file attributes (like size).

13. Chris,

I need to open a sas7bdat that is inside a zip file and I'm having problems with the output data set. This file is without any variables. I think the problem is in the piece of code below:

    input;
    put _infile_ $varying256. length;
    return;

Can you help me?

• Chris Hemedinger on

Hi Mark, I can't help without more information. If you can share the ZIP file or an example that shows the problem, I suggest posting on SAS Support Communities. If it's not something you can share publicly, you might need to work with SAS Tech Support.

• John Keighley on

I am not sure but why are you using an input statement with a SAS data set? Shouldn't it be a set?

• Chris Hemedinger on

DATA step can't read the file as a SAS data set while it's in the ZIP file. The INFILE and PUT combination is to read the bytes of the compressed data set file, extract and write it out to the file system. Then SAS can read and process it like data.

14. Is there a way to read the file info from the files inside the ZIP?
i.e. Size, Creation date

• Chris Hemedinger on

Yes, with SAS 9.4 Maint 3 or later, the FINFO function can work with zip file members. Here's a quick hit -- will work up a more complete example later.

    %macro info(z);
      data _NULL_;
        fId = fopen("&z","S");
        if fID then
          do;
            infonum=foptnum(fid);
            do i=1 to infonum;
              infoname=foptname(fid,i);
              infoval=finfo(fid,infoname);
              put @1 i= @5 infoname= @35 infoval=;
              output;
            end;
            fId = fClose( fId );
          end;
      run;
    %mend info;

    filename zmem zip "c:\myzipfile.zip" member="folder/knownmember.png";
    %info(zmem);


• Chris Hemedinger on

There are two methods, but each has drawbacks. The most common approach is to use FILENAME PIPE to run a command that unzips/extracts the item you need. However, this requires the ability to run shell commands from SAS, and many centralized SAS environments have that disabled (for security).

The other method is the undocumented SASZIPAM method. You can find papers/examples with a directed internet search. It won't be as robust as the FILENAME ZIP method that was added to SAS 9.4.
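For the first method (FILENAME PIPE), a minimal sketch might look like the following. It assumes a UNIX-like host where the `unzip` utility is installed and where SAS is allowed to run shell commands (the XCMD option). The archive path and member name are borrowed from an earlier comment purely as placeholders, and `work.firstlines` is a hypothetical output name.

```sas
/* Assumptions: unzip is on the PATH and XCMD is enabled.
   "unzip -p" writes the named member to standard output. */
filename unz pipe 'unzip -p "/sasdata/sourcedata/transfer.zip" "transfer.dat"';

/* Read the first few records of the member as plain text */
data work.firstlines;
  infile unz truncover lrecl=32767 obs=10;
  input line $char200.;
run;

filename unz clear;
```

Nothing is written to disk here; the member is streamed through the pipe, which suits text files but not binary members such as SAS data sets.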