I've written about how to use the FILENAME ZIP method to read and update ZIP files in your SAS programs. The ZIP method was added in SAS 9.4, and its advantage is that you can accomplish more in SAS without having to launch external utilities such as WinZip, gunzip, or 7-Zip.
Several readers replied with questions about how you can use the content of these ZIP files within your SAS program. The basic scenario is: "I've got some data files in my ZIP archive. I want to use SAS to unzip these and then use them as data within my SAS process. Can I do this?"
Yes, you can -- but it does require an extra step. Even though FILENAME ZIP can show you the contents and structure of your ZIP file, most SAS procedures cannot access the content directly while it's in the archive. So, the additional step is to copy the file to another location, effectively extracting it from the ZIP file.
As an example, I created a ZIP file with two files and a subfolder:
data.zip
|__ sas_tech_talks_15.xlsx
|__ sas/
    |__ instanttitles.sas7bdat
This SAS program helps me to discover how FILENAME ZIP sees the file:
filename inzip ZIP "c:\projects\data.zip";

/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
  length memname $200 isFolder 8;
  fid=dopen("inzip");
  if fid=0 then
    stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname=dread(fid,i);
    /* check for trailing / in folder name */
    isFolder = (first(reverse(trim(memname)))='/');
    output;
  end;
  rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
Output:
Files in the ZIP file

memname                       isFolder
sas/                                 1
sas/instanttitles.sas7bdat           0
sas_tech_talks_15.xlsx               0

N = 3
With this information, I can now "copy" the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the "member" syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries.
/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;

/* hat tip: "data _null_" on SAS-L */
data _null_;
  /* using member syntax here */
  infile inzip(sas_tech_talks_15.xlsx)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file xl lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;

proc import datafile=xl dbms=xlsx out=confirmed replace;
  sheet=confirmed;
run;
Sample output from my SAS log:
NOTE: The infile INZIP(sas_tech_talks_15.xlsx) is:
      Filename=c:\projects\data.zip,
      Member Name=sas_tech_talks_15.xlsx

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
      Filename=C:\SAS Temporary Files\_TD396_\Prc2\sas_tech_talks_15.xlsx,
      RECFM=N,LRECL=256,File Size (bytes)=0,
      Last Modified=11May2015:11:38:59,
      Create Time=11May2015:11:20:23

NOTE: A total of 55 records were read from the infile library INZIP.
NOTE: 55 records were read from the infile INZIP(sas_tech_talks_15.xlsx).
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
To use the SAS data set in the file, I need to copy it into a location shared by a SAS library. In this example, I will again use the WORK location. Because my SAS data set is in a logical subfolder (named "sas") within the archive, I need to include that path as part of the member syntax on the INFILE statement.
/* Copy a zipped data set into the WORK library */
filename ds "%sysfunc(getoption(work))/instanttitles.sas7bdat" ;

data _null_;
  /* reference the member name WITH folder path */
  infile inzip(sas/instanttitles.sas7bdat)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file ds lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;

proc contents data=work.instanttitles;
run;
Partial output in my example:
Files in the ZIP file
The CONTENTS Procedure

Data Set Name        WORK.INSTANTTITLES           Observations          1475
Member Type          DATA                         Variables             6
Engine               V9                           Indexes               0
Created              01/29/2015 15:09:54          Observation Length    248
Last Modified        01/29/2015 15:09:54          Deleted Observations  0
Protection                                        Compressed            NO
Data Set Type                                     Sorted                NO
Label
Data Representation  WINDOWS_64
Encoding             wlatin1 Western (Windows)
Of course, all of this can be automated even further by writing SAS code that automatically iterates through the ZIP file member names and copies/imports each of the members as needed.
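As a rough sketch of that automation, the following reuses the CONTENTS data set and the binary copy step from above. The macro name %extract_member and its parameters (zipfile=, member=, outfile=) are hypothetical names chosen for illustration, and the sketch assumes that member names contain no commas or unbalanced parentheses.

```sas
/* Sketch only: %extract_member and its parameters are hypothetical names */
%macro extract_member(zipfile=c:\projects\data.zip, member=, outfile=);
  /* address one member of the archive, folder path included */
  filename _in ZIP "&zipfile" member="&member";
  /* destination: a file of the same base name in the WORK folder */
  filename _out "%sysfunc(getoption(work))/&outfile";

  data _null_;
    infile _in lrecl=256 recfm=F length=length eof=eof unbuf;
    file _out lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
  eof:
    stop;
  run;

  filename _in clear;
  filename _out clear;
%mend extract_member;

/* Generate one macro call per file entry, skipping folder-level entries */
data _null_;
  set contents;
  where isFolder = 0;
  length basename $200;
  /* keep only the file name, dropping any folder path inside the archive */
  basename = scan(memname, -1, '/');
  /* %nrstr delays the macro call until this DATA step has finished */
  call execute(cats('%nrstr(%extract_member(member=', memname,
                    ', outfile=', basename, '))'));
run;
```

Because every member lands in the same WORK folder under its base name, two members with the same file name in different archive folders would overwrite each other. A real process would probably need to recreate the folder structure or rename the files.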
75 Comments
Hoping for better compression rates like ZIP with more seamless access in SAS, I submitted a SAS ballot for much better compression for SAS data sets. If anyone else is interested in both, please consider voting on this ballot idea.
WARNING: binary file copy may cause trouble!
I just used your binary file copy snippet here:
data _null_;
/* using member syntax here */
infile inzip(sas_tech_talks_15.xlsx)
lrecl=256 recfm=F length=length eof=eof unbuf;
file xl lrecl=256 recfm=N;
input;
put _infile_ $varying256. length;
return;
eof:
stop;
run;
For 99% of my files it worked fine. But unfortunately it does not make a 1:1 copy because it drops UTF byte order marks!
I played around with the code, but was not able to fix it. It looks like the UTF BOM is dropped before it is copied into the _infile_ variable :-(
Andreas, thanks for pointing that out. There are other methods to copy files within SAS, including the method that I shared here from SAS expert Bruno Mueller. I had used the method in this post as a bit of a shortcut.
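For reference, here is a minimal sketch of one such alternative: the FCOPY function, which copies the bytes from one fileref to another. It assumes that both filerefs are assigned with RECFM=N (needed for a binary copy) and that the ZIP fileref accepts the RECFM= option in your SAS release. The fileref names src and dest are made up for this example, and I have not checked whether this approach preserves a byte order mark.

```sas
/* Sketch: binary copy of one ZIP member with FCOPY.               */
/* Assumption: RECFM=N on both filerefs, so bytes are copied as-is */
filename src  ZIP "c:\projects\data.zip" member="sas_tech_talks_15.xlsx" recfm=n;
filename dest "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" recfm=n;

data _null_;
  length msg $ 384;
  rc = fcopy('src', 'dest');
  if rc = 0 then
    put 'NOTE: Copied the ZIP member to the WORK folder.';
  else do;
    /* SYSMSG returns the text of the most recent error from the function */
    msg = sysmsg();
    put rc= msg=;
  end;
run;

filename src clear;
filename dest clear;
```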
The following is a SASLOG with an error when I try to duplicate your results in SAS 9.4 TS1M2. What is this error?
131 filename inzip ZIP "c:\users\pmiller\documents\be_output\be_output.zip";
132 filename xl "c:\users\pmiller\documents\be_output\may 2013 be monthly update.xml" ;
133 /* hat tip: "data _null_" on SAS-L */
134 data _null_;
135 /* using member syntax here */
136 infile inzip(May_2013_BE_Monthly_Update.xml) lrecl=256 recfm=F length=length eof=eof unbuf;
137 file xl lrecl=256 recfm=N;
138 input;
139 put _infile_ $varying256. length;
140 return;
141 eof:
142 stop;
143 run;
ERROR: Open failure for c:\users\pmiller\documents\be_output\be_output.zip during attempt to create a local file handle.
NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
Filename=c:\users\pmiller\documents\be_output\may 2013 be monthly update.xml,
RECFM=N,LRECL=256,File Size (bytes)=0,
Last Modified=09Jun2015:15:47:58,
Create Time=09Jun2015:15:47:58
NOTE: The SAS System stopped processing this step because of errors.
I get that message if the file (c:\users\pmiller\documents\be_output\be_output.zip) doesn't exist. Can you check that the file name is correct? I also got this message when I tried to unzip a file that was still being downloaded, so it was locked by the downloading process.
Will it be working on Unix as well?
Thanks,
Andrei Mitiaev
Yes, it will work on all of the Unix variations.
Hi Chris,
Thanks for the blog, I tried your method to read and unzip a gz extension file (myfile.gz), but with no success. This is what I tried :
/*************************************************************************************/
filename test zip "c:\mydirectory\myfile.gz" ;
filename outtest "C:\Janicedirectory";
data _null_ ;
infile test recfm=N ;
file outtest recfm=N;
input byte $char1. ;
put byte $char1. ;
run;
Did you post a complete example? Try this for a start to see what's in the GZ file:
filename inzip ZIP "c:\mydirectory\myfile.gz";

/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
  length memname $200 isFolder 8;
  fid=dopen("inzip");
  if fid=0 then
    stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname=dread(fid,i);
    /* check for trailing / in folder name */
    isFolder = (first(reverse(trim(memname)))='/');
    output;
  end;
  rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the GZ file";
proc print data=contents noobs N;
run;
Actually , I have just a single compressed file «myfile.dat.gz». so I need to read and process the data in sas 9.4 from this compressed file. I try to read data from the zipped file by byte and output it to an external file , here is my code :
filename test zip "C:\mydirectory\myfile.dat.gz" member='myfile.dat.gz';
filename outtest "C:\Janicedirectory\myfile.dat";
data _null_ ;
Infile test lrecl= 256 recfm=N ;
File outtest lrecl=265 recfm=N ; /* output file*/
input ;
put _infile_ ;
run;
Still doesn't work ?
Almost there, I think. Try something like this (removing the .gz from the member= option):
/* assuming file in archive is named "myfile.dat" */
filename test zip "C:\mydirectory\myfile.dat.gz" member='myfile.dat';
Hi,
I have a similar issue to Janice. I have a zip file with a single .dat file inside of it that I need to extract and then copy across to another folder without actually reading the .dat file. Currently my code is as follows:
filename test zip "/sasdata/sourcedata/transfer.zip" member='transfer.dat';
filename outtest "/sasdata/target/testtransfer.dat";
data _null_ ;
Infile test lrecl= 256 recfm=N ;
File outtest lrecl=265 recfm=N ; /* output file*/
input ;
put _infile_ ;
run;
Unfortunately, each time I run this, I get the following errors:
ERROR: Out of space writing to file /sasdata/target/testtransfer.dat.
ERROR: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.
I have also noted that the transfer.dat file is only 9GB but the size of the folder is 90GB.
I don't know what's going on here -- there might be some additional logging options that you can enable for more diagnostics. I imagine that the compressed file would need to be extracted to a WORK location before it is copied to the final destination, so perhaps that's the area that's running low on space. I suggest working with SAS Tech Support on this.
Hi Chris,
I tried the above code for unzipping files. The SAS data set was 7 GB (around 4 million records) before zipping, but it expands to more than 65 GB (around 50 million records) during unzipping. Since we do not have that much disk space, the step fails with an I/O error and an insufficient space message, and the expansion stops.
I tested with a 2.8 GB SAS data set and it unzips perfectly, without any issue.
I don't know why a zipped data set would contain fewer records than an unzipped data set. I think I'm missing something in your question.
Before zipping the file size was around 7 GB.
After zipping the file size is around 500MB
But when I am trying to unzip this file , the file is growing in size and the entire disk is full because of this unzip activity.
When you unzip such a large file, you do need a certain amount of scratch space to allow for the file expansion while it's written to disk. I don't know what the formula should be, but I'd say that if you're unzipping the entire 7GB file you should have at least 10-15GB of available space.
True. We need certain disk space for the file to be written.
I had about 90GB of disk space, and the unzipped file still kept growing until it filled all 90GB.
eventually I had to stop the process and delete the ever growing file.
But files of size 5GB before zipping gets zipped to around 500MB.
When I am unzipping them I am able to unzip them to the tee without any issues and very quickly too with the same method.
So it seems like files above 5GB size follow some different way while zipping, such that it causes issues while unzipping.
I guess we might need to see some sample code that you're using for unzipping. Is it possible that you have the extraction process in a loop that gets run multiple times? I suggest posting the question to SAS Support Communities -- it's easier to supply a better answer there, and other experts can chime in.
My code
filename in02 ZIP "C:\mydirectory/mo_od_main_JT00_2002.csv.gz" member='mo_od_main_JT00_2002.csv' GZIP;
filename out02 "C:\teamdirectory/mo_od_main_JT00_2002.sas7bdat";
data mylib.file02;
infile in02 recfm=N;
file out02 recfm=N;
input;
put _infile_;
run;
Log result:
NOTE: 3937 records were read from the infile IN02.
NOTE: The data set MYLIB.FILE02 has 3937 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.39 seconds
cpu time 0.42 seconds
There are over 1 million records in the folder.
A couple of notes. First, you don't need the member= option, because GZIP files have just one file that's compressed. Second, don't use a FILENAME statement for the SAS data set. You just need the DATA mylib.file02 to identify where the data will go.
Since the file is a CSV, you can INFILE and INPUT the records directly. See this article for GZIP examples.
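A minimal sketch of that direct read, assuming SAS 9.4 Maintenance 5 or later (needed for the GZIP option), a MYLIB libref that is already assigned, and a header row in the file. The three column names are placeholders, since the real layout of the CSV is not shown here.

```sas
/* Sketch: read a gzipped CSV directly, with no intermediate copy */
filename in02 ZIP "C:\mydirectory/mo_od_main_JT00_2002.csv.gz" GZIP;

data mylib.file02;
  /* DSD handles comma delimiters and quoted values;        */
  /* FIRSTOBS=2 skips a header row (assumed to be present)  */
  infile in02 dsd truncover firstobs=2;
  /* hypothetical columns: replace with the actual fields in the CSV */
  input col1 :$20. col2 col3;
run;
```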
Are we assuming that the contents of the zip file are as XLSX? What if the data is a CSV?
I used an XLSX file and a SAS7BDAT file as examples, but a CSV file would work the same way. Use FILENAME ZIP to "address" the file, a DATA step to copy it out as a block of bytes, then another DATA step to read it. You might be able to combine those two DATA steps to read the file contents just once, but I don't think you'll be able to INFILE the item as a series of text characters while it's in the ZIP archive.
Hi, I am not getting the second part to work, where you read in a file. You wrote: "With this information, I can now 'copy' the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the 'member' syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries." I am not sure, but are you manually opening the ZIP file to put it in the WORK folder? I have an assignment to find metadata across my entire 2TB network share, with several thousand ZIP files to open, and I wanted to make sure this process could automatically read and extract the data without manually opening the files. This code:
filename inzip ZIP "D:\MyFileSystem\This_zipped_file.zip";
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
length memname $200 isFolder 8;
fid=dopen("inzip");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
memname=dread(fid,i);
/* check for trailing / in folder name */
isFolder = (first(reverse(trim(memname)))='/');
output;
end;
rc=dclose(fid);
run;
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
where isFolder =0;
VAR MEMNAME;
run;
/*
makes this output:
Files in the ZIP file 83
15:58 Tuesday, December 22, 2015
memname
CSV/CollectorList.csv
CSV/Sheet_1.csv
Excel/CollectorList.xls
Excel/Sheet_1.xls
READ ME.txt
N = 5
and for example I have tried these lines of code and a few variations but it does not work...
*/
filename xl "D:\MyFileSystem\Excel\CollectorList.xls" ;
/* hat tip: "data _null_" on SAS-L */
data _null_;
/* using member syntax here */
infile inzip(Excel\CollectorList.xls)
/*... the rest of your code here..*/
any clues?
...darn, I just tried my code again after manually extracting the folder and placing it in my path. It now works. Not what I was hoping it would do... :( Thank you for sharing. -KJ
Keith, my example is supposed to work without you having to manually unzip/extract the ZIP file -- everything should be handled by the SAS program using the FILENAME ZIP method. Is that what you have working? If not, let me know.
I'm having it hang up on the eof statement. I'm working with linux SAS. Suggestions would be appreciated.
36   filename ds "%sysfunc(getoption(work))/mic_file.sas7bdat";
37   data _null_;
38   /*using member syntax here*/
39   infile inzip(mic_file.sas7bdat) lrecl=256 recfm=F
39 ! length=length eof=eof unbuf;
40   file ds lrecl=256 recfm=N;
41   input;
42   put _infile_ $varying256. length;
43   return;
44   eof;
     ___
     180
ERROR 180-322: Statement is not valid or it is used out of proper order.
45   stop;
46   run;
In this example, "eof" on line 44 is a label, a target to "goto". Use a colon instead of a semicolon after it.
I am back on this project after a few months on other things and now, I am lost again with your sample code. I thought I had it figured out in Dec. In my case I have cloned our file systems with a robocopy command to a sub directory placed at the root level, I used a few excludes to not get any race conditions etc. I then added *.zip files to my robocopy command. I then found a purge empty directory script cut off the excess limbs. Now I have a trimmed down directory with only a copy of the original zip files that I can use as I see fit, my next idea was to extract them all to a relative folder of the location but instead of myfile.zip I would place all the data in myfile.X folder again relative to each *.zip file found. So I have written the SAS code to make the subdir’s and move the files to their new home. Now I need to extract all my files. My problem, and reason I found your post. Once I am done with my extract I have 7-or-8 SAS programs I can clone and point at this sub directly and scan for metadata (xls, xlsx, mdb, accdb, dbf, sas7bdat, and sav ) and produce a report as I am required. But as stated am stuck back on the extract phase. TIA for any pointers you can provide.
…I am just looking for a stripped down version of what you seem to be showing that does not bother with reading the meta data directly of the files found, but just extract them and if possible read from a dataset with one field like MyPathFile where the strings might be: E:\mypath1\mypath2\...\mypathN\MyFile.X\MyFile.zip
E:\mypath1\mypath2\...\mypathN\MyFile1.X\MyFile1.zip
E:\mypath1\mypath2\...\mypathN\MyFile3.X\MyFile3.zip
E:\mypath1\...\mypathN\MyFile.X\MyFile.zip
E:\MyFile.X\MyFile.zip
But I can easily split it in to path and file if needed.
-TIA -KJ
Keith,
For now, I'll have to leave this as an exercise for you -- or you can post your question and code-so-far to SAS Support Communities and perhaps someone else can help.
I can offer these tips though:
- use the dopen and dread functions to find the names/paths of the files inside the ZIP archive (as in my example in this post).
- using that information, create a fileref for each member you want to extract, then use a binary copy method (in DATA step) to copy the byte stream of an archive member to a destination fileref in your target folder. This step may be suited to a macro or to a DATA step loop or even a DOSUBL construct.
I have another FILENAME ZIP blog post teed up to publish soon, so stay tuned for more related information.
Pingback: Add files to a ZIP archive with FILENAME ZIP - The SAS Dummy
Hi Chris,
This method is very useful especially when the zip file contains several member files that are not needed except one. Initially I used unix unzip in my SAS code but that unzipped and saved all unwanted files on the server. -Thanks a lot
Chris,
Excellent post, but, unless I am incorrect, this approach (and FCOPY()), re-writes the file, thus changing the stamp of the last modification datetime? Have I overlooked anything?
Thank you,
Kevin
PS A short search of the internet suggest that using an X statement would require a third party program, like 7-Zip, in the Windows OS (7 or less, not sure about the newer OS's).
Yes, you're correct. This process rewrites the file and so changes the file date/time information. And yes, X command and a 3rd party tool can be used together for more flexibility in creating/extracting ZIP files.
Hi, Chris. Great post, and helpful as always. Is there a way to either modify this approach or take a different one when unzipping in SAS to keep the original file creation date intact?
Thanks,
Jenna
Not with this method. This "unzip" is extracting a file and creating a new copy of the file, with new attributes. To keep the original attributes, you would have to use the zip tools directly.
Hi Chris,
I hope you can help, or anyone reading this blog.
I am trying to convert first part of your code into macro but keep getting messages that variable memcount and fid cannot be evaluated. I am just starting using macros and need help with this one in order to process 100s of zip files.
What I am trying to do is to read zip file names from a file and run macro for every zip file to get its contents file. Then I would only extract csv files from the contents file and then use your code from the other blog (using function fcopy) to copy csv files into single directory. In this way I would have all csv files in a single directory.
If I run your code separately for a single file there is no problem. It works fine.
Below is the code that I'd like to convert into macro (copied form your blogs). I hope you can help.
Thanks
STEP 1 - Create CONTENTS file of all CSV files
filename inzip ZIP "c:\projects\&datazip"; * Macro variable &datazip would be read from the file*;
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
length memname $200 isFolder 8;
fid=dopen("inzip");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
memname=dread(fid,i);
/* check for trailing / in folder name */
isFolder = (first(reverse(trim(memname)))='/');
output;
end;
rc=dclose(fid);
proc append data=CONTENTS base=&ALL_CSV_FILES; run; * THis line added by me*;
run;
STEP 2 - Copy all csv files into _bcout directory.
filename xl "%sysfunc(getoption(work))/&csvfile" ; * &csvfile is the file from ALL_CSV_FILES*;
filename _bcin "%sysfunc(getoption(work))/&csvfile" recfm=n /* RECFM=N needed for a binary copy */;
filename _bcout "C:\MyDir\&csvfile" recfm=n;
data _null_;
length msg $ 384;
rc=fcopy('_bcin', '_bcout');
if rc=0 then
put 'Copied _bcin to _bcout.';
else do;
msg=sysmsg();
put rc= msg=;
end;
run;
filename _bcin clear;
filename _bcout clear;
Hi Chris,
Macro works now. I managed to solve the issue.
Thanks for the nice post.
Congrats! Glad you got it working. Feel free to post your final version back here, or on SAS Support Communities!
Hi,Chris
Thanks for sharing this fancy blog, and it works pretty well for .zip files. Now when I am trying to apply your coding to .7z files,I run into trouble. Could you please show me how to unzip the .7z files and import the unzipped files into SAS datasets? Thanks a lot.
Best regards,
Yajun
Yajun,
Unfortunately 7-zip files are not standard ZIP files, and are not supported. Nor are .gz files (gzip on UNIX). Both feature requests have been entered for the developers.
Hello Chris
I am using your program with sas eg7.11 for a zip file on the unix server and am getting contents of 0. I know the zip file should contain only one txt file. Does your code only work in 9.4?
Thanks
Yes, this requires SAS 9.4 for FILENAME ZIP. Your version of EG doesn't matter, in this case.
Thank your Chris. I am trying to read a .txt file. I can run the first part and see that the zipfile contains only RPT01546.txt . So I tried to run this code.
filename xl "%sysfunc(getoption(work))/RPT01546.txt" ;
/* hat tip: "data _null_" on SAS-L */
data _null_;
/* using member syntax here */
infile inzip(RPT01546.txt)
lrecl=512 recfm=F length=length eof=eof unbuf;
file xl lrecl=512 recfm=N;
input;
put _infile_ $varying256. length;
return;
eof:
stop;
run;
Here is the error. What am I doing wrong.
NOTE: The zip file RPT01546.txt doesn't exist.
ERROR: Physical file does not exist, RPT01546.txt.
NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
Filename=/plunx21/Global/saswork_Global/SAS_work438D000051F7_crlnxp070/SAS_work1E68000051F7_crlnxp070/RPT01546.txt,
Owner Name=chihasti,Group Name=UUXStaff,
Access Permission=-rw-rw-r--,
Last Modified=10Feb2017:16:20:46
Hello Chris, sorry I just figured out why the error because the file name itself did not have the txt at the end. Can I use proc import without a dlm ?
thanks
Yes, but you might do better with just DATA step. A trick: grab a copy of the file and use SAS EG Import Data task to generate a working DATA step, then generalize that for use in your process.
Hi Chris
How would I be able to print out file sizes of the files within the ZIP?
Thanks
Zubair, I don't think FILENAME ZIP can give you the file attributes (like size).
Chris,
I need to open a sas7bdat that is inside a zip file and I'm with problems into the output dataset. This file is without any variables.
I think the problem is in the piece of code below:
input;
put _infile_ $varying256. length;
return;
Can you help me?
Hi Mark, I can't help without more information. If you can share the ZIP file or an example that shows the problem, I suggest posting on SAS Support Communities. If it's not something you can share publicly, you might need to work with SAS Tech Support.
I am not sure but why are you using an input statement with a SAS data set? Shouldn't it be a set?
DATA step can't read the file as a SAS data set while it's in the ZIP file. The INFILE and PUT combination is to read the bytes of the compressed data set file, extract and write it out to the file system. Then SAS can read and process it like data.
Is there a way to read the file info from the files inside the ZIP?
i.e. Size, Creation date
Yes, with SAS 9.4 Maint 3 or later, the FINFO function can work with zip file members. Here's a quick hit -- will work up a more complete example later.
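A minimal sketch of the idea, assuming SAS 9.4 Maintenance 3 or later. Rather than guessing at attribute names, it loops over whatever information items the fileref exposes and prints each one to the log. The fileref name f and the example archive path are only for illustration.

```sas
/* Sketch: list the file information items for one ZIP member */
filename f ZIP "c:\projects\data.zip" member="sas_tech_talks_15.xlsx";

data _null_;
  length infoname $ 40 infoval $ 256;
  /* "S" opens the file for sequential input */
  fid = fopen("f", "S");
  if fid = 0 then do;
    put "ERROR: Could not open the ZIP member.";
    stop;
  end;
  /* FOPTNUM returns how many information items are available */
  do i = 1 to foptnum(fid);
    infoname = foptname(fid, i);
    infoval  = finfo(fid, infoname);
    put infoname= infoval=;
  end;
  rc = fclose(fid);
run;

filename f clear;
```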
I've just created a blog post with a full example.
Pingback: Using FILENAME ZIP and FINFO to list the details in your ZIP files - The SAS Dummy
Hello Chris,
Thanks for sharing this. Can i get a workaround for this Procedure in SAS 9.3 ?
There are two methods, but each has drawbacks. The most common approach is to use FILENAME PIPE to run a command that unzips/extracts the item you need. However, this requires the ability to run shell commands from SAS, and many centralized SAS environments have that disabled (for security).
The other method is the undocumented SASZIPAM method. You can find papers/examples with a directed internet search. It won't be as robust as the FILENAME ZIP method that was added to SAS 9.4.
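As a rough illustration of the FILENAME PIPE approach, this sketch assumes that shell commands are allowed in your SAS session (the XCMD option) and that the Info-ZIP unzip utility is installed on a Unix host. The archive path, the target folder, and the libref extr are hypothetical.

```sas
/* Sketch: extract one member with an external utility via FILENAME PIPE. */
/* -o overwrites an existing copy; -d names the extraction folder.        */
filename unz pipe
  'unzip -o "/projects/data.zip" "sas/instanttitles.sas7bdat" -d "/tmp/unzipped"';

data _null_;
  infile unz;
  input;
  put _infile_;   /* echo the utility's messages to the SAS log */
run;

filename unz clear;

/* the member keeps its folder path (sas/) under the target folder */
libname extr "/tmp/unzipped/sas";
proc contents data=extr.instanttitles;
run;
```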
Hi,
I tried to replicate your example above and it worked just fine. However, I made a small modification in order to try and macrotize/automate the last step. I added this:
data _null_;
set contents;
if index(memname, "arc") > 0 then
call symput('fname', memname);
run;
data _null_;
infile inzip(&fname)
The rest is exactly the same as yours; all I did was try to create a macro variable with the value of memname in the contents step and use that in the infile statement.
The macro variable generated correctly, but the step won't work set up the way I have it above.
I literally copied the value of the fname macro variable and used that instead, and it worked fine--but for some reason it won't work if rendered as &fname instead of the value of &fname. I tried using double quotes. I even tried macrotizing just parts of the path and it seemed to work fine.
I would appreciate any insight into what the issue may be--I am stumped!
Try using CALL SYMPUTX instead of SYMPUT -- trims any trailing spaces.
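A minimal sketch of that change, based on the step from the question. The %PUT line is my addition: the square brackets make any stray blanks in the macro variable visible in the log.

```sas
data _null_;
  set contents;
  if index(memname, "arc") > 0 then
    /* SYMPUTX strips leading and trailing blanks; SYMPUT keeps the padding */
    call symputx('fname', memname);
run;

/* brackets make leftover blanks easy to spot */
%put NOTE: fname=[&fname];
```

With CALL SYMPUT, the value is padded out to the full length of MEMNAME ($200 in this post), so the member name inside the parentheses ends up with trailing blanks.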
That worked, thank you!
However, now I'm having another issue which is that work.test (the SAS file produced by reading in the zip file and outputting each record) has 0 records. It has the full set of variables, but no records are being output into the file. Do you have any ideas for how I can troubleshoot this?
Jenna, I think I would need to see your complete program in order to troubleshoot. If you don't want to post as a comment, you can send as e-mail to chris.hemedinger@sas.com.
Super useful thread - thanks Chris and everyone who replied! I compiled some of these tips (zip listing, the fcopy approach) into a macro, which will take a zip file and copy the contents out into a directory location. Just two parameters:
```
/* compile the macro and dependencies */
filename mc url "https://raw.githubusercontent.com/Boemska/macrocore/master/macrocore.sas";
%inc mc;
/* call the macro */
%mp_unzip(ziploc="/tmp/some.zip",outdir=/tmp/outputs)
```
Great, thanks Allan!
Thanks for this post. May I ask something? I found that the following minimized version also works.
data _null_;
infile zipfile(member) recfm=f;
file target recfm=n;
input;
put _infile_;
run;
(1) What are EOF=E, RETURN, and E:STOP in your version doing? Are they necessary?
(2) I also found that RECFM=F and RECFM=N are essential. What are they doing?
Sorry to bother you.
RECFM=F says to treat the input file records as if a fixed size. RECFM=N says that the target file is binary, a stream of bytes with no record boundaries. Basically, the program reads a number of bytes from the input stream, and writes them to an output file. The EOF indicator tells the program we've reached end-of-file for input, so stop processing.
Thank you Chris. It works perfectly
Hi Chris, Can you please tell me how to create a custom task for this unzip file
For SAS Studio? Or SAS Enterprise Guide? I don't have an example handy for this, but for SAS Studio you can reference this doc.
For SAS Studio
Hi Chris,
This code works extremely well if the files are simply inside the zip file. But if there is a folder in the zip folder, like this:
/home/MyZip.zip/MyFolder/mytest1.csv
then it returns an error: 'Entry mytest1.csv in zip file /home/MyZip.zip does not exist'.
The 'isFile' value that you included works, but is there a way I can extract files if a folder is inside the zip file?
Yes, I think the file reference in SAS would have to include the path. For example:
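A minimal sketch, reusing the binary copy step from the post and assuming the archive layout described in the question. The destination in the WORK folder and the fileref name out are my choices for illustration.

```sas
filename inzip ZIP "/home/MyZip.zip";
filename out "%sysfunc(getoption(work))/mytest1.csv";

data _null_;
  /* the member name includes the folder path inside the archive */
  infile inzip(MyFolder/mytest1.csv)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file out lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;
```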
I am working with a .zip file that has child .zip files in the main directory and then file I need is within the child .zip file.
data.zip
|__data_sub.zip
|__table.txt
Is there a way to read through multiple levels of zip files to get to "table.txt"?
I'm afraid you'll have to do this in multiple steps. First extract the data_sub.zip as a single file and store it in a local folder. Then repeat the process to extract the file you need from the data_sub.zip file.
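A sketch of the two steps, reusing the binary copy step from the post. The location of data.zip and the fileref names (outer, sub, inner, txt) are assumptions for illustration.

```sas
/* Step 1: copy the inner archive out of the outer archive */
filename outer ZIP "c:\projects\data.zip";
filename sub "%sysfunc(getoption(work))/data_sub.zip";

data _null_;
  infile outer(data_sub.zip)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file sub lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;

/* Step 2: treat the extracted copy as a ZIP file in its own right */
filename inner ZIP "%sysfunc(getoption(work))/data_sub.zip";
filename txt "%sysfunc(getoption(work))/table.txt";

data _null_;
  infile inner(table.txt)
    lrecl=256 recfm=F length=length eof=eof unbuf;
  file txt lrecl=256 recfm=N;
  input;
  put _infile_ $varying256. length;
  return;
eof:
  stop;
run;
```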