In previous articles, I've shared tips about how you can work with SAS and ZIP files without requiring an external tool like WinZip, gzip, or 7-Zip. I've covered:
- How to create ZIP files with ODS PACKAGE ZIP (available since SAS 9.2)
- How to "unzip" and read ZIP files using FILENAME ZIP (SAS 9.4 and later)
- How to create and update ZIP files with FILENAME ZIP (SAS 9.4 and later)
But a customer approached me the other day with one scenario I missed: how to add SAS data sets to an existing ZIP file. It's a variation of a tip that I've already shared, but with two differences. First, in order to add a data set to a ZIP file, you have to know its physical filename -- not just the LIBNAME.MEMBER reference that you use in SAS procedure steps. And second, I had not shown how to add a new file to an existing ZIP archive -- though it turns out that's pretty simple.
Find the file name for a SAS data set
There are several ways to do this. For my approach, I used the output from PROC CONTENTS. Notice that I had to capture the ODS output (not the OUT= data set) to grab the file name. I wrapped it in a macro for easy reuse. And since I ultimately need a SAS fileref to map to the path, I've assigned one (data_fn) in my macro.
/* macro to assign a fileref to a SAS data set in a Base library */
%macro assignFilerefToDataset(_dataset_name);
    %local outDsName;
    ods output EngineHost=File;
    proc contents data=&_dataset_name.;
    run;
    proc sql noprint;
        select cValue1 into: outDsName
            from work.file where Label1="Filename";
    quit;
    filename data_fn "&outDsName.";
%mend;
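A quick way to confirm that the macro did its job is to call it and write the resolved path to the log. This is only a sketch: it assumes the macro above has been compiled in the current session and uses sashelp.class as the example data set.

```sas
/* Assign the data_fn fileref to the physical file behind sashelp.class */
%assignFilerefToDataset(sashelp.class);

/* Write the resolved path to the log */
%put Physical file: %sysfunc(pathname(data_fn));

/* FEXIST returns 1 when the file behind the fileref exists, 0 otherwise */
%put Exists (1=yes): %sysfunc(fexist(data_fn));
```

If the second line reports 0, the fileref does not point to a real file and the copy step later on would fail.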
How to add a new member to a ZIP file
Now that I have the source file, I need to designate a destination file in a ZIP archive. The FILENAME ZIP method will create a new ZIP file if one does not yet exist, or it can add to an existing ZIP. To ensure I'm starting from scratch, I assign a simple fileref to my target destination and then delete the file.
/* Assign the fileref - basic file method */
filename projzip "&projectDir./project.zip";

/* Start with a clean slate - delete ZIP if it exists */
data _null_;
    rc=fdelete('projzip');
run;
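The snippets in this post rely on a projectDir macro variable that is not defined in the text shown here. The sketch below is one possible setup, not necessarily the one used in the full sample program: it points projectDir at the WORK directory (an assumption for illustration) and checks the FDELETE return code so that a failed delete is visible in the log.

```sas
/* Assumption: use the WORK directory as the project folder */
%let projectDir = %sysfunc(getoption(work));

filename projzip "&projectDir./project.zip";

data _null_;
    length msg $200;
    if fexist('projzip') then do;
        /* FDELETE returns 0 when the delete succeeds */
        rc = fdelete('projzip');
        if rc ne 0 then do;
            msg = sysmsg();
            put "Could not delete the ZIP file: " msg;
        end;
    end;
    else put "No existing ZIP file to delete.";
run;
```

Checking FEXIST first avoids a nonzero return code on the first run, when there is nothing to delete yet.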
To create a new ZIP file and designate a path and file name within it, I used the FILENAME ZIP method with the MEMBER= option. Note that I specified the "data/" subfolder in the MEMBER= value; this will place the file into a named subfolder within the archive.
/* Point the data_fn fileref at the source data set */
%assignFilerefToDataset(sashelp.class);

/* Use FILENAME ZIP to add a new member -- CLASS */
/* Put it in the data subfolder */
filename addfile zip "&projectDir./project.zip"
    member='data/class.sas7bdat';
Then finally, I need to actually "copy" the file into the archive. I do this by streaming the source file into the target fileref byte-by-byte:
/* byte-by-byte copy */
/* "copies" the new file into the ZIP archive */
data _null_;
    infile data_fn recfm=n;
    file addfile recfm=n;
    input byte $char1. @;
    put byte $char1. @;
run;

filename addfile clear;
That's it! I now have a ZIP file with one member entry. Now I can "press repeat" to add a second entry:
%assignFilerefToDataset(sashelp.cars);

/* Use FILENAME ZIP to add a new member -- CARS */
/* Put it in the data subfolder */
filename addfile zip "&projectDir./project.zip"
    member='data/cars.sas7bdat';

/* byte-by-byte copy */
/* "copies" the new file into the ZIP archive */
data _null_;
    infile data_fn recfm=n;
    file addfile recfm=n;
    input byte $char1. @;
    put byte $char1. @;
run;

filename addfile clear;
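Since the same three steps repeat for every data set, they could be folded into one macro. The following is a sketch, not part of the original program: the macro name addDatasetToZip and its parameters are hypothetical, and it depends on the assignFilerefToDataset macro defined earlier in this post.

```sas
/* Hypothetical helper: add one SAS data set to a ZIP archive */
/* Requires the assignFilerefToDataset macro from this post   */
%macro addDatasetToZip(_dataset_name, _zip_path, _member_name);
    /* Step 1: point data_fn at the physical data set file */
    %assignFilerefToDataset(&_dataset_name.);

    /* Step 2: designate the member inside the ZIP archive */
    filename addfile zip "&_zip_path." member="&_member_name.";

    /* Step 3: byte-by-byte copy into the archive */
    data _null_;
        infile data_fn recfm=n;
        file addfile recfm=n;
        input byte $char1. @;
        put byte $char1. @;
    run;

    /* Release both filerefs */
    filename addfile clear;
    filename data_fn clear;
%mend;

/* Usage, in place of the repeated statements above */
%addDatasetToZip(sashelp.cars, &projectDir./project.zip, data/cars.sas7bdat);
```

Clearing data_fn at the end is a design choice: it keeps a stale fileref from being picked up by a later copy step.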
Optional: Report on the ZIP file contents
If I want to report on the total contents of the ZIP file now, here's a DATA step and PROC PRINT step that does the job:
/* OPTIONAL for reporting */
/* Report on the contents of the ZIP file */
/* Assign a fileref with the ZIP method */
filename inzip zip "&projectDir./project.zip";

/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
    length memname $200;
    fid=dopen("inzip");
    if fid=0 then
        stop;
    memcount=dnum(fid);
    do i=1 to memcount;
        memname=dread(fid,i);
        output;
    end;
    rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
Result:
Files in the ZIP file

memname
---------------------
data/class.sas7bdat
data/cars.sas7bdat

N = 2
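The same DOPEN/DREAD technique can also answer a narrower question: is a particular member already in the archive? The sketch below is an illustration only. The macro variable memberFound is a hypothetical name, and the member path is the one used in this post.

```sas
/* Check whether a specific member is already in the ZIP file */
filename inzip zip "&projectDir./project.zip";

data _null_;
    length memname $200;
    found = 0;
    fid = dopen("inzip");
    if fid > 0 then do;
        /* Scan the member list for the name of interest */
        do i = 1 to dnum(fid);
            memname = dread(fid, i);
            if memname = "data/cars.sas7bdat" then found = 1;
        end;
        rc = dclose(fid);
    end;
    /* Store the result (1 = found, 0 = not found) in a macro variable */
    call symputx('memberFound', found);
run;

%put NOTE: memberFound=&memberFound.;

filename inzip clear;
```

A check like this could run before an add step to decide whether a member needs to be written at all.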
I hope that this helps to make the FILENAME ZIP method more useful to those who want to try it out. I'm sure that there will be more scenarios that people will ask about; someday, if I write enough blog posts, I'll have it all covered!
Sample program: You can view/download the entire SAS program (containing the snippets I've featured and more) from my GitHub profile.
8 Comments
Thank you Chris! :-)
You can grab information from proc contents through ODS by using the correct ODS Table Name to query for the value you need, removing the need for a proc sql step. Here's an example that returns the Filename Chris is looking for in the first program sample:
libname foo '<wherever>';
ods output enginehost=temp(keep=cvalue1 label1 where=(label1 = "Filename"));
ods listing close;
proc contents data=foo.<dataset>;
run;
ods listing;
data _null_;
    set temp;
    put "Filename=" cvalue1;
run;
You can find the ODS Table Names in the Output Delivery System: User's Guide documentation.
Thanks Pauline! For those of you in the audience, Pauline is one of the SAS R&D testers who works on ODS -- so her tips really mean something!
Thanks a lot for this information.
Chris,
Thanks for this post. I was wondering if there is any Access module or something to enable SAS to read a zipped SAS dataset within SAS library directly and without unzipping it? (i.e. filename.sas7bdat.gzip)
Mahmoud,
No, you have to extract the file from the ZIP archive in order to read it. That's pretty much true for any tool and process -- to read the content of the file, it has to be uncompressed first. Some tools (like Windows) make that look easy, but behind the scenes the file is being extracted before it's processed. The SAS code I've provided in this post shows how to do that.
Chris,
For files this is possible. Here is an example that uses INFILE; the file does not need to be unzipped to create a SAS data set.
%let origem = diretorio_Entrada; /* input directory */
filename origem pipe "zcat &origem./TXT.zip";
data base_sas;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile origem delimiter = '|' MISSOVER
        dsd lrecl=32767 firstobs=2 ;
    /* an INPUT statement naming the variables to read belongs here; */
    /* it was not included in the original comment */
    if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Correct! But this example uses an external utility, zcat, and that's not always possible. FILENAME PIPE is not permitted in all environments.