Reading and updating ZIP files with FILENAME ZIP


In a previous post, I shared an example of using ODS PACKAGE to create ZIP files. But what if you need to read a ZIP file within your SAS program? In SAS 9.4, you can use the FILENAME ZIP access method to do the job.

In this example, let's pretend that I need to analyze data that a government agency published (maybe by using SAS!) into a ZIP file. I've selected an exciting data source (found via data.gov) about Large Truck Crash Causation.

First, I need to download the latest version of the data file. I'll use PROC HTTP to do that job:

/* detect proper delim for UNIX vs. Windows */
%let delim=%sysfunc(ifc(%eval(&sysscp. = WIN),\,/));
 
/* create a name for our downloaded ZIP */
%let ziploc = %sysfunc(getoption(work))&delim.datafile.zip;
filename download "&ziploc";
 
/* Download the ZIP file from the Internet*/
proc http
 method='GET'
 url="http://ai.fmcsa.dot.gov/ltccs/Data/TEXT/Public/LTCCS_db_txt_public_01.zip"
 out=download;
run;

Next, I need to discover what files are within the ZIP file. I'll assign a fileref using the new FILENAME ZIP method. FILENAME ZIP is a directory-based access method, similar to the CATALOG access method or to using FILENAME to map to a folder. You can use functions such as DOPEN and DREAD to treat the ZIP file as if it's a file directory (since that's what it is, in concept).

/* Assign a fileref with the ZIP method */
filename inzip zip "&ziploc";
 
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
 length memname $200;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  memname=dread(fid,i);
  output;
 end;
 rc=dclose(fid);
run;
 
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
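
If DOPEN returns 0, the log might not say much about why. The following is a minimal sketch, assuming the same &ziploc macro variable and INZIP fileref as above, that first checks that the ZIP file exists and then writes the SYSMSG() text to the log when the open fails:

```sas
/* Sketch: confirm the ZIP exists, and report why DOPEN failed if it does */
data _null_;
 if not fileexist("&ziploc") then do;
  putlog "WARNING: ZIP file not found: &ziploc";
  stop;
 end;
 fid=dopen("inzip");
 if fid=0 then do;
  /* SYSMSG() returns the text of the most recent function error */
  msg=sysmsg();
  putlog "WARNING: DOPEN failed for INZIP. " msg=;
 end;
 else rc=dclose(fid);
run;
```

A check like this can help separate a mistyped path from a file that exists but can't be opened as a ZIP.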

Here's the report of files within the ZIP archive:

[Figure: PROC PRINT listing of the files in the ZIP file]

I've identified the HAZMAT.TXT file as the one that I want to analyze. I peeked at the first couple of records and was able to scratch out a simple DATA step to read the data. Notice how I don't need to explicitly extract the HAZMAT.TXT file -- I can simply reference it as a "member" of the INZIP fileref. The ZIP access method does the rest.

/* Import a text file directly from the ZIP */
data hazmat;
 infile inzip(hazmat.txt) 
   firstobs=2 dsd dlm='09'x;
 /* colon modifier: read CaseID up to the next delimiter, max 10 chars */
 input 
  CaseID :$10.
  VehicleNumber 
  Material 
  Reportable 
  Waiver
  PSU
  PSUStrata
  RATWeight;
run;
 
title "Box plot of Vehicles # per incident";
ods graphics / height=200 width=450;
proc sgplot data=hazmat;
	hbox vehiclenumber;
	label VehicleNumber="# of vehicles";
	xaxis labelattrs=(size=12) valueattrs=(size=12);
run;

SAS reads my data file successfully, and yields this interesting box plot from the SGPLOT step:

[Figure: box plot of the number of vehicles per incident]

(It looks like most "hazardous materials" accidents involved just 2 or 3 vehicles, except for one messy outlier that had nearly 30. Imagine the cleanup effort on that one!)

As an alternative, if I know exactly which file I need, I can assign a direct fileref by using the MEMBER= syntax:

filename inzip zip "&ziploc" member="hazmat.txt";
 
/* then my INFILE references the file directly, no parenthesized-member */
data hazmat;
 infile inzip
   firstobs=2 dsd dlm='09'x;
/* ...  */

The ZIP access method isn't just for reading. I can also use it to create and update ZIP files. For creating ZIP files, I prefer to use ODS PACKAGE. But it's very handy to be able to update ZIP files from a SAS program without using an external tool. For example, here's a program that deletes an extraneous file from an existing ZIP file:

/* Remove the PackageMetadata piece that ODS PACKAGE creates */
filename pkg ZIP "c:\projects\filenamezip\new.zip" member="PackageMetaData";
data _null_;
 if (fexist('pkg')) then 
  rc = fdelete('pkg');
run;
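
Adding a member works along the same lines. Here is a hedged sketch that writes a small text file into the same archive. The member name readme.txt is hypothetical, used only for illustration:

```sas
/* Sketch: add a new text member to an existing ZIP */
/* (readme.txt is a made-up member name)            */
filename newmem ZIP "c:\projects\filenamezip\new.zip" member="readme.txt";
data _null_;
 file newmem;
 put "This archive was updated by a SAS program.";
run;
filename newmem clear;
```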

Note: Like ODS PACKAGE, the FILENAME ZIP method does not support encrypted (password-protected) ZIP archives.

Download the complete SAS 9.4 program: filenameZipHttpExample.sas

Thanks to the growing size of data files, ZIP files are created and consumed by SAS users everywhere. Between ODS PACKAGE and FILENAME ZIP, you can teach your SAS programs to build and read the files without having to rely on external tools. The more that you can use native SAS methods for this work, the more portable your SAS programs will be.

See also

Using FILENAME ZIP to unzip and read SAS data files in SAS
Reading and writing GZIP files in SAS


About Author

Chris Hemedinger

Senior Manager, SAS Online Communities

Chris Hemedinger is the manager of SAS Online Communities. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies. He also hosts the SAS Tech Talk webcasts each year from SAS Global Forum, connecting viewers with smart people from SAS R&D and the impressive work that they do.


43 Comments

  1. Beautiful!
    Thanks for providing example how to get files within zip file listed!
    I can use this immediately.

  2. Hi Chris
    I have a little problem because I receive a ZIP archive that has the following structure:

    ZIP name
    Nielsen
    dir001 with file aaa
    dir002 with file aaa
    dir003 with file aaa
    dir004 with file aaa
    dir005 with file aaa
    dir006 with file aaa

    In each subfolder there is a file that has the same name in all subdirectories, but the name changes from one delivery of the ZIP file to the next, and the number of subfolders also varies.

    How can I read all of the files in the ZIP archive? (The program runs automatically, without human intervention.)

    • Chris Hemedinger

      If I understand you, it sounds like you have repeating file names within the same archive. In the archive they are in a folder structure, so they can be treated distinctly. But when you use a SAS program to process, all of the files end up in a single folder? You would need to rename the file as you extract it, perhaps based on the folder name, to keep the name unique in the folder. You could use the SAS RENAME function for this.

      • Hi Chris,
        I met a similar situation. I couldn't find how to navigate a directory structure stored inside a zip file.
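
        One way to approach this, sketched under the assumption that DREAD returns each member with its folder path (for example dir001/aaa) and that folder entries end with a slash: list the members, flag the folders, and then point MEMBER= at the full path. The names dir001 and aaa come from the example layout above; the INZIP fileref and &ziploc are from the post.

        ```sas
        /* Sketch: list members and flag the folder entries */
        data contents(keep=memname isFolder);
         length memname $200 isFolder 8;
         fid=dopen("inzip");
         if fid=0 then stop;
         memcount=dnum(fid);
         do i=1 to memcount;
          memname=dread(fid,i);
          /* assumption: a trailing slash marks a folder entry */
          isFolder=(first(reverse(trim(memname)))='/');
          output;
         end;
         rc=dclose(fid);
        run;

        /* assumption: MEMBER= accepts the path as listed in memname */
        filename onefile zip "&ziploc" member="dir001/aaa";
        ```

        Because each member keeps its folder path, files with the same name in different folders stay distinct in the CONTENTS data set.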

    • Chris Hemedinger

      Marc, yes, I think so. First, you would have to use the FILENAME ZIP method to copy the zipped data file from a ZIP archive. Then, you would assign a library to the location where you just copied that data file, and access the data from there.

      You can do something like this to reach the file inside the ZIP, then copy it to a target folder:

      filename _bcin zip "c:\temp\instanttitles.zip" member="instanttitles.sas7bdat" recfm=n;
      filename _bcout  "c:\projects\instanttitles.sas7bdat" recfm=n; 
      

      Then use something like the binaryFileCopy macro (which I shared in this post) to copy the file and access as data:

      %binaryFileCopy()
      %put NOTE: _bcrc=&_bcrc;
      
      filename _bcin clear;
      filename _bcout clear;
      
      libname project "c:\projects";
      proc datasets lib=project;
      contents data=instanttitles;
      quit;
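
      In SAS 9.4, the FCOPY function might serve as an alternative to a copy macro. A minimal sketch, assuming the same paths as above and that both filerefs use RECFM=N so that the bytes are copied unchanged:

      ```sas
      filename _bcin zip "c:\temp\instanttitles.zip" member="instanttitles.sas7bdat" recfm=n;
      filename _bcout "c:\projects\instanttitles.sas7bdat" recfm=n;

      data _null_;
       /* FCOPY returns 0 when the copy succeeds */
       rc=fcopy('_bcin','_bcout');
       if rc ne 0 then do;
        msg=sysmsg();
        putlog "WARNING: FCOPY failed. " rc= msg=;
       end;
      run;

      filename _bcin clear;
      filename _bcout clear;
      ```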
      

  3. Kevan Mather

    Hi Chris,

    Thanks for this it's very useful!

    What if there are excel files in the zipped folder, how would one go about reading them into SAS?

    Kind Regards

    • Chris Hemedinger

      You can discover the member name using the example I provided here. Once you know the member name, you can assign a fileref to the Excel file you want to read. You can't PROC IMPORT the Excel file directly from the ZIP file, so you'll need to copy it out first. Here's an example that copies an XLSX file to the SAS Work location, and then runs PROC IMPORT on the result.

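      filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;
      data _null_;
        /* copy the member byte for byte, in fixed 256-byte chunks */
        infile inzip(sas_tech_talks_15.xlsx)
          lrecl=256 recfm=F length=length eof=eof unbuf;
        /* RECFM=N on output so that no line terminators are added */
        file xl lrecl=256 recfm=N;
        input;
        put _infile_ $varying256. length;
        return;
       eof:
        stop;
      run;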
      
      proc import datafile=xl dbms=xlsx out=confirmed;
        sheet=confirmed;
      run;
      

  4. Darren Mayne

    Hi Chris

    Thanks for this, it was exactly the solution I was looking for to support an analytic project with zipped source files. Just out of interest, does the ZIP engine work with the newer ZIPX file format?

    Cheers
    Darren

    • Chris Hemedinger

      José,

      gz is usually for a single file, while ZIP bundles up a collection of files and compresses them. There isn't a FILENAME method for gz, but ZIP and GZIP are mostly compatible. That is, you should be able to read a GZIPped file with the FILENAME ZIP method and vice versa. As some others have pointed out, password-protected files are not supported -- so that's one feature difference.

  5. My data is encrypted and requires a password.

    Can I add to the parameters? It is a simple fixed length file, once I give a pw, unzip it, and read it in!

    Thanks

    • Chris Hemedinger

      Do you mean that the ZIP file is encrypted with a password? Then no, that is not supported by FILENAME ZIP.

      If the ZIP file contains a SAS data set that is protected with a data set password, you can specify that in syntax when you read the data set. First, you must extract the data to a folder, then assign a library. This example shows you how.

    • Chris Hemedinger

      Unfortunately, none of the SAS language methods (ODS PACKAGE, FILENAME ZIP) support passwords. You'll have to use the "old school" method: use X command or SYSTASK to call 7zip or gzip commands to compress with a password.

  6. Mary Rosenbloom

    Chris,

    This is very helpful. Thanks so much. I am migrating a bunch of files from SDD using the desktop connection, and I am finding that when I extract the files from the zip that they don't retain their true original creation date. Is there a way to get around this? We are migrating thousands of files and need to zip them to move them.

  7. Thanks.

    How would you deal with a zip file inside the zip without having to extract the second archive?
    To be clear, I have xxx.zip, to which I can assign filename xx ZIP "xxx.zip"; but its content is yyy.zip. How can I use a FILENAME statement to access the content of yyy.zip, which contains a CSV file that I want to read into a data set?

    • Chris Hemedinger

      There is NO getting around that extraction. At least while using SAS, you'll have to extract the embedded ZIP and then use FILENAME ZIP to access that result, then extract the CSV within that. It's possible that other tools hide this complexity, but any process that needs to get to that CSV file will need to read/extract the entire "nesting doll" of ZIPs.
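
      A rough sketch of those two steps, assuming the names xxx.zip and yyy.zip from the question and a hypothetical CSV member named data.csv:

      ```sas
      /* step 1: copy the inner ZIP out of the outer ZIP, byte for byte */
      filename outerz zip "xxx.zip" member="yyy.zip" recfm=n;
      filename innerz "%sysfunc(getoption(work))/yyy.zip" recfm=n;
      data _null_;
       rc=fcopy('outerz','innerz');
       if rc ne 0 then do;
        msg=sysmsg();
        putlog "WARNING: FCOPY failed. " rc= msg=;
       end;
      run;
      filename outerz clear;
      filename innerz clear;

      /* step 2: point a ZIP fileref at the extracted archive */
      /* (data.csv is a made-up member name)                  */
      filename csvin zip "%sysfunc(getoption(work))/yyy.zip" member="data.csv";
      ```

      From there, an INFILE statement that references CSVIN can read the CSV in a DATA step.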

  8. Pingback: Add files to a ZIP archive with FILENAME ZIP - The SAS Dummy

  9. Pingback: Using FILENAME ZIP to unzip and read data files in SAS - The SAS Dummy

  10. Pingback: Using SAS and ODS PACKAGE to create ZIP files - The SAS Dummy

  11. Thanks for this Chris. What if there are csv files in the zipped folder, how would one go about reading them into SAS?

  12. Pingback: Using FILENAME ZIP and FINFO to list the details in your ZIP files - The SAS Dummy

  13. Hi, Chris,

    I'm using SAS EG 7.15 HF2 with a UNIX (AIX) back end. I can't get the ZIP method to find out the contents of a zip archive. The DOPEN consistently returns a zero. I've tried multiple zip files. Is the ZIP method only for those whose SAS executes on a Windows machine?

    Jim

    Code used:
    *------------------------------------------------------------------------------------------------*;
    ** File and library allocations. **;
    FILENAME FileIn ZIP "&Zip_Dir.";
    &NoMacs %Error_Check (MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);

    LIBNAME SASout "&SAS_Lib";
    &NoMacs %Error_Check (MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);

    *------------------------------------------------------------------------------------------------*;
    ** Program logic. **;

    DATA SASout.&SAS_Out;
    DROP _:;

    LENGTH mem_name $200.;
    LENGTH _File_ID 8.;
    LENGTH _mem_count 8.;
    LENGTH _i 8.;

    _File_ID = DOPEN("FileIn");

    IF MISSING(_File_ID) OR
    _File_ID = 0 THEN
    DO;
    CALL SYMPUT('SYSCC','4');
    PUTLOG "WARNING- ";
    PUTLOG "WARNING- ******************************************************************** ";
    PUTLOG "WARNING: Unable to open file. " _N_= _File_ID=;
    PUTLOG "WARNING- ******************************************************************** ";
    PUTLOG "WARNING- ";
    STOP;
    END;

    _mem_count = DNUM(_File_ID);

    IF NOT MISSING(_mem_count) THEN
    DO _i = 1 TO _mem_count;
    mem_name = DREAD(_File_ID, _i);
    OUTPUT;
    END;
    ELSE
    DO;
    CALL SYMPUT('SYSCC','4');
    PUTLOG "WARNING- ";
    PUTLOG "WARNING- ******************************************************************** ";
    PUTLOG "WARNING: Unable to process members. " _N_= _File_ID= _mem_count=;
    PUTLOG "WARNING- ******************************************************************** ";
    PUTLOG "WARNING- ";
    STOP;
    END;

    _RC = DCLOSE(_File_ID);
    RUN;
    &NoMacs %Error_Check (MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);

    *------------------------------------------------------------------------------------------------*;

    • Chris Hemedinger

      No Jim, it works with UNIX platforms too. Requires SAS 9.4 or later. Does &ZIP_Dir in your program refer to a ZIP file or to a folder that contains ZIP files? The FILENAME ZIP method requires a filename, not directory name.

      • Hi, Chris, thanks for your very speedy response!

        Hmm. Well, I sort of figured that it was intended to work for UNIX, but no luck thus far.

        In answer to your question, &Zip_Dir points to:
        NOTE: The file being processed is:
        /analytics/data_intelligence/Reporting/TU_FFR40TU_FFR40.zip

        I'm running SAS 9.4:
        SAS (r) Proprietary Software Release 9.4 TS1M4

        I did notice one other thing. There's a note in my log as follows:
        NOTE: List Handle Creation Failed.

        My system errors are as follows (if they're salient):
        NOTE: Checking for warnings and errors
        SYSERR = 4
        SYSCC = 4
        SYSFILRC = 0
        SYSLIBRC = 0
        SQLRC = 0
        SQLXRC = 0
        ReturnCode = 0

        Jim

        • Chris Hemedinger

          Jim -- I'm stumped. I know this works with UNIX -- I've used in SAS University Edition (Linux-based) and in other Linux environments here. I suggest that you open a track with tech support. First, double-check that the file exists with the name you specified -- remember, case has to match on UNIX.

          • Hi, Chris,

            I guess the "NOTE: List Handle Creation Failed" message didn't help any? That was the only diagnostic I got.

            I've tried this with multiple files, and I cut and paste in the names to avoid the very problem you mention (lower case vs. upper case). I also created a zip archive on my Win 7 machine and brought it over both in "text" mode and binary to see if perhaps the mode of transport were the issue. It was not.

            I'll check with tech support.

            Thanks,

            Jim

  14. Oh, gad! I'm a complete idiot. I forgot to put a "/" between my macro variables when I concatenated the path with the file name. Duh. I are a very smart computer programmer. Amazing how much better it works when you get the path and file name put together properly.

    However, somewhat in my defense, I'm going to go ahead and say "NOTE: List Handle Creation Failed" is a wee tad obscure in terms of a diagnostic message. "File not found" would have been a whole lot more helpful, especially for a SAS Dummy like me. :)

    Jim

      • OK! It's all integrated and working:
        a) Get member list, identifying which members are directories.
        b) Sort so sub directories inside the archive are first in the member list.
        c) DATA step to create all sub directories and then call execute a macro to read in the zipped data and write to UNIX directory

        Very nice, very useful (when you don't forget to put a "/" between your path macro variable and your file macro variable).

        Thanks, Chris!
