Using FILENAME ZIP to unzip and read data files in SAS


I've written about how to use the FILENAME ZIP method to read and update ZIP files in your SAS programs. The ZIP method was added in SAS 9.4, and its advantage is that you can accomplish more in SAS without having to launch external utilities such as WinZip, gunzip, or 7-Zip.

Several readers replied with questions about how you can use the content of these ZIP files within your SAS program. The basic scenario is: "I've got some data files in my ZIP archive. I want to use SAS to unzip these and then use them as data within my SAS process. Can I do this?"

Yes, you can -- but it does require an extra step. Even though FILENAME ZIP can show you the contents and structure of your ZIP file, most SAS procedures cannot access the content directly while it's in the archive. So, the additional step is to copy the file to another location, effectively extracting it from the ZIP file.

As an example, I created a ZIP file with two files and a subfolder:

data.zip
  |__ sas_tech_talks_15.xlsx
  |__ sas/
      |__ instanttitles.sas7bdat

This SAS program helps me to discover how FILENAME ZIP sees the file:

filename inzip ZIP "c:\projects\data.zip";
 
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  memname=dread(fid,i);
  /* check for trailing / in folder name */
  isFolder = (first(reverse(trim(memname)))='/');
  output;
 end;
 rc=dclose(fid);
run;
 
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;

Output:

        Files in the ZIP file                                         
 memname                       isFolder
 sas/                             1  
 sas/instanttitles.sas7bdat       0  
 sas_tech_talks_15.xlsx           0  
                N = 3

With this information, I can now "copy" the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the "member" syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy only the actual files, not the folder-level entries.

/* define a temp file location in the WORK directory */
filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;
 
/* hat tip: "data _null_" on SAS-L */
data _null_;
   /* using member syntax here */
   infile inzip(sas_tech_talks_15.xlsx) 
       lrecl=256 recfm=F length=length eof=eof unbuf;
   file   xl lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc import datafile=xl dbms=xlsx out=confirmed replace;
  sheet=confirmed;
run;

Sample output from my SAS log:

NOTE: The infile INZIP(sas_tech_talks_15.xlsx) is:
      Filename=c:\projects\data.zip,
      Member Name=sas_tech_talks_15.xlsx

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
      Filename=C:\SAS Temporary Files\_TD396_\Prc2\sas_tech_talks_15.xlsx,
      RECFM=N,LRECL=256,File Size (bytes)=0,
      Last Modified=11May2015:11:38:59,
      Create Time=11May2015:11:20:23

NOTE: A total of 55 records were read from the infile library INZIP.
NOTE: 55 records were read from the infile INZIP(sas_tech_talks_15.xlsx).
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

To use the SAS data set in the ZIP file, I need to copy it into a folder that is assigned to a SAS library. In this example, I will again use the WORK location. Because my SAS data set is in a logical subfolder (named "sas") within the archive, I need to include that path as part of the member syntax on the INFILE statement.

/* Copy a zipped data set into the WORK library */
filename ds "%sysfunc(getoption(work))/instanttitles.sas7bdat" ;
 
data _null_;
   /* reference the member name WITH folder path */
   infile inzip(sas/instanttitles.sas7bdat) 
       lrecl=256 recfm=F length=length eof=eof unbuf;
   file   ds lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc contents data=work.instanttitles;
run;

Partial output in my example:

                             Files in the ZIP file                          
                             The CONTENTS Procedure

 Data Set Name        WORK.INSTANTTITLES            Observations          1475
 Member Type          DATA                          Variables             6   
 Engine               V9                            Indexes               0   
 Created              01/29/2015 15:09:54           Observation Length    248 
 Last Modified        01/29/2015 15:09:54           Deleted Observations  0   
 Protection                                         Compressed            NO  
 Data Set Type                                      Sorted                NO  
 Label                                                                        
 Data Representation  WINDOWS_64                                              
 Encoding             wlatin1  Western (Windows)                              

Of course, all of this can be automated even further by writing SAS code that automatically iterates through the ZIP file member names and copies/imports each of the members as needed.
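
As a rough sketch of that automation, and not a tested utility, the copy step could be wrapped in a small macro and driven from the list of member names. The macro name `extract_member`, its parameters, and the `c:\projects\extracted` output folder are placeholders invented for illustration, and the output folder is assumed to exist already.

```sas
/* Hypothetical locations: adjust for your own system */
%let zipfile = c:\projects\data.zip;
%let outdir  = c:\projects\extracted;   /* assumed to exist already */

/* Copy one archive member to a target file, byte for byte */
%macro extract_member(member=, target=);
  filename _src ZIP "&zipfile" member="&member";
  filename _dst "&target";

  data _null_;
    /* RECFM=F reads fixed 256-byte chunks; LENGTH= holds the size of
       the current chunk; EOF= jumps to the label at end of input */
    infile _src lrecl=256 recfm=F length=length eof=eof unbuf;
    /* RECFM=N writes a plain byte stream with no record boundaries */
    file _dst lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
   eof:
    stop;
  run;

  filename _src clear;
  filename _dst clear;
%mend extract_member;

filename inzip ZIP "&zipfile";

/* List the members, as in the first program of this post */
data contents(keep=memname isFolder);
  length memname $200 isFolder 8;
  fid = dopen("inzip");
  if fid = 0 then stop;
  do i = 1 to dnum(fid);
    memname = dread(fid, i);
    isFolder = (first(reverse(trim(memname))) = '/');
    output;
  end;
  rc = dclose(fid);
run;

/* Queue one macro call per file; folder entries are skipped */
data _null_;
  set contents;
  where isFolder = 0;
  length target $400;
  /* flatten the archive: keep only the file name after the last slash */
  target = cats("&outdir", "/", scan(memname, -1, "/"));
  call execute(cats('%nrstr(%extract_member)(member=', memname,
                    ', target=', target, ')'));
run;
```

This version flattens the folder structure, so two members with the same file name in different archive folders would overwrite each other. Member names that contain commas would also need extra macro quoting before they could be passed as parameters.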


About Author

Chris Hemedinger

Director, SAS User Engagement

Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.

75 Comments

  1. Andreas Menrath on

    WARNING: binary file copy may cause trouble!

    I just used your binary file copy snippet here:

    data _null_;
    /* using member syntax here */
    infile inzip(sas_tech_talks_15.xlsx)
    lrecl=256 recfm=F length=length eof=eof unbuf;
    file xl lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
    eof:
    stop;
    run;

    For 99% of my files it worked fine. But unfortunately it does not make a 1:1 copy because it drops UTF byte order marks!
    I played around with the code, but was not able to fix it. It looks like the UTF BOM is dropped before it is copied into the _infile_ variable :-(

  2. Paige Miller on

    The following is a SASLOG with an error when I try to duplicate your results in SAS 9.4 TS1M2. What is this error?

    131 filename inzip ZIP "c:\users\pmiller\documents\be_output\be_output.zip";
    132 filename xl "c:\users\pmiller\documents\be_output\may 2013 be monthly update.xml" ;
    133 /* hat tip: "data _null_" on SAS-L */
    134 data _null_;
    135 /* using member syntax here */
    136 infile inzip(May_2013_BE_Monthly_Update.xml) lrecl=256 recfm=F length=length eof=eof unbuf;
    137 file xl lrecl=256 recfm=N;
    138 input;
    139 put _infile_ $varying256. length;
    140 return;
    141 eof:
    142 stop;
    143 run;

    ERROR: Open failure for c:\users\pmiller\documents\be_output\be_output.zip during attempt to create a local file handle.
    NOTE: UNBUFFERED is the default with RECFM=N.
    NOTE: The file XL is:
    Filename=c:\users\pmiller\documents\be_output\may 2013 be monthly update.xml,
    RECFM=N,LRECL=256,File Size (bytes)=0,
    Last Modified=09Jun2015:15:47:58,
    Create Time=09Jun2015:15:47:58

    NOTE: The SAS System stopped processing this step because of errors.

    • Chris Hemedinger on

      I get that message if the file doesn't exist (c:\users\pmiller\documents\be_output\be_output.zip). Check to make sure that's the correct file name?

      • I also got this message when I tried to unzip a file that was still being downloaded, so it was locked by the downloading process

  3. Hi Chris,
    Thanks for the blog, I tried your method to read and unzip a gz extension file (myfile.gz), but with no success. This is what I tried :

    /*************************************************************************************/
    filename test zip "c:\mydirectory\myfile.gz" ;
    filename outtest "C:\Janicedirectory";

    data _null_ ;
    infile test recfm=N ;
    file outtest recfm=N;
    input byte $char1. ;
    put byte $char1. ;
    run;

    • Chris Hemedinger on

      Did you post a complete example? Try this for a start to see what's in the GZ file:

      filename inzip ZIP "c:\mydirectory\myfile.gz";

      /* Read the "members" (files) from the ZIP file */
      data contents(keep=memname isFolder);
       length memname $200 isFolder 8;
       fid=dopen("inzip");
       if fid=0 then
        stop;
       memcount=dnum(fid);
       do i=1 to memcount;
        memname=dread(fid,i);
        /* check for trailing / in folder name */
        isFolder = (first(reverse(trim(memname)))='/');
        output;
       end;
       rc=dclose(fid);
      run;

      /* create a report of the ZIP contents */
      title "Files in the GZ file";
      proc print data=contents noobs N;
      run;

      • Actually, I have just a single compressed file, «myfile.dat.gz», so I need to read and process the data in SAS 9.4 from this compressed file. I tried to read data from the zipped file byte by byte and output it to an external file. Here is my code:

        filename test zip "C:\mydirectory\myfile.dat.gz" member='myfile.dat.gz';
        filename outtest "C:\Janicedirectory\myfile.dat";

        data _null_ ;
        Infile test lrecl= 256 recfm=N ;
        File outtest lrecl=265 recfm=N ; /* output file*/
        input ;
        put _infile_ ;
        run;

        Still doesn't work ?

        • Chris Hemedinger on

          Almost there, I think. Try something like this (removing the .gz from the member= option):

          /* assuming file in archive is named "myfile.dat" */
          filename test zip "C:\mydirectory\myfile.dat.gz" member='myfile.dat';

          • Kimberley Shirley on

            Hi,

            I have a similar issue to Janice. I have a zip file with a single .dat file inside of it that I need to extract and then copy across to another folder without actually reading the .dat file. Currently my code is as follows:

            filename test zip "/sasdata/sourcedata/transfer.zip" member='transfer.dat';
            filename outtest "/sasdata/target/testtransfer.dat";
            data _null_ ;
            Infile test lrecl= 256 recfm=N ;
            File outtest lrecl=265 recfm=N ; /* output file*/
            input ;
            put _infile_ ;
            run;

            Unfortunately, each time I run this, I get the following errors:

            ERROR: Out of space writing to file /sasdata/target/testtransfer.dat.
            ERROR: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.

            I have also noted that the transfer.dat file is only 9GB but the size of the folder is 90GB.

          • Chris Hemedinger on

            I don't know what's going on here -- there might be some additional logging options that you can enable for more diagnostics. I imagine that the compressed file would need to be extracted to a WORK location before it is copied to the final destination, so perhaps that's the area that's running low on space. I suggest working with SAS Tech Support on this.

          • Hi Chris,

            I tried the above code for unzipping files. The 7 GB (around 4 million records) non-zipped SAS data set expands to more than 65 GB (around 50 million records) after unzipping. Since we do not have the disk space, it shows an I/O error for insufficient space, and the expansion stops.

            I tested with a 2.8 GB SAS data set and it unzips to a T, without any issue.

          • Chris Hemedinger on

            I don't know why a zipped data set would contain fewer records than an unzipped data set. I think I'm missing something in your question.

          • Before zipping the file size was around 7 GB.
            After zipping the file size is around 500MB
            But when I am trying to unzip this file , the file is growing in size and the entire disk is full because of this unzip activity.

          • Chris Hemedinger on

            When you unzip such a large file, you do need a certain amount of scratch space to allow for the file expansion while it's written to disk. I don't know what the formula should be, but I'd say that if you're unzipping the entire 7GB file you should have at least 10-15GB of available space.

          • True. We need certain disk space for the file to be written.
            I had about 90GB of disk space, and the unzipped file still grew to fill all 90GB.
            Eventually I had to stop the process and delete the ever-growing file.

            But files of size 5GB before zipping gets zipped to around 500MB.
            When I am unzipping them I am able to unzip them to the tee without any issues and very quickly too with the same method.

            So it seems like files above 5GB size follow some different way while zipping, such that it causes issues while unzipping.

          • Chris Hemedinger on

            I guess we might need to see some sample code that you're using for unzipping. Is it possible that you have the extraction process in a loop that gets run multiple times? I suggest posting the question to SAS Support Communities -- it's easier to supply a better answer there, and other experts can chime in.

          • My code
            filename in02 ZIP "C:\mydirectory/mo_od_main_JT00_2002.csv.gz" member='mo_od_main_JT00_2002.csv' GZIP;
            filename out02 "C:\teamdirectory/mo_od_main_JT00_2002.sas7bdat";
            data mylib.file02;
            infile in02 recfm=N;
            file out02 recfm=N;
            input;
            put _infile_;
            run;

            Log result:

            NOTE: 3937 records were read from the infile IN02.
            NOTE: The data set MYLIB.FILE02 has 3937 observations and 0 variables.
            NOTE: DATA statement used (Total process time):
            real time 0.39 seconds
            cpu time 0.42 seconds
            There are over 1 million records in the folder.

          • Chris Hemedinger on

            A couple of notes. First, you don't need the member= option, because GZIP files have just one file that's compressed. Second, don't use a FILENAME statement for the SAS data set. You just need the DATA mylib.file02 to identify where the data will go.

            Since the file is a CSV, you can INFILE and INPUT the records directly. See this article for GZIP examples.
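
            For a gzipped CSV like the one in this question, a direct read could look roughly like the sketch below. It assumes SAS 9.4 Maintenance 5 or later (the GZIP option is not available in earlier releases), a header row on the first line, and three columns whose names and types are invented here because the layout of the file isn't shown.

            ```sas
            /* No member= option: a GZIP file holds exactly one compressed file */
            filename in02 ZIP "C:\mydirectory\mo_od_main_JT00_2002.csv.gz" GZIP;

            data work.file02;
              /* DSD handles quoted values and consecutive delimiters;
                 FIRSTOBS=2 skips the assumed header row */
              infile in02 dsd dlm=',' firstobs=2 truncover;
              /* hypothetical columns: replace with the real layout */
              length col1 $15 col2 $15 col3 8;
              input col1 col2 col3;
            run;

            filename in02 clear;
            ```

            Note that there is no FILE or PUT statement: the DATA statement alone names the output data set.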

    • Chris Hemedinger on

      I used XLSX and a SAS7BDAT file as examples, but CSV would work the same way. Use FILENAME ZIP to "address" the file, a DATA step to copy it out as a block of bytes, then another DATA step to read it. You might be able to combine those two DATA steps to read the file contents just once, but I don't think you'll be able to INFILE the item as a series of text characters while it's in the ZIP archive.

  4. Hi, I am not getting the second part to work where you read in a file. You wrote: "With this information, I can now 'copy' the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the 'member' syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries."

    I am not sure, but are you manually opening the zip file to put it in the work folder? I have an assignment to find metadata on my entire 2TB network share, with a count of multiple K zip files to open, and I wanted to make sure this process could automatically read and extract the data without manually opening the files. This code:

    filename inzip ZIP "D:\MyFileSystem\This_zipped_file.zip";

    /* Read the "members" (files) from the ZIP file */
    data contents(keep=memname isFolder);
    length memname $200 isFolder 8;
    fid=dopen("inzip");
    if fid=0 then
    stop;
    memcount=dnum(fid);
    do i=1 to memcount;
    memname=dread(fid,i);
    /* check for trailing / in folder name */
    isFolder = (first(reverse(trim(memname)))='/');
    output;
    end;
    rc=dclose(fid);
    run;

    /* create a report of the ZIP contents */
    title "Files in the ZIP file";
    proc print data=contents noobs N;
    where isFolder =0;
    VAR MEMNAME;
    run;

    /*
    makes this output:
    Files in the ZIP file 83
    15:58 Tuesday, December 22, 2015

    memname

    CSV/CollectorList.csv
    CSV/Sheet_1.csv
    Excel/CollectorList.xls
    Excel/Sheet_1.xls
    READ ME.txt

    N = 5

    and for example I have tried these lines of code and a few variations but it does not work...
    */

    filename xl "D:\MyFileSystem\Excel\CollectorList.xls" ;

    /* hat tip: "data _null_" on SAS-L */
    data _null_;
    /* using member syntax here */
    infile inzip(Excel\CollectorList.xls)
    /*... the rest of your code here..*/

    any clues?

    • ...darn, I just tried my code again after manually extracting the folder and placing it in my path. It now works. Not what I was hoping it would do... :( Thank you for sharing. -KJ

      • Chris Hemedinger on

        Keith, my example is supposed to work without you having to manually unzip/extract the ZIP file -- everything should be handled by the SAS program using the FILENAME ZIP method. Is that what you have working? If not, let me know.

  5. I'm having it hang up on the eof statement. I'm working with linux SAS. Suggestions would be appreciated.
    36 filename ds "%sysfunc(getoption(work))/mic_file.sas7bdat";
    37 data _null_;
    38 /*using member syntax here*/
    39 infile inzip(mic_file.sas7bdat) lrecl=256 recfm=F
    39 ! length=length eof=eof unbuf;
    40 file ds lrecl=256 recfm=N;
    41 input;
    42 put _infile_ $varying256. length;
    43 return;
    44 eof;
       ___
       180
    ERROR 180-322: Statement is not valid or it is used out of proper order.
    45 stop;
    46 run;

    • Chris Hemedinger on

      In this example, "eof" on line 44 is a label, a target to "goto". Use a colon instead of a semicolon after it.

  6. I am back on this project after a few months on other things, and now I am lost again with your sample code. I thought I had it figured out in Dec. In my case I have cloned our file systems with a robocopy command to a subdirectory placed at the root level; I used a few excludes to avoid any race conditions etc. I then added *.zip files to my robocopy command. I then found a purge-empty-directory script to cut off the excess limbs. Now I have a trimmed-down directory with only a copy of the original zip files that I can use as I see fit. My next idea was to extract them all to a folder relative to their location, but instead of myfile.zip I would place all the data in a myfile.X folder, again relative to each *.zip file found. So I have written the SAS code to make the subdir’s and move the files to their new home. Now I need to extract all my files. That is my problem, and the reason I found your post. Once I am done with my extract I have 7 or 8 SAS programs I can clone and point at this subdirectory to scan for metadata (xls, xlsx, mdb, accdb, dbf, sas7bdat, and sav) and produce a report as I am required. But as stated, I am stuck back on the extract phase. TIA for any pointers you can provide.
    …I am just looking for a stripped down version of what you seem to be showing that does not bother with reading the meta data directly of the files found, but just extract them and if possible read from a dataset with one field like MyPathFile where the strings might be: E:\mypath1\mypath2\...\mypathN\MyFile.X\MyFile.zip
    E:\mypath1\mypath2\...\mypathN\MyFile1.X\MyFile1.zip
    E:\mypath1\mypath2\...\mypathN\MyFile3.X\MyFile3.zip
    E:\mypath1\...\mypathN\MyFile.X\MyFile.zip
    E:\MyFile.X\MyFile.zip
    But I can easily split it in to path and file if needed.
    -TIA -KJ

    • Chris Hemedinger on

      Keith,

      For now, I'll have to leave this as an exercise for you -- or you can post your question and code-so-far to SAS Support Communities and perhaps someone else can help.

      I can offer these tips though:

      - use the dopen and dread functions to find the names/paths of the files inside the ZIP archive (as in my example in this post).

      - using that information, create a fileref for each member you want to extract, then use a binary copy method (in DATA step) to copy the byte stream of an archive member to a destination fileref in your target folder. This step may be suited to a macro or to a DATA step loop or even a DOSUBL construct.

      I have another FILENAME ZIP blog post teed up to publish soon, so stay tuned for more related information.

  7. Pingback: Add files to a ZIP archive with FILENAME ZIP - The SAS Dummy

  8. Diwakar Mahanti on

    Hi Chris,

    This method is very useful especially when the zip file contains several member files that are not needed except one. Initially I used unix unzip in my SAS code but that unzipped and saved all unwanted files on the server. -Thanks a lot

  9. Chris,

    Excellent post, but, unless I am incorrect, this approach (and FCOPY()), re-writes the file, thus changing the stamp of the last modification datetime? Have I overlooked anything?

    Thank you,

    Kevin

    PS A short search of the internet suggests that using an X statement would require a third party program, like 7-Zip, in the Windows OS (7 or less, not sure about the newer OS's).

    • Chris Hemedinger on

      Yes, you're correct. This process rewrites the file and so changes the file date/time information. And yes, X command and a 3rd party tool can be used together for more flexibility in creating/extracting ZIP files.

      • Hi, Chris. Great post, and helpful as always. Is there a way to either modify this approach or take a different one when unzipping in SAS to keep the original file creation date intact?

        Thanks,
        Jenna

        • Chris Hemedinger on

          Not with this method. This "unzip" is extracting a file and creating a new copy of the file, with new attributes. To keep the original attributes, you would have to use the zip tools directly.

  10. Hi Chris,

    I hope you can help, or anyone reading this blog.

    I am trying to convert first part of your code into macro but keep getting messages that variable memcount and fid cannot be evaluated. I am just starting using macros and need help with this one in order to process 100s of zip files.
    What I am trying to do is to read zip file names from a file and run macro for every zip file to get its contents file. Then I would only extract csv files from the contents file and then use your code from the other blog (using function fcopy) to copy csv files into single directory. In this way I would have all csv files in a single directory.
    If I run your code separately for a single file there is no problem. It works fine.

    Below is the code that I'd like to convert into a macro (copied from your blogs). I hope you can help.

    Thanks

    STEP 1 - Create CONTENTS file of all CSV files

    filename inzip ZIP "c:\projects\&datazip"; * Macro variable &datazip would be read from the file*;

    /* Read the "members" (files) from the ZIP file */
    data contents(keep=memname isFolder);
    length memname $200 isFolder 8;
    fid=dopen("inzip");
    if fid=0 then
    stop;
    memcount=dnum(fid);
    do i=1 to memcount;
    memname=dread(fid,i);
    /* check for trailing / in folder name */
    isFolder = (first(reverse(trim(memname)))='/');
    output;
    end;
    rc=dclose(fid);

    proc append data=CONTENTS base=&ALL_CSV_FILES; run; * THis line added by me*;
    run;

    STEP 2 - Copy all csv files into _bcout directory.

    filename xl "%sysfunc(getoption(work))/&csvfile" ; * &csvfile is the file from ALL_CSV_FILES*;
    filename _bcin "%sysfunc(getoption(work))/&csvfile" recfm=n /* RECFM=N needed for a binary copy */;
    filename _bcout "C:\MyDir\&csvfile" recfm=n;

    data _null_;
    length msg $ 384;
    rc=fcopy('_bcin', '_bcout');
    if rc=0 then
    put 'Copied _bcin to _bcout.';
    else do;
    msg=sysmsg();
    put rc= msg=;
    end;
    run;

    filename _bcin clear;
    filename _bcout clear;

      • Chris Hemedinger on

        Congrats! Glad you got it working. Feel free to post your final version back here, or on SAS Support Communities!

  11. Hi,Chris

    Thanks for sharing this fancy blog, and it works pretty well for .zip files. Now when I try to apply your code to .7z files, I run into trouble. Could you please show me how to unzip the .7z files and import the unzipped files into SAS data sets? Thanks a lot.

    Best regards,
    Yajun

    • Chris Hemedinger on

      Yajun,

      Unfortunately 7-zip files are not standard ZIP files, and are not supported. Nor are .gz files (gzip on UNIX). Both feature requests have been entered for the developers.

  12. Hello Chris
    I am using your program with sas eg7.11 for a zip file on the unix server and am getting contents of 0. I know the zip file should contain only one txt file. Does your code only work in 9.4?

    Thanks

    • Chris Hemedinger on

      Yes, this requires SAS 9.4 for FILENAME ZIP. Your version of EG doesn't matter, in this case.

      • Thank you Chris. I am trying to read a .txt file. I can run the first part and see that the zip file contains only RPT01546.txt. So I tried to run this code.

        filename xl "%sysfunc(getoption(work))/RPT01546.txt" ;

        /* hat tip: "data _null_" on SAS-L */
        data _null_;
        /* using member syntax here */
        infile inzip(RPT01546.txt)
        lrecl=512 recfm=F length=length eof=eof unbuf;
        file xl lrecl=512 recfm=N;
        input;
        put _infile_ $varying256. length;
        return;
        eof:
        stop;
        run;

        Here is the error. What am I doing wrong.
        NOTE: The zip file RPT01546.txt doesn't exist.
        ERROR: Physical file does not exist, RPT01546.txt.
        NOTE: UNBUFFERED is the default with RECFM=N.
        NOTE: The file XL is:
        Filename=/plunx21/Global/saswork_Global/SAS_work438D000051F7_crlnxp070/SAS_work1E68000051F7_crlnxp070/RPT01546.txt,
        Owner Name=chihasti,Group Name=UUXStaff,
        Access Permission=-rw-rw-r--,
        Last Modified=10Feb2017:16:20:46

        • Hello Chris, sorry, I just figured out the reason for the error: the file name itself did not have the .txt at the end. Can I use PROC IMPORT without a dlm?
          thanks

          • Chris Hemedinger on

            Yes, but you might do better with just DATA step. A trick: grab a copy of the file and use SAS EG Import Data task to generate a working DATA step, then generalize that for use in your process.

    • Chris Hemedinger on

      Zubair, I don't think FILENAME ZIP can give you the file attributes (like size).

  13. Chris,

    I need to open a sas7bdat that is inside a zip file, and I'm having problems with the output data set. The file comes out without any variables.
    I think the problem is in the piece of code below:
    input;
    put _infile_ $varying256. length;
    return;

    Can you help me?

    • Chris Hemedinger on

      Hi Mark, I can't help without more information. If you can share the ZIP file or an example that shows the problem, I suggest posting on SAS Support Communities. If it's not something you can share publicly, you might need to work with SAS Tech Support.

      • John Keighley on

        I am not sure but why are you using an input statement with a SAS data set? Shouldn't it be a set?

        • Chris Hemedinger on

          DATA step can't read the file as a SAS data set while it's in the ZIP file. The INFILE and PUT combination is to read the bytes of the compressed data set file, extract and write it out to the file system. Then SAS can read and process it like data.

  14. Is there a way to read the file info from the files inside the ZIP?
    i.e. Size, Creation date

    • Chris Hemedinger on

      Yes, with SAS 9.4 Maint 3 or later, the FINFO function can work with zip file members. Here's a quick hit -- will work up a more complete example later.

       %macro info(z);
        data _NULL_;
          fId = fopen("&z","S");
          if fID then
          do;
           infonum=foptnum(fid);
           do i=1 to infonum;
            infoname=foptname(fid,i);
            infoval=finfo(fid,infoname);
            put @1 i= @5 infoname= @35 infoval=;
            output;
           end;
           fId = fClose( fId );
         end;
        run;
      %mend info;
      
      filename zmem zip "c:\myzipfile.zip" member="folder/knownmember.png";
      %info(zmem);
      

  15. Pingback: Using FILENAME ZIP and FINFO to list the details in your ZIP files - The SAS Dummy

    • Chris Hemedinger on

      There are two methods, but each has drawbacks. The most common approach is to use FILENAME PIPE to run a command that unzips/extracts the item you need. However, this requires the ability to run shell commands from SAS, and many centralized SAS environments have that disabled (for security).

      The other method is the undocumented SASZIPAM method. You can find papers/examples with a directed internet search. It won't be as robust as the FILENAME ZIP method that was added to SAS 9.4.
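
      For the FILENAME PIPE route, a minimal sketch might look like the following. It assumes the `unzip` command-line utility is installed, that the SAS session is allowed to run shell commands (the XCMD option), and that the archive holds a CSV with a header row; the paths and column names are made up for illustration.

      ```sas
      /* unzip -p writes the named member to standard output, which SAS
         then reads through the pipe without creating a file on disk */
      filename unz pipe 'unzip -p "/home/myuser/data.zip" "mytest1.csv"';

      data work.mytest1;
        infile unz dsd dlm=',' firstobs=2 truncover;
        /* hypothetical columns: replace with the real layout */
        length id 8 name $40;
        input id name;
      run;

      filename unz clear;
      ```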

  16. Hi,

    I tried to replicate your example above and it worked just fine. However, I made a small modification in order to try and macrotize/automate the last step. I added this:
    data _null_;
    set contents;
    if index(memname, "arc") > 0 then
    call symput('fname', memname);
    run;

    data _null_;
    infile inzip(&fname)

    The rest is exactly the same as yours; all I did was try to create a macro variable with the value of memname in the contents step and use that in the infile statement.

    The macro variable generated correctly, but the step won't work set up the way I have it above.

    I literally copied the value of the fname macro variable and used that instead, and it worked fine--but for some reason it won't work if rendered as &fname instead of the value of &fname. I tried using double quotes. I even tried macrotizing just parts of the path and it seemed to work fine.

    I would appreciate any insight into what the issue may be--I am stumped!
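
    One possible explanation, offered as a guess since the log isn't shown: memname is a 200-character variable, and CALL SYMPUT stores the value with its trailing blanks intact, so &fname may resolve to the member name followed by padding. CALL SYMPUTX strips leading and trailing blanks before it stores the value, so a variation like this sketch may behave differently:

    ```sas
    data _null_;
      set contents;
      if index(memname, "arc") > 0 then
        /* SYMPUTX trims leading and trailing blanks from the value */
        call symputx('fname', memname);
    run;

    /* the brackets make any remaining padding visible in the log */
    %put NOTE: fname=[&fname];
    ```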

      • That worked, thank you!

        However, now I'm having another issue which is that work.test (the SAS file produced by reading in the zip file and outputting each record) has 0 records. It has the full set of variables, but no records are being output into the file. Do you have any ideas for how I can troubleshoot this?

  17. Super useful thread - thanks Chris and everyone who replied! I compiled some of these tips (zip listing, the fcopy approach) into a macro, which will take a zip file and copy the contents out into a directory location. Just two parameters:

    ```
    /* compile the macro and dependencies */
    filename mc url "https://raw.githubusercontent.com/Boemska/macrocore/master/macrocore.sas";
    %inc mc;
    /* call the macro */
    %mp_unzip(ziploc="/tmp/some.zip",outdir=/tmp/outputs)
    ```

  18. Thanks for this post. May I ask something? I found that the following minimized version also works.
    data _null_;
    infile zipfile(member) recfm=f;
    file target recfm=n;
    input;
    put _infile_;
    run;
    (1) What are EOF=E, RETURN, and E:STOP in your version doing? Are they necessary?
    (2) I also found that RECFM=F and RECFM=N are essential. What are they doing?
    Sorry to bother you.

    • Chris Hemedinger on

      RECFM=F says to treat the input file records as if they have a fixed size. RECFM=N says that the target file is binary, a stream of bytes with no record boundaries. Basically, the program reads a number of bytes from the input stream and writes them to an output file. The EOF= option names a label to jump to when the program reaches end-of-file on the input, and the STOP statement at that label ends processing.

  19. Hi Chris,

    This code works extremely well if the files are simply inside the zip file. But if there is a folder in the zip folder, like this:

    /home/MyZip.zip/MyFolder/mytest1.csv

    then it returns an error: 'Entry mytest1.csv in zip file /home/MyZip.zip does not exist'.

    The 'isFolder' value that you included works, but is there a way I can extract files if a folder is inside the zip file?

    • Chris Hemedinger on

      Yes, I think the file reference in SAS would have to include the path. For example:

      infile myzip(MyFolder/mytest1.csv);
      

  20. Evan Williamson on

    I am working with a .zip file that has child .zip files in the main directory, and the file I need is within the child .zip file.

    data.zip
    |__data_sub.zip
       |__table.txt

    Is there a way to read through multiple levels of zip files to get to "table.txt"?

    • Chris Hemedinger on

      I'm afraid you'll have to do this in multiple steps. First extract the data_sub.zip as a single file and store it in a local folder. Then repeat the process to extract the file you need from the data_sub.zip file.
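
      A sketch of those two steps, reusing the copy technique from the post, might look like this. The location of data.zip is a made-up path, and the WORK folder stands in for the local folder.

      ```sas
      /* Hypothetical path to the parent archive */
      filename outer ZIP "c:\projects\data.zip";

      /* Step 1: copy the child archive into the WORK folder */
      filename child "%sysfunc(getoption(work))/data_sub.zip";

      data _null_;
        infile outer(data_sub.zip)
            lrecl=256 recfm=F length=length eof=eof unbuf;
        file child lrecl=256 recfm=N;
        input;
        put _infile_ $varying256. length;
        return;
       eof:
        stop;
      run;

      /* Step 2: treat the extracted copy as a ZIP file in its own right */
      filename inner ZIP "%sysfunc(getoption(work))/data_sub.zip";
      filename tbl "%sysfunc(getoption(work))/table.txt";

      data _null_;
        infile inner(table.txt)
            lrecl=256 recfm=F length=length eof=eof unbuf;
        file tbl lrecl=256 recfm=N;
        input;
        put _infile_ $varying256. length;
        return;
       eof:
        stop;
      run;
      ```

      After the second step, table.txt sits in the WORK folder and can be read with an ordinary INFILE statement.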
