A couple of years ago I shared a method for copying any file within a SAS program. It was a simple approach, copying the file byte-by-byte from one fileref (SAS file reference) to another.
My colleague Bruno Müller, a SAS trainer in Switzerland, has since provided a much more robust method. Bruno's method has several advantages:
- It's coded as a SAS macro, so it is simple to reuse -- similar to a function.
- It copies the file content in chunks rather than byte-by-byte, so it's more efficient.
- It provides good error checks and reports any errors and useful diagnostics to the SAS log.
- It's an excellent example of a well-documented SAS program!
Bruno tells me that "copying files" within a SAS program -- especially from nontraditional file systems such as Web sites -- is a common need among his SAS students. I asked Bruno for his permission to share his solution here, and he agreed.
To use the macro, you simply define two filerefs: _bcin (source) and _bcout (target), then call the %binaryFileCopy() macro. Here is an example use that copies a file from my Dropbox account:
filename _bcin TEMP; filename _bcout "C:\temp\streaming.sas7bdat"; proc http method="get" url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" out=_bcin ; run; %binaryFileCopy() %put NOTE: _bcrc=&_bcrc; filename _bcin clear; filename _bcout clear; |
The following is partial log output from the program:
NOTE: BINARYFILECOPY start 17SEP2013:20:50:33 NOTE: BINARYFILECOPY infile=_bcin C:\SASTempFiles\_TD5888\#LN00066 NOTE: BINARYFILECOPY outfile=_bcout C:\temp\streaming.sas7bdat NOTE: BINARYFILECOPY processed 525312 bytes NOTE: DATA statement used (Total process time): real time 0.20 seconds cpu time 0.07 seconds NOTE: BINARYFILECOPY end 17SEP2013:20:50:34 NOTE: BINARYFILECOPY processtime 00:00:00.344
You can download the program -- which should work with SAS 9.2 and later -- from here: binaryfilecopy.sas
Update: using FCOPY in SAS 9.4
Updated: 18Sep2013
Within hours of my posting here, Vince DelGobbo reminded me about the new FCOPY function SAS 9.4. With two filerefs assigned to binary-formatted files, you can use FCOPY to copy the content from one to the other. When I first tried it with my examples, I had problems because of the way FCOPY treats logical record lengths. However, Jason Secosky (the developer for FCOPY and tons of other SAS functions) told me that if I use RECFM=N on each FILENAME statement, the LRECL would not be a problem. And of course, he was correct.
Here's my example revisited:
filename _bcin TEMP recfm=n /* RECFM=N needed for a binary copy */; filename _bcout "C:\temp\streaming.sas7bdat" recfm=n; proc http method="get" url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" out=_bcin ; run; data _null_; length msg $ 384; rc=fcopy('_bcin', '_bcout'); if rc=0 then put 'Copied _bcin to _bcout.'; else do; msg=sysmsg(); put rc= msg=; end; run; filename _bcin clear; filename _bcout clear; |
22 Comments
This is very cool. I would like to use it to copy some .csv files from our Windows LAN to our Citrix/Unix server. Is this possible?
If so, how do I define the FILENAME to do that?
For importing/exporting excel files, using a proc import/export we use:
(although I realize the interface to PC files probably has different requirements than the Filename statement).
I always enjoy your blog posts - very informative!
Mary, it should be possible as long as your UNIX server can reach the Windows file location in some way. This might be a network mounted filesystem, or an FTP server (using FILENAME FTP), or a web server (using FILENAME URL). Here I'm assuming that your SAS is on Unix, while the files you need are in Windows.
You can also use EG (assuming you've got that in your Citrix environment) and the Copy Files task to achieve a similar end.
Mary,
I have the same task to implement, did you finally find how to do this ?
Thanks.
Glad to see that we have new functionality with SAS9.4 that makes coding easier.
One thing I find useful with the macro, you do not have to care for the RECFM option just a fileref to the location where the file sits whether it is disk, FTP, WebDAV etc is sufficient.
Hi ! I have tested the macro restoreTitles but for the line ...where type="T", only one row is select (macro variable SQLOBS resolves to 1) ! How can I fixed it (to have 2 rows select) ?
The log :
Simon,
I think you're missing the dash on the macro range:
That uses a feature from SAS 9.3. If you have SAS 9.2,
then you'll first have to do a SELECT COUNT to get the number of "T" records, then fill in the range. That technique is in this post.
Hi, Chris,
I got a question about bulk file copying done with SAS. The volume of files is over 1 million (tif image files averaging 50 kb each).
My data step to do this operation takes columns in a table for the file source and destination, then calls the file copy DOS command to do the actual copy. The reason it is done one at a time is because the source and destination paths are structured differently, and each of these files have their own source and destination.
Below is the line where the copy command is called (after confirming source path exists):
------------------------------------------------------------------------------------------------------
if (fileexist(Source)) then RC=system('copy /y ' !! Source !! ' ' !! Out_FilePath);
------------------------------------------------------------------------------------------------------
Last time it was run, the log showed the resulting times.
real time 16:42:48.68
cpu time 1:52:14.81
Looks like IO bottleneck, but just in case, maybe you know a more efficient SAS code to do this? I looked at SAS configuration, and memsize was set to 2GB and I don't know if I should be playing with those settings (this operation is done on a virtual server).
That's a ton of files! Since each file has a bit of overhead to copy, I'm not sure how to speed it up. Have you looked into specialized tools like Robocopy or TeraCopy?
If the file structure between source and destination were the same, I would use robocopy with the /mir (mirror) option. In fact, I already use it for another process where the source and destination structure are the same.
Right now the I have to access the SAS table and get the paths one by one (I'm not allowed to change the file name or the path structure). Had an idea of putting path information in the file name (so I don't have to lookup SAS table during copy) then changing it back after the copy is done. But that would just mean extra processing time spent on renaming files twice.
As for terracopy, my work does not allow me to download other copy tools.
Question for you, was I correct in assuming this is an IO issue given the SAS log times? The cpu time was how long SAS took to read through the table, and the rest was spent on the actual reading and writing of files?
real time 16:42:48.68
cpu time 1:52:14.81
Thanks for your input.
My uneducated guess is Yes -- that's I/O time. CPU time is calculations -- building filenames and such.
Hello Chris,
I successfully used this method for files in the first layer of a zip file with
filename _bcin "&in_dir.&fn" member="&memname.";
However members in a subfolder result in
ERROR: No logical assign for filename _BCIN.
ERROR: Error in the FILENAME statement.
Is there some tip for using filename with a zip file and selection a member in a subfolder within the zip file?
Thank you for any help you can give.
To reference a subfolder in the archive, add it to the MEMBER= value. Example: member="data/&memname".
See this post as an example. And this post for listing the complete contents of your ZIP file from SAS. Complete series of FILENAME ZIP and ODS PACKAGE articles.
Thank you for your reply.
The error messages occur when a / is in the &memname value:
filename _bcin zip "&in_dir.&fn" member="Split Files/82031 GA Design File.xls";
I've tried swapping with a \ but nothing seems to work.
The forward slash should be okay, but maybe the spaces in the name are an issue. You have to make sure your target name doesn't contain the slash too. Here's a sample SAS program that should handle at least one level of folders in the ZIP.
FCOPY, if I recall, does not change in separate operating systems, whereas system commands (Dos copy and Linux/Unix cp, obviously do). In a Windows 10 test, FCOPY (or Dos copy, for that matter) did NOT preserve the datetime metadata and, in fact, the "Date created" value was less than the "Date modified". This can be problematic in certain environments. Renaming using Dos rename did preserve the datetime metadata, so that suggests that zipping the file then moving the zip file then extracting could work, but a test showed that "Date created" then became "Date modified", which in this case was older. Note that the ZIP engine to the FILENAME also did not preserve these metadata; we discussed in the comments about using a System command and a third party compression program: https://blogs.sas.com/content/sasdummy/2015/05/11/using-filename-zip-to-unzip-and-read-data-files-in-sas/
Hello Chris,
I've got LEHD Employment Statistics files in csv.gz format. I extracted using 7zip, however Excel's row limit (1,048,576) restricts the data. To get the full records, I'd like to read it to SAS, can you please help? Thank you
I used:
filename year02_gz ZIP "file_path/mo_od_main_JT00_2002.csv.gz"
data mylib.Year02;
infile 7zip(mo_od_main_JT00_2002.csv);
input w_geocode 16.;
put _All_;
run;
You can do this with the GZIP option with FILENAME ZIP (see article here). Just add a GZIP on the end of your FILENAME statement.
Hi Chris. Is it also possible to upload a file into a website using SAS?
Yes, if the web site has an API for doing so. PROC HTTP with a POST or POUT method can be used to add content to sites that allow it.
Hi Chris, sorry to come to this years late but it sparked my interest. Your blog post "Using FILENAME ZIP to unzip and read data files in SAS" (https://blogs.sas.com/content/sasdummy/2015/05/11/using-filename-zip-to-unzip-and-read-data-files-in-sas/) has an example of reading a SAS data set in a ZIP file that I used successfully, but your reply to Andreas Menrath's 5/25/2015 note had a link to this post where you note that another method was much more robust. That made me nervous so I tried to apply the FCOPY method in this post to a ZIP file.
First, to verify the FCOPY code, I successfully copied a SAS data set from one location to another using the method who showed above, e.g.,
But, when I tried to use the same code to copy a SAS data set from a ZIP file to a SAS library, I got an error. As noted above, I used your code from the earlier blog post to successfully read from this small ZIP file, which I created manually in WIndows, so I am confident the ZIP file is valid.
I don't understand this error. Should I read the SAS data set in this ZIP file this way, or is the method in the earlier post sufficiently robust? Thanks so much for looking at this.
Hi Bruce, the DATA step binary copy method (read with INFILE then write to a FILE) is reliable I think. FCOPY() should work in theory, but I have seen cases where it doesn't work properly when trying to copy out of a ZIP member.
Thanks Chris, I'll stick to the DATA step binary copy method. And I might have another question in another ~7 years. ;-)