Being able to access information about your operating system from within SAS can be incredibly useful. For example, if you have a directory that contains an unknown number of CSV files that you would like to read into SAS, you certainly would not want to have to write an IMPORT procedure for each of the files. It would be great if SAS just knew the names of all the files as well how many there were. That would make this task much less tedious.
Fortunately, there is a straightforward way to accomplish this task within SAS. One of the easiest ways to gather information from your operating system from within SAS is to use a pipe (|). Pipes enable your SAS application to do the following:
- Receive input from any operating-system command that writes to standard output.
- Route output to any operating-system command that reads from standard input.
In other words, a pipe can be used to capture the output of an operating-system command and put that output into SAS, which gives you the ability to easily run PROC IMPORT for each CSV file in your directory.
Before we take a look at an example, let’s look at the syntax needed to use a pipe. To use a pipe on a directory-based operating system, you just need to issue a FILENAME statement with the following syntax:
FILENAME fileref PIPE 'operating-system-command' option-list;
Here is additional information about the syntax above:
fileref: Can be any valid fileref, as described in Referencing External Files.
PIPE: Is the device-type keyword that tells SAS that you want to use an unnamed pipe.
‘operating-system-command’: Is the name of an operating-system command, executable program, or shell script that you want to route output to or that you want to read input from. You must enclose the command in quotation marks.
option-list: Can be any of the options that are valid in the FILENAME statement, such as the LRECL= or RECFM= options. Options that affect the operating-system command can also be listed here.
So let’s take a look at a pipe in action.
In the following example, I have an unknown number of CSV files in the MYFILES subdirectory. I want to read in each file and create an output data set that has the same name as the CSV file. Also, I am only interested in the files that were created within the last year.
To do this import, I am going to use a pipe, which enables me to use the DIR operating-system command to list all the files in the directory. The output of the DIR command shows the filename and its create date. Based on the create date, I can pass in each file =name to a macro that contains PROC IMPORT. Then, PROC IMPORT reads in each file and outputs it to a data set with the desired name.
Here is the code:
Note: The code below contains numbers that are enclosed in parentheses; these numbers correspond to further explanation below the code block.
%let mydir=c:\myfiles; (1) %macro mymac(fname,dsname); (2) proc import datafile="&mydir\&fname" out=&dsname dbms=csv replace; getnames=yes; run; %mend; filename myfiles pipe "dir &mydir /T:C"; (3) data _null_; infile myfiles truncover; input; (4) if index(lowcase(_infile_),'.csv') then do; (5) date=input(scan(_infile_,1,' '),mmddyy10.); (6) fname=substr(_infile_,38); (7) dsname=compress(tranwrd(strip(scan(fname,1,'.')),'_',' '),,'p'); (8) dsname=tranwrd(strip(dsname),' ','_'); if intck('year',date,today(),’c’)<1 then do; (9) call=cats('%mymac(',fname,',',dsname,')'); call execute(call); (10) end; end; run;
- The MYDIR macro variable lists the directory that contains the CSV files.
- The MYMAC macro contains the PROC IMPORT code needed to read in the CSV files. The macro contains the parameters FNAME and DSNAME. FNAME is the name of the CSV file being read in, and DSNAME is the name of the output data set being created.
- The FILENAME statement uses a pipe, which enables me to use the DIR operating-system command. This command shows the filename and the create date. Note that the ‘/T:C’ option is needed to show the create date. The default date returned by the DIR operating-system command is the last-modified date.
- The INPUT statement does not list any variables, so it loads the entire record into the input record buffer.
- _INFILE_ is an automatic variable in the DATA step that contains the contents of the current input record buffer. This buffer is created by the INPUT statement. In this example, I use the INDEX function to determine if the file has a .csv extension.
- When you run the DIR command, dates are always the first value listed. I use the SCAN function to extract the date and the INPUT function to convert the extracted date to a SAS date format.
- Before you can extract the filename, you have to search the output from the DIR command to find the column number that contains the filename value. In this example, the filename starts in the 38th column of the output file. The SUBSTR function specifies this column number and enables me to extract the filename.
- DATA-step logic is used to create the output data-set name. To make sure that there are no invalid characters in the filename like spaces and punctuation other than an underscore, I use the TRANWRD and STRIP functions and specify what characters should not be included.
- I use the INTCK function to select values that contain a create date that are a year or less from today’s date before I execute the macro.
- CALL EXECUTE invokes the MYMAC macro. CALL EXECUTE enables me to pass in the values of FNAME and DSNAME as the values of the macro’s parameters and conditionally execute the macro based on DATA-step logic.
As this example shows, using a pipe to capture the output of the DIR operating-system command can be extremely useful. In this example, it enables me to dynamically run code based on an unknown number of files that meet a certain condition. Using a pipe to get the file information for you can drastically reduce the amount of time it would take to perform this task manually.