The list goes on and on -- how to get it all done!

0

Continuing with our stringed list theme, in this post I would like to talk about applying functions to those lists. So far we have developed quite a few tools that can manipulate lists in different ways, but what good are they if we can’t do something meaningful with the lists? We want to be able to do anything that we can normally do to one thing (be it a variable, table, number, or file) to each element in a list.

Here is my current version of this macro. It applies the specified function to each element in the list, collects the return values and returns them as a corresponding list.

%MACRO applyFunctionList(list, function);
   %LOCAL return i current_element current_functioncall list function;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
       %LET current_element = %stripp(%QSCAN(&list, &i, %STR( )));
       %LET current_functioncall = %SYSFUNC(TRANWRD(&function,
                                       %BQUOTE(||element||),
                                       %BQUOTE(&current_element) ));
       %LET return = &return &current_functioncall;
       %LET i = %EVAL(&i + 1);
   %END;
   %stripp(&return)
%MEND;

When passing a function in, it is important to use the %NRSTR() function so that nothing within it is resolved before its time. Also, because we are specifying a function in a general way, not specifically to any particular element, there is a convention that the macro assumes the following: the occurrence of the text “||element||” signals where the actual elements will be inserted prior to execution. In other words, wherever “||element||” occurs in the specified function, it will be replaced with the actual elements, one by one, and then run.

So for example, running:

%applyFunctionList(list=%STR(test test2 test3 test43),
          function=%NRSTR(%LENGTH(||element||)) );

Here is a variation of that macro. This one essentially does the same thing, the difference is that it does not collect return values. It is designed for use when only the side effects are desired, i.e. not expecting a return. It is useful in certain situations where %applyFunctionList() does not work because the resulting list is too big and violates the maximum macro variable length rule:
%MACRO applyFunctionList_noReturn(list, function);
   %LOCAL return i current_element current_functioncall list function;
   %LET return = %STR( );
   %LET i = 1;
   %DO %WHILE (%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
       %LET current_element = %stripp(%QSCAN(&list, &i, %STR( )));
       %SYSFUNC(TRANWRD(&function,
               %BQUOTE(||element||),
               %BQUOTE(&current_element) ))
       %LET i = %EVAL(&i + 1);
   %END;
%MEND;


Example 1: Adding missing variables to a data set

Suppose that we had a list of variables that are expected to be in a dataset. We might have information about them stored in some meta table (we’ll get more into that later on in the series), but for now let’s say we can just list them out and that they are all numeric.

Assume that we are getting this dataset from a third party that has simply omitted fields that do not apply. Unfortunately, we have a bunch of code that will not work unless all the expected fields are present.

One quick solution is to simply add those variables to the table with a null value. Depending on your situation this may or may not be a feasible solution, but it is a base from which you can build. You can add all the missing variables in one DATA step by putting the call to %applyFunctionList within it, using %differenceList() that I shared last time to determine the missing variables:

DATA buy_modified;
   SET sashelp.buy;
   %applyFunctionList(list = %differenceList(%STR(buy date amount type source currency), %LOWCASE(%varlist(sashelp.buy))),
                      function = %NRSTR(
                           ||element|| = .;
                       )
                      );
RUN;

If your meta table had information such as data type, then we could pull that in to assign the appropriate null value for each variable. We’ll do more stuff like that later on.

Example 2: Recoding missing data
This is an example that you will do for yourself! So I guess it is more like an assignment, but it’s fun. I want you to modify all the variables that begin with “S” in the data set sashelp.Qtr111 so that missing data is recoded as 0. HINT: look at the %getFilteredList() macro I have provided.

Example 3: Importing a directory list into a data set

In this example we are going to make a tool that imports all the CSVs in a specified directory into the WORK library.

Listing the contents of a directory.
First I will share with you a function with two helper functions that lists the contents of a given directory. I’ve also used the regular expressions tools I shared last time. Note that we could use a list function to process the list rather than implementing a filtering feature in the %listDir macro. In many ways that would be a more modular approach, but there is a limitation with our current list framework - elements with spaces are not supported. All our functions interpret a space as a delimiter so an element with a space within it would be seen as two elements and not one. SAS doesn’t allow spaces in dataset names or variable names. However, Windows supports filenames and folder names with spaces so this can be a problem if you’re working with a lot of those types of filenames. One solution is to modify the tools to use a delimiter that will never be encountered within an element. You can also enforce a certain type of quoting scheme. For now I will continue with the space separated element approach to keep things simple:

/*
  Returns whether or not a specified path is a directory.

  EXAMPLE:
    %PUT %isDir(%STR(C:\WINDOWS_DNE));
    %PUT %isDir(%STR(C:\WINDOWS));
    %PUT %isDir(%STR(c:\temp));
*/
%MACRO isDir(path);
  %LOCAL return fileref rc d_id;

  %LET return = 0;

  %IF %SYSFUNC(FILEEXIST(&path)) %THEN %DO;
    %LET rc = %SYSFUNC(FILENAME(fileref, &path));
    %LET d_id = %SYSFUNC(DOPEN(&fileref));
    %LET rc = %SYSFUNC(DCLOSE(&d_id));
    %IF &d_id > 0 %THEN %LET return = 1;
  %END;

  &return
%MEND;

/*
  Returns whether or not a specified path is a file.

  EXAMPLE:
    %PUT %isFile(%STR(C:\WINDOWS_DNE));
    %PUT %isFile(%STR(C:\WINDOWS));
    %PUT %isFile(%STR(c:\temp));
*/
%MACRO isFile(path);
  %LOCAL return fileref;

  %LET return = 0;

  %IF %SYSFUNC(FILEEXIST(&path)) AND ^%isDir(&path) %THEN %LET return = 1;

  &return
%MEND;

/*
  Lists the contents of a specified path. Shows contained files and folders by default.
  Can specify to show either all files (&filesOnly) or all directories (&directoriesOnly).

  EXAMPLE:
    %PUT %listDir(%STR(C:\WINDOWS_DOES_NOT_EXIST));
    %PUT %listDir(%STR(C:\WINDOWS));
    %PUT %listDir(%STR(C:\WINDOWS), filesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), directoriesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), filesOnly=1, directoriesOnly=1);
    %PUT %listDir(%STR(C:\WINDOWS), regex=%NRSTR(..exe$));
*/
%MACRO listDir(path, regex, filesOnly=0, directoriesOnly=0);
  %LOCAL return fileref rc d_id num_members member_name;

  %LET return = %STR();

  %IF &filesOnly AND &directoriesOnly %THEN
    %PUT ERROR: filesOnly and directoriesOnly cannot both be 1;
  %ELSE %IF ^%SYSFUNC(FILEEXIST(&path)) %THEN
    %PUT ERROR: &path DOES NOT EXIST.;
  %ELSE %IF ^%isDir(&path) %THEN
    %PUT ERROR: &path IS NOT A DIRECTORY.;
  %ELSE %DO;
    %LET rc = %SYSFUNC(FILENAME(fileref, &path));
    %LET d_id = %SYSFUNC(DOPEN(&fileref));
      %LET num_members = %SYSFUNC(DNUM(&d_id));
      %DO i = 1 %TO &num_members;
      %LET member_name = %SYSFUNC(DREAD(&d_id, &i));
      /* Append to return list if member type matches specified filters.
        (Longer processing time than if in sepearate %IFs but a lot cleaner).*/
      %IF (%PRXMatch(/&regex/, %STR(&member_name))) AND
        ((&filesOnly AND %isFile(&path/&member_name)) OR
        (&directoriesOnly AND %isDir(&path/&member_name)) OR
        (^&filesOnly AND ^&directoriesOnly))
      %THEN
        %LET return = &return &member_name;
    %END;
    %LET rc = %SYSFUNC(DCLOSE(&d_id));
  %END;

  &return
%MEND;


Importing the list into a dataset.
Back to the importing tool we are creating - we first think of what we would do if we just wanted to import one CSV. I am using a generic PROC IMPORT here just as an example. It works, but it is not perfect. We will develop a more robust import process later on:
      PROC IMPORT DATAFILE="C:\sampleCSVs\example.csv"
           OUT=WORK.example
           DBMS=CSV
           REPLACE;
           GETNAMES=YES;
       RUN;

Next we know that we want to call this from within %applyFunctionList() so we modify all the things specific to our example.csv to refer to ||element|| instead:
      PROC IMPORT DATAFILE="&path\||element||.csv"
           OUT=WORK.||element||
           DBMS=CSV
           REPLACE;
           GETNAMES=YES;
       RUN;

Extracting the filenames from the pathname. Then we can add in the code to extract the CSV file names from the given path. Here is the full macro:
%MACRO importAllCSVs(path);
   %LOCAL fullfilenames filenames i;

   %LET fullfilenames = %listDir(&path, regex=%NRSTR(..csv$), filesOnly=1);
   %LET filenames = %applyFunctionList(list=&fullfilenames,
                           function=%NRSTR(%SUBSTR(||element||, 1, %EVAL(%INDEX(||element||, %STR(.csv))-1))));

   /*    Walk through each CSV file. */
   %applyFunctionList_noReturn(list=&filenames,
                      function=%NRSTR(
                           %PUT Importing "&path\||element||.csv";

                           PROC IMPORT DATAFILE="&path\||element||.csv"
                               OUT=WORK.||element||
                               DBMS=CSV
                               REPLACE;
                               GETNAMES=YES;
                           RUN;
                      ));
%MEND;

I’ve used another call to %applyFunctionList() to extract the name portion of the files (excluding the extension). I’ve included a folder with subfolders, each with sample CSVs, in the accompanying files for this post. You can run the macro on it and see how it works. Try refactoring the macro to use only one %applyFunctionList() call, rather than using two separate calls.

It is interesting to note that you can call this macro from within another invocation of %applyFunctionList():

%applyFunctionList_noReturn(list=%STR(1004 1100 1202),
                  function=%NRSTR(%importAllCSVs(C:\sampleCSVs\||element||));
                             );

This will import all the CSVs from each of the three folders: “C:\sampleCSVs\1004”, “C:\sampleCSVs\1100” and “C:\sampleCSVs\1202”.

Try it yourself!
You can find this post’s files at http://dontsasme.com/SASUserBlogSeries/Post3/ . I’ve included necessary tools from previous posts as well. Play around with these ideas in your SAS playground. I hope you are starting to see the real world applications of this approach.

Keep in mind that we are still in the foundation phase. We are laying down the framework from which we will build really neat things. As always, I encourage you to take in as many ideas as you can, see what they are doing objectively and blend the things that are useful to you so that you can create your own.

Additional resources
Here is some reference material on main system calls used in Example 3:
FILEEXIST
FILENAME
DOPEN

Share

About Author

Praveen Sawh

Programmer, Database Architect

Praveen Sawh, Programmer, Database Architect, is a programmer at the University of Toronto. He is in charge of getting big data from different sources to play nice and work together so that meaningful real-world questions can be answered. His treatment of choice is a healthy dose of Base SAS and a few open source solutions.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top