Formatted or unformatted, that is the question

2

In my previous post, I began a discussion of lists by explaining that big problems can be solved by breaking the problem into smaller pieces. The solutions to those smaller problems can then be used in combination to solve other problems. In this post, I will talk about formatting and unformatting the lists.

Remember that I use a space separated list, so most of my macros expect this and return a list in this format (where applicable). However, I often need to format the list with different delimiters and/or quotations for presentation purposes. Having the ability to apply and remove this formatting gives me the flexibility to format when necessary and still be able to use all my regular list functions by unformatting.

Here is the %formatList() macro:

%MACRO formatList(list, quote=%STR(), listquote=%STR(), delimiter=%STR(, ));
   %LOCAL return i;
   %LET i = 1;
   %DO %WHILE(%LENGTH(%QSCAN(&list, &i, %STR( ))) GT 0);
         %LET return = &return.&quote.%QSCAN(&list,&i,%STR( ))"e;
       %LET i = %EVAL(&i + 1);
       %IF %LENGTH(%QSCAN(&list,&i,%STR( ))) GT 0 %THEN %LET return=&return.&delimiter;
   %END;
   %UNQUOTE(&listquote.&return.&listquote.)
%MEND;

The arguments are as follows:
&delimiter specifies an output delimiter, default is a comma.
&quote specifies an output quotation to put around each element.
&listquote specifies an output quotation to put around the entire list.

The implementation is pretty straightforward. The %formatList() macro iterates through the elements in the list, adding delimiters and quotes as specified.

Before I move to unformatting, I first would like to share a couple of useful macros that I found. (If I find a macro that does what I want in an elegant way, I add it to my library and cite the source.) These two macros work with regular expressions and were taken from Lei Zhang’s paper, Add the Power of Regular Expressions to SAS Macro.
(For the purposes of this post, it is enough to know that regular expressions are a system for defining text patterns and finding how and where they occur in text.)

Zhang’s first macro:

The %PRXMatch () macro function takes a Perl matching operator (m//), an input string, and an optional action. If action is ‘P’, it returns the position (>=1) where the first match for the regex pattern was found in the string; or returns 0 if no match is found. If action is `E’, it extracts the first matched substring; if no match, it return blanks.

For example:

   %PRXMatch(/He/, Hello); [Returns 1, meaning matched]
   %PRXMatch(/He/, world); [Returns 0, meaning no matched]
   %PRXMatch(/He/, Hello, E); [Returns He.]

   ()

*/
%Macro PRXMATCH(Regex, Source, Action);
   %local RegexID RtnVal;
   %let RegexID=%sysfunc(PRXPARSE(&Regex));
   %let RtnVal=0;
   %if RegexID > 0 %then %do;
   %if %length(&Action)=0 %then %let Action=P;
   %let Action=%upcase(&Action);
   %if &Action=P %then %do;
   %let rtnval=%sysfunc(PRXMatch(&RegexID, &Source));
   %end; %else %if &Action=E %then %do;
   %local Pos Len;
   %let Pos=0;
   %let Len=0;
   %syscall PRXSUBSTR (RegexID, Source, Pos, Len);
   %let RtnVal=;
   %if &Pos > 0 %then %do;
   %let RtnVal=%substr(&Source, &Pos, &Len);
   %end;
   %syscall PRXFREE(RegexID);
   %end; %else %do;
   %let RtnVal=;
   %put Unknown action: &Action;
   %end;
   %end; %else %do;
   %put Errors found in the pattern: &Regex;
   %end;
   &RtnVal
%Mend;

Zhang’s second macro:

The %PRXChange () macro function takes a Perl substitution operator (s///), and an input string as parameters. It returns the fully replaced string if any matches for the regex pattern were found in the string. If not, it returns the original string.

For example:

   %PRXChange(s/or/ou/, world); [Returns would]
   %PRXChange(s/ar/ou/, world); [No match, Returns world]

*/
%Macro PRXCHANGE(Regex, Source);
   %local RegexID RtnVal;
   %let RegexID=%sysfunc(PRXPARSE(&Regex));
   %if RegexID > 0 %then %do;
   %let RtnVal=%sysfunc(PRXCHANGE(&RegexID,-1, &Source));
   %syscall PRXFREE(RegexID);
   %end; %else %do;
   %put Errors found in the pattern: &Regex;
   %end;
   &RtnVal
%Mend;

The unformatting list function

This macro removes formatting from a stringed list of elements. (You don't have to specify quotes, list quotes or delimiters.) The macro parses the list using regular expressions and looks for patterns that look like a list with punctuation characters as possible delimiters and quotes.

For example:

   %PUT 1 - %unformatList(%STR("|item1|, |item2|, |item3|"));
   %PUT 2 - %unformatList('|item1|, |item2|, |item3|');
   %PUT 3 - %unformatList("|item1|, |item2|, |item3|");
   %PUT 4 - %unformatList(%STR("|child_id|, |adult_id|, |open_date|"));
   %PUT 5 - %unformatList("A, B");
   %PUT 6 - %unformatList((A, B));
   %PUT 7 - %unformatList((A));
   %PUT 8 - %unformatList(A );
   %PUT 9 - %unformatList( );
   %PUT 10 - %unformatList();

*/
%MACRO unformatList(list);
   %LOCAL return;
   %LET return = %PRXChange(s/\s?[[:punct:]]?[[:punct:]]?(\w+)[[:punct:]]?\s?[[:punct:]]?\s?/$1 /, &list);
   %LET return = %stripp(&return);
   &return
%MEND;

The main line is where I call the %PRXChange() function. It basically looks for elements that may or may not have punctuation at the front and back and punctuation around the whole list of elements. It just returns the elements, which in effect strips any delimiters, element quotes and list quotes. Note the use of [:punct:], which is a character class. It is like a shortcut for the following set of characters:

-. , " ' ? ! ; : # $ % & ( ) * + / < > = @ [ ] \ ^ _ { } | ~

So %unformatList() will only work if the formatting is comprised of characters from that set. If you want to be able to format and unformat with other characters, then I would suggest making your own unformatting function that allows the caller to specify the quotes and delimiters. That way it will know exactly what to look for at the appropriate positions in the list. I only use the characters in [:punct:] as formatting so %unformatList() is a neat and simple tool that works in my development environment.

I'm going to leave you with a few more list functions (You can find all of the programs covered in this post at DontSASme.com):
%unduplicateList - unduplicates a list of stringed elements.
%unionList - returns the union of two lists.
%differenceList -returns the difference between two lists.
%isEmptyList - checks whether or not a list is empty.

Again, these are pretty simple, but - used in different combinations - they can be very powerful! In the posts to follow, I will talk more about tools we have created - remember that we are laying a foundation. Keeping these tools simple and well-defined is key to being able to rely on them when we need them.

Next time I will talk about a function that allows you to perform an operation on each element in a list, and we’ll get to some applications of these tools. Please let me know how you are using this information.

Share

About Author

Praveen Sawh

Programmer, Database Architect

Praveen Sawh, Programmer, Database Architect, is a programmer at the University of Toronto. He is in charge of getting big data from different sources to play nice and work together so that meaningful real-world questions can be answered. His treatment of choice is a healthy dose of Base SAS and a few open source solutions.

2 Comments

    • Awsome! Thanks for sharing! If any1 else has solutions they have please feel free to share. The more ideas we have the more things we have to draw from and remix into our own.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top