A vector-to-string function for SAS IML

A previous article discusses the MakeString function, which you can use to convert an IML character vector into a string. This can be very useful. When I originally wrote the MakeString function, I was disappointed that I could not vectorize the computation. Recently, I learned about the COMPBL function in Base SAS, which enables me to rewrite the MakeString function in a vectorized manner. I will call the new function 'vec_to_str' so that it does not conflict with the name of the MakeString function.

Concatenating elements in a character vector

It isn't hard to concatenate the values of a character vector into a string. The hard(ish) part is doing it in a way that strips unnecessary blanks from the string. As a reminder, SAS IML vectors (like character variables in a SAS data set) pad values with blanks so that every element has the same length. To help understand where the blanks are located, you can define a helper function that uses the TRANWRD function to convert blanks to underscores, then prints the result, as follows:

proc iml;
/* create a helper function: visualize blanks in a string by converting them to underscores */
start BlankPrint(x);
   name = parentname("x");
   xBlanks = tranwrd(x,' ','_'); /* convert blanks to underscores */
   print xBlanks[L=name];
finish;
 
s = {"The","quick","brown","fox","jumped","over","the","lazy","dog"};
run BlankPrint(s);          /* see that the elements are padded with blanks */
 
/* Goal: concatenate the elements into a single string */
c = rowcat( rowvec(s) );    /* attempt to concatenate a row vector of values */
run BlankPrint(c);          /* Argh! Not correct! */

You can see two problems with this attempt. First, the longest element ("jumped") does not contain a blank after it. Consequently, there is no blank between the words "jumped" and "over" when you concatenate the elements. Second, the short words contain multiple blanks. Consequently, there are multiple blanks between most words in the final string.
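If you want to quantify the padding instead of eyeballing underscores, the following standalone sketch might help. It assumes that NLENG returns the common storage length of the elements and that LENGTH returns the length of each element without trailing blanks. The variable names (storageLen, wordLen, numPad) are mine and are used only for illustration.

```sas
proc iml;
s = {"The","quick","brown","fox","jumped","over","the","lazy","dog"};

storageLen = nleng(s);             /* storage length shared by every element of s */
wordLen    = length(s);            /* length of each word, excluding trailing blanks */
numPad     = storageLen - wordLen; /* number of trailing blanks that pad each word */

print storageLen;
print s wordLen numPad;            /* the longest word should show numPad = 0 */
quit;
```

The row for "jumped" should show zero padding, which is why no blank separates "jumped" and "over" in the naive concatenation.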

You can fix both issues:

  • Use the CAT function to add a blank after every word.
  • Use the COMPBL function to compress multiple blanks into a single blank. (I saw the COMPBL function in a macro by Michael Friendly. Thanks, Michael!)

Let's see what happens if we incorporate these fixes:

/* we need to add a blank to the end of every line and
   compress multiple blanks into one blank by using the COMPBL function */
c1 = cat(rowvec(s), ' ');   /* add blank at end */
c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks into one blank */
run BlankPrint(c2);
The_quick_brown_fox_jumped_over_the_lazy_dog_

We are almost there! The only remaining issue is that we added a blank after the last word, but we shouldn't have. We can use the TRIM function to eliminate that blank. The following function converts a character vector to a string in a vectorized manner:

/* concatenate a character vector into a single string of blank-separated values */
start vec_to_str(s);
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   return trim(c2);            /* remove the extra blank at the end of the string */
finish;
 
str = vec_to_str(s);
run BlankPrint(str);
The_quick_brown_fox_jumped_over_the_lazy_dog

Success! You can use the vec_to_str function to convert a character vector to a string.
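As a sanity check, you can compare the length of the result to the length you expect: the sum of the word lengths plus one blank between each pair of adjacent words. The following sketch repeats the function definition so that it runs on its own. It also checks that a row vector gives the same string as a column vector, which I expect because the function calls ROWVEC. The names expectedLen, actualLen, and isSame are hypothetical helper names.

```sas
proc iml;
/* same definition as above, repeated so that this block is self-contained */
start vec_to_str(s);
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   return trim(c2);            /* remove the extra blank at the end of the string */
finish;

s = {"The","quick","brown","fox","jumped","over","the","lazy","dog"};
str = vec_to_str(s);

/* expected length: all letters plus one blank between adjacent words */
numWords    = nrow(s) * ncol(s);
expectedLen = sum(length(s)) + numWords - 1;
actualLen   = length(str);
print actualLen expectedLen;   /* the two values should agree */

/* a row vector should produce the same string as the column vector */
strRow = vec_to_str(t(s));
isSame = (str = strRow);
print isSame;                  /* 1 means the strings are identical */
quit;
```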

I will mention that the MakeString function supports arbitrary delimiters, not just blanks. If you need that extra functionality, you can use the TRANWRD function to replace the blanks with another character. This alternative version is shown in the Appendix.

Summary

In the IML language, you can convert a character vector into a string. However, the elements in a character vector are blank-padded, so a straightforward concatenation results in extra blanks in the string. You can use the COMPBL function in Base SAS to convert multiple blanks into a single blank, thus enabling you to create a vectorized function that converts a character vector to a string.

Appendix

The vec_to_str function results in a string in which words are separated by a single blank. If you want the option to use a different delimiter between words, you can use the following modification of the function:

/* concatenate a character vector into a single string of values separated
   by a specified delimiter. By default, the delimiter is a blank character. */
start vec_to_str(s, delim=' ');
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   c3 = trim(c2);              /* remove the extra blank at the end of the string */
   if delim=' ' then 
      return c3;
   else 
      return tranwrd(c3, ' ', delim);
finish;
 
str = vec_to_str(s);        /* use blank as a delimiter */
run BlankPrint(str);
 
str = vec_to_str(s, '/');   /* use '/' as a delimiter */
run BlankPrint(str);
The_quick_brown_fox_jumped_over_the_lazy_dog
 
The/quick/brown/fox/jumped/over/the/lazy/dog
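One caveat is worth checking before you rely on the delimiter option. Because the function replaces every blank in the final string, an element that contains an embedded blank is split by the delimiter as well. The following sketch illustrates the behavior by using made-up city names; the vector name 'cities' is hypothetical test data, not part of the original example.

```sas
proc iml;
/* same definition as in the Appendix, repeated so that this block is self-contained */
start vec_to_str(s, delim=' ');
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   c3 = trim(c2);              /* remove the extra blank at the end of the string */
   if delim=' ' then
      return c3;
   else
      return tranwrd(c3, ' ', delim);
finish;

/* hypothetical test data: two elements contain an embedded blank */
cities = {"New York", "Los Angeles", "Chicago"};
str = vec_to_str(cities, ',');
print str;   /* the blank inside "New York" is also replaced by a comma */
quit;
```

If your data can contain embedded blanks, this vectorized version is probably not the right tool for a non-blank delimiter, and an approach that handles each element separately is the safer choice.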

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
