A previous article discusses the MakeString function, which you can use to convert an IML character vector into a string. This can be very useful. When I originally wrote the MakeString function, I was disappointed that I could not vectorize the computation. Recently, I learned about the COMPBL function in Base SAS, which enables me to rewrite the MakeString function in a vectorized manner. I will call the new function 'vec_to_str' so that it does not conflict with the name of the MakeString function.
Concatenating elements in a character vector
It isn't hard to concatenate the values of a character vector into a string. The hard(ish) part is doing it in a way that strips unnecessary blanks from the string. As a reminder, SAS IML character vectors (like character variables in a SAS data set) pad values with blanks. To help you see where the blanks are located, you can define a helper function that uses the TRANWRD function to convert blanks to underscores and then prints the result, as follows:
proc iml;
/* create a helper function: visualize blanks in a string by converting them to underscores */
start BlankPrint(x);
   name = parentname("x");
   xBlanks = tranwrd(x,' ','_');   /* convert blanks to underscores */
   print xBlanks[L=name];
finish;

s = {"The","quick","brown","fox","jumped","over","the","lazy","dog"};
run BlankPrint(s);          /* see that the elements are padded with blanks */

/* Goal: concatenate the elements into a single string */
c = rowcat( rowvec(s) );    /* attempt to concatenate a row vector of values */
run BlankPrint(c);          /* Argh! Not correct! */
You can see two problems with this attempt. First, the longest element ("jumped") does not have a trailing blank. Consequently, there is no blank between the words "jumped" and "over" when you concatenate the elements. Second, the shorter words are padded with several trailing blanks. Consequently, there are multiple blanks between most words in the final string.
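If you prefer numbers to underscores, you can also measure the padding directly. The following sketch is an addition, not part of the original program. It uses the NLENG function, which returns the common allocated length of the elements in a character matrix, and the LENGTH function, which returns the length of each value without its trailing blanks. The names allocLen, wordLen, and numPad are illustrative.

```sas
proc iml;
s = {"The","quick","brown","fox","jumped","over","the","lazy","dog"};
allocLen = nleng(s);            /* every element has the same allocated length (6, the length of "jumped") */
wordLen  = length(s);           /* length of each value, not counting trailing blanks */
numPad   = allocLen - wordLen;  /* number of padding blanks after each value */
print s wordLen numPad;
```

The numPad column is 0 for "jumped" (the first problem) and at least 1 for every other word, with three padding blanks after each three-letter word (the second problem).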
You can fix both issues:
- Use the CAT function to add a blank after every word.
- Use the COMPBL function to compress multiple blanks into a single blank. (I saw the COMPBL function in a macro by Michael Friendly. Thanks, Michael!)
Let's see what happens if we incorporate these fixes:
/* we need to add a blank to the end of every element and compress multiple blanks
   into one blank by using the COMPBL function */
c1 = cat(rowvec(s), ' ');     /* add blank at end */
c2 = compbl(rowcat(c1));      /* concatenate values and compress blanks into one blank */
run BlankPrint(c2);
The_quick_brown_fox_jumped_over_the_lazy_dog_
We are almost there! The only remaining issue is that we added a blank after the last word, but we shouldn't have. We can use the TRIM function to eliminate that blank. The following function converts a character vector to a string in a vectorized manner:
/* concatenate a character vector into a single string of blank-separated values */
start vec_to_str(s);
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   return trim(c2);            /* remove the extra blank at the end of the string */
finish;

str = vec_to_str(s);
run BlankPrint(str);
The_quick_brown_fox_jumped_over_the_lazy_dog
Success! You can use the vec_to_str function to convert a character vector to a string.
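Because vec_to_str calls ROWVEC, it is not limited to vectors: ROWVEC reshapes any matrix into a row vector, reading the elements in row-major order. The following sketch repeats the definition of vec_to_str so that it runs on its own and applies it to a small hypothetical 2 x 2 matrix named m.

```sas
proc iml;
/* vec_to_str, as defined above */
start vec_to_str(s);
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   return trim(c2);            /* remove the extra blank at the end of the string */
finish;

m = {"A" "bb", "ccc" "d"};     /* hypothetical 2 x 2 character matrix */
str = vec_to_str(m);           /* ROWVEC reads the elements row by row */
print str;
```

The result should be the string "A bb ccc d": first the elements of row 1, then the elements of row 2.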
Note that the MakeString function supports arbitrary delimiters, not just blanks. If you need that extra functionality, you can use the TRANWRD function to replace the blanks with another character. This alternative version is shown in the Appendix.
Summary
In the IML language, you can convert a character vector into a string. However, the elements in a character vector are blank-padded, so a straightforward concatenation results in extra blanks in the string. You can use the COMPBL function in Base SAS to convert multiple blanks into a single blank, thus enabling you to create a vectorized function that converts a character vector to a string.
Appendix
The vec_to_str function results in a string in which words are separated by a single blank. If you want the option to use a different delimiter between words, you can use the following modification of the function:
/* concatenate a character vector into a single string of values separated by a
   specified delimiter. By default, the delimiter is a blank character. */
start vec_to_str(s, delim=' ');
   c1 = cat(rowvec(s), ' ');   /* add blank at end */
   c2 = compbl(rowcat(c1));    /* concatenate values and compress blanks */
   c3 = trim(c2);              /* remove the extra blank at the end of the string */
   if delim=' ' then
      return c3;
   else
      return tranwrd(c3, ' ', delim);   /* replace each blank with the delimiter */
finish;

str = vec_to_str(s);        /* use blank as a delimiter */
run BlankPrint(str);
str = vec_to_str(s, '/');   /* use '/' as a delimiter */
run BlankPrint(str);
The_quick_brown_fox_jumped_over_the_lazy_dog
The/quick/brown/fox/jumped/over/the/lazy/dog
4 Comments
Rick,
I think you could use the CATX function from the DATA step to get this job done.
proc iml;
start vec_to_str(s, delim=' ');
str=blankstr(200);
do i=1 to ncol(s)*nrow(s);
str=catx(delim,str,s[i]);
end;
return str;
finish;
s = {"The","quick","brown","fox","jumped","over","the","lazy","dog"};
str = vec_to_str(s); /* use blank as a delimiter */
print str;
str = vec_to_str(s,'/'); /* use '/' as a delimiter */
print str;
quit;
Yes. That loop is essentially what I implemented in the MakeString function, except I used the '+' operator instead of CATX. The current article demonstrates that you can construct the string in a vectorized fashion without writing a loop.
Rick,
Yes. But if some values contain blanks, your function and mine give different results. For example:
s = {"The","quick","brown and red","fox","jumped","over","the","lazy","dog"};
Yours:
The/quick/brown/and/red/fox/jumped/over/the/lazy/dog
Mine:
The/quick/brown and red/fox/jumped/over/the/lazy/dog
Yes. Thank you for noticing that.