Tips for concatenating strings in SAS/IML

Last week, as part of an article on how spammers generate comments for blogs, I showed how to generate random messages by using the CATX function in the DATA step. In that example, the strings were scalar quantities, but you can also concatenate vectors of strings in the SAS/IML language. However, concatenating vectors raises a few issues that do not arise for scalars. In this post I describe the following:
  • how to get rid of spaces when you concatenate strings
  • how to insert spaces (or other delimiters) between strings

A canonical application of concatenation is combining the first and last names of individuals to form the full name. For example, the following SAS/IML statements define vectors that contain the first and last names of three famous mathematicians. You can use the SAS/IML CONCAT function or the string concatenation operator (+) to concatenate the names. This example explicitly concatenates a space between the names:

proc iml;
first = {"C."    "Isaac"  "Leonhard"};
last  = {"Gauss" "Newton" "Euler"};
name = first + " " + last;   /* concatenate with space between first and last */
print name;

As I mentioned in my article on how to understand SAS/IML character vectors, there are actually several blank characters between the first and last names of Gauss and Newton. If you use the PrintChars module from my previous post, you see the following:

/* define or load the PrintChars module here... */
run PrintChars(name);

These blank characters appear because the array of first names (first) is a character array in which all elements have the same length. In this case, all elements have length 8, which is the number of characters in the longest name, "Leonhard." Shorter strings are padded with trailing blanks.
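You can check the padding directly. The following is a minimal sketch that assumes the same first vector as above; the variable names storedLen and visibleLen are only illustrative. The NLENG function returns the storage length that all elements share, whereas the LENGTH function returns the length of each element without its trailing blanks:

```sas
proc iml;
first = {"C." "Isaac" "Leonhard"};
storedLen  = nleng(first);    /* storage length shared by every element */
visibleLen = length(first);   /* per-element length, ignoring trailing blanks */
print storedLen, visibleLen;
quit;
```

The difference between the two values for each element is the number of trailing blanks that end up in the middle of the concatenated name.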

Vectorized approach to trim blanks

Although the SAS/IML language and the SAS DATA step language are similar, string concatenation in SAS/IML has some complexities that are not present in the DATA step. In the DATA step, you can use the TRIM function (or the STRIP function) to get rid of blanks. Unfortunately, when you apply these functions to a matrix, they don't solve the problem. The matrix that is returned by trim(first) is exactly the same as first: after TRIM strips off the trailing blanks, the trimmed strings are assembled into a matrix whose elements all have length 8, which re-adds the trailing blanks!
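A short experiment can confirm this behavior. This is only a sketch under the assumptions stated above; the names trimmed, lenBefore, lenAfter, and isSame are hypothetical. If TRIM really returns a matrix that is padded back to the common length, the two storage lengths should agree and the two concatenations should be identical:

```sas
proc iml;
first = {"C." "Isaac" "Leonhard"};
last  = {"Gauss" "Newton" "Euler"};

trimmed   = trim(first);          /* apply TRIM to the whole vector */
lenBefore = nleng(first);         /* storage length before trimming */
lenAfter  = nleng(trimmed);       /* storage length after trimming */
print lenBefore lenAfter;

name1  = first + " " + last;        /* concatenate without TRIM */
name2  = trim(first) + " " + last;  /* concatenate with TRIM */
isSame = all(name1 = name2);        /* 1 if every element matches */
print isSame;
quit;
```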

So how can you get rid of the trailing blanks? One vectorized approach is to use the RIGHT function to right-align the first names prior to concatenating them with the last names. Of course, that results in strings with leading blanks, so you then need to use the LEFT function to move those leading blanks to the end. This approach gets messy when you are concatenating many vectors.
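The RIGHT/LEFT approach described above might look like the following sketch. It assumes the same first and last vectors; the intermediate name padded is only illustrative:

```sas
proc iml;
first = {"C." "Isaac" "Leonhard"};
last  = {"Gauss" "Newton" "Euler"};

/* right-align the first names so that their blanks lead instead of trail */
padded = right(first) + " " + last;
/* left-align the result, which moves the leading blanks to the end */
name = left(padded);
print name;
quit;
```

With only two vectors this is manageable, but every additional vector needs its own RIGHT call before the final LEFT call.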

I scratched my head over this problem for a long time. For a while I even abandoned trying to use a vectorized approach; I just iterated over the elements of the vectors and concatenated scalar quantities. (Mea culpa!) Then one day I remembered that you can call Base SAS functions from SAS/IML and pass in matrices as arguments. Can the CATX function, which solves the problem for scalar quantities, also solve the concatenation problem for vector quantities? Let's see:

name = catx(" ", first, last);   /* insert space delimiter between names */
run PrintChars(name);

Success! The only extra blanks in the concatenated strings are trailing blanks, and in every case the first name is separated from the last name by exactly one space. Once again, Base SAS functions help me to solve a problem that involves vectors! From now on I will use the CATX function to concatenate vectors of strings when I want to insert a space between strings.

The first argument to the CATX function specifies the delimiter that is inserted between strings. You can use that argument to insert commas, slashes, or any other delimiters between strings. You can even specify the "null string" (two quotation marks with no space between them) to concatenate strings when you do not want to insert a delimiter.
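As an illustration of other delimiters, here is a small sketch that reuses the first and last vectors. The result names lastFirst and slashed are hypothetical, and the example assumes that CATX accepts a multi-character delimiter such as a comma followed by a space, as it does in the DATA step:

```sas
proc iml;
first = {"C." "Isaac" "Leonhard"};
last  = {"Gauss" "Newton" "Euler"};

lastFirst = catx(", ", last, first);  /* comma-and-space delimiter: last name first */
slashed   = catx("/", last, first);   /* slash delimiter */
print lastFirst, slashed;
quit;
```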

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
