I know a lot of you have been programming in SAS for a long time, which is awesome! However, when you do something for a long time, sometimes you get set in your ways and you miss out on new ways of doing things.
Although the COUNT and CAT functions have been around for a while now, I see a lot of customer code that is counting and concatenating text strings the "old-fashioned" way. In this article, I would like to introduce you to the COUNT, COUNTW, CATS and CATX functions. These functions make certain tasks much simpler, like counting words in a string and concatenating text together.
Counting words or text occurrences
First let's take a look at the COUNT and COUNTW functions.
The COUNT function counts the number of times that a specified substring appears within a character string.
COUNT Syntax:
COUNT(string, substring <, modifier(s)>)
The string is the text you are searching and the substring is the text you are searching for. The COUNT function is useful for when the text you are searching for is part of another word or connected series of characters. For example, in the following example I am using the COUNT function to count how many times the text ‘inc’ appears in the list.
Data a; Contributors='The Big Company INC, The Little Company, ACME Incorporated, Big Data Co, Donut Inc.'; Num=count(contributors,'inc','i'); /* the 'i' modifier means to ignore case*/ Put num=; Run; |
When we examine the SAS log, we can see that NUM has a value of 3.
Num=3 NOTE: The data set WORK.A has 1 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.01 seconds
The COUNTW function counts the number of words in a character string.
COUNTW Syntax:
COUNTW(<string> <, <character(s)> <, <modifier(s)>>>)
string is the value in which you are counting words. The character(s) argument is the delimiter between the words in string.
COUNTW is the easiest way to count the number of words in a text string list. Before the COUNTW function, you had to use logic similar to the following to loop through a list of values.
/* DON'T USE - use COUNTW instead */ data a(drop=done i); x='a#b#c#d#e'; do until(done); i+1; y=scan(x,i,'#'); if y='' then done=1; else output; end; run; |
I realize this code isn't terrible, but I try to avoid DO UNTIL/WHILE loops if I can. There is always the possibility of going into an infinite loop.
The COUNTW function eliminates the need for a DO UNTIL/WHILE loop.
Here is an example of logic that I use all the time. In this example, I have a macro variable that contains a list of values that I want to loop through. I can use the COUNTW function to easily loop through each file listed in the resolved value of &FILE_NAMES. The code then uses the file name on the DATA statement and the INFILE statement.
%let file_names=01JAN2018.csv 01FEB2018.csv 01MAR2018.csv 01APR2018.csv; %macro test(files); %do i=1 %to %sysfunc(countw(&file_names,%str( ))); %let file=%scan(&file_names,&i,%str( )); data _%scan(&file,1,.); infile "c:\my files\&file"; input region $ manager $ sales; run; %end; %mend; %test(&file_names) |
The log is too large to list here, but you can see one of the generated DATA steps in the MPRINT output of this snapshot of the log.
MPRINT(TEST): data _01JAN2018; MPRINT(TEST): infile "c:\my files\01JAN2018.csv"; MPRINT(TEST): input region $ manager $ sales; MPRINT(TEST): run;
This data step will be generated for each file listed.
Counting strings within another text string should be easy to do. The COUNT functions definitely make this a reality!
Concatenating strings in SAS
Now that we know how to COUNT text in SAS, let me show you how to CAT in SAS with the CATS and CATX functions.
Back in the old days, I had hair(!) and we concatenated text strings using double pipes syntax.
X=var1||var2||var3;
This syntax is not too bad, but what if VARn has trailing blanks? Prior to SAS Version 9 you had to remove the trailing blanks from each value. Also, if the text was right justified, you had to left justify the text. This complicates the syntax:
X=trim(left(var1))||trim(left(var2))||trim(left(var3));
You can now accomplish the same thing using CATS. The CATS function removes leading and trailing blanks, and returns a concatenated character string.
CATS Syntax:
CATS(item-1 <, ..., item-n>)
The CATS function makes the task much easier to code.
data a; length var1 var2 var3 $12; var1='abc'; var2='123'; var3='xyz'; x=cats(var1,var2,var3); put x=; run; |
VAR1-VAR3 have a length of 12, which means each value contains trailing blanks. By using the CATS function, all trailing blanks are removed and the text is concatenated without any spaces between the text. Here is the result of the above PUT statement.
x=abc123xyz
Another common need when concatenating text together is to create a delimited string. This can now be done using the CATX function. The CATX function removes leading and trailing blanks, inserts delimiters, and returns a concatenated character string.
CATX Syntax:
CATX(delimiter, item-1 <, ... item-n>)
Prior to the CATX function, each delimiter had to be inserted into the syntax. For example:
X=trim(left(var1))||','||trim(left(var2))||','||trim(left(var3));
Using the CATX function simplifies this:
data a; length var1 var2 var3 $12; var1='abc'; var2='123'; var3='xyz'; x=catx(',',var1,var2,var3); put x=; run; |
This syntax creates a comma separated list with all leading and trailing blanks removed. Here is the result of the PUT statement.
x=abc,123,xyz
Just how the COUNT functions making counting text in SAS easier, the CAT functions make concatenating strings so much easier. For more explanation and examples of these CAT* functions, see this paper by Louise Hadden, Purrfectly Fabulous Feline Functions (because they are CAT functions, get it?).
Before I let you go, let me point out that in addition to the COUNT, COUNTW, CATS and CATX functions, there are also the COUNTC, CAT, CATQ and CATT functions that provide even more functionality. These functions are not used as often, so I haven't discussed them here. Please refer to our online documentation for more details about these functions.
1 Comment
Might also mention that the CAT functions work with numeric data (and there are no messages in the log).
Also, there is a way to concatenate the FORMATTED values of variables that does not involve a CAT function. Try this and look at the results with PROC PRINT.
proc format;
value ag low-<13.5 = '0' other = '1';
value ht low-<64.15 = '0' other = '1';
value wt low-<107.25 = '0' other = '1';
run;
data males (drop=sex);
length htwt $3;
set sashelp.class (where=(sex eq 'M'));
put @1 age ag. height ht. weight wt. @;
htwt = _file_;
run;