In the past, the COMPRESS function was useful. Since SAS version 9, it has become a blockbuster, and you might not have noticed. The major change was the addition of a new optional parameter called MODIFIERS.
The traditional use of the COMPRESS function was to remove blanks or a list of selected characters from a character string. The addition of the MODIFIERS argument does two things. First, you can specify classes of characters to remove, such as all letters, all punctuation marks, or all digits. That is extremely useful. Second, and this is why I used the term blockbuster, the 'k' modifier flips the function from one that removes characters from a string to one that keeps the listed characters and removes everything else. Let me show you some examples.
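Before the real-world example, here is a minimal sketch of the two ideas above: removing a class of characters, and flipping to "keep" mode with 'k'. The string and the variable names are made up for illustration.

```sas
data _null_;
   length Str $20;
   Str = 'Abc 123, xyz!';                 /* hypothetical test string          */

   NoBlanks   = compress(Str);            /* traditional use: remove blanks    */
   NoDigits   = compress(Str,,'d');       /* remove all digits                 */
   NoPunct    = compress(Str,,'p');       /* remove all punctuation marks      */
   NoLetters  = compress(Str,,'a');       /* remove all letters                */
   OnlyDigits = compress(Str,,'kd');      /* 'k' flips it: keep only digits    */

   /* Write each result to the log, one per line */
   put NoBlanks= / NoDigits= / NoPunct= / NoLetters= / OnlyDigits=;
run;
```

In the log, OnlyDigits should contain just 123, because everything that is not a digit (including the blanks) is removed.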
This first example stems from a real problem I encountered while trying to read values that contained units. My data looked something like this:
ID    Weight
001   100lbs.
002   59Kgs.
003   210LBS
004   83kg
My goal was to create a variable called Wt that represented the person's weight in pounds as a numeric value.
First, let’s look at the code. Then, I’ll give an explanation.
data Convert;
   length ID $3 Weight $8;
   input ID Weight;

   Wt = input(compress(Weight,,'kd'),8.);
   /* The COMPRESS function uses two modifiers, 'k' and 'd'. This means
      keep the digits, remove anything else. The INPUT function does the
      character-to-numeric conversion. */

   if findc(Weight,'k','i') then Wt = Wt * 2.2;
   /* The FINDC function is looking for an upper or lowercase 'k' in the
      original character string. If found, it converts the value in
      kilograms to pounds (note: 1 kg = 2.2 pounds). */
datalines;
001 100lbs.
002 59Kgs.
003 210LBS
004 83kg
;
title "Listing of Data Set Convert";
footnote "This program was run using SAS OnDemand for Academics";
proc print data=Convert noobs;
run;
The program reads the value of Weight as a character string. The COMPRESS function uses 'k' and 'd' as modifiers. Notice the two commas in the list of arguments. With a single comma, SAS would interpret 'kd' as the second argument (the list of characters to remove). The two commas tell the function that the second argument is omitted and that 'kd' is the third argument (modifiers). You can list these modifiers in any order, but I like to use 'kd', and I think of it as "keep the digits." What remains is the string of digits, which the INPUT function converts from character to numeric.
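To see why the second comma matters, here is a small sketch that runs the same call both ways. The variable names OneComma and TwoCommas are made up for illustration.

```sas
data _null_;
   Weight = '83kg';

   /* One comma: 'kd' is the list of characters to REMOVE,
      so only the lowercase letters k and d are dropped. */
   OneComma  = compress(Weight,'kd');

   /* Two commas: the second argument is omitted and 'kd' is the
      MODIFIERS argument, meaning "keep the digits". */
   TwoCommas = compress(Weight,,'kd');

   put OneComma= TwoCommas=;   /* OneComma=83g TwoCommas=83 */
run;
```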
Your next step is to figure out if the original value of Weight contained an upper or lowercase 'k'. The FINDC function can take three arguments: the first is the string that you are examining, the second is a list of characters that you are searching for, and the third argument is the 'i' modifier that says, "ignore case" (very useful).
If the original character string (Weight) contains an uppercase or lowercase 'k', you convert from kilograms to pounds.
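If you want to see what FINDC actually returns, this short sketch may help. It returns the position of the first matching character, or 0 if none is found, which is why it works directly in an IF statement. The variable names Pos1 through Pos3 are made up for illustration.

```sas
data _null_;
   Pos1 = findc('59Kgs.','k','i');   /* 'i' ignores case: finds the K in position 3 */
   Pos2 = findc('100lbs.','k','i');  /* no k of either case: returns 0              */
   Pos3 = findc('59Kgs.','k');       /* no 'i' modifier: search is case-sensitive   */

   put Pos1= Pos2= Pos3=;            /* Pos1=3 Pos2=0 Pos3=0 */
run;
```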
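Here is the output: the listing shows Wt values of 100, 129.8 (59 × 2.2), 210, and 182.6 (83 × 2.2) for IDs 001 through 004.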
There is one more useful application of the COMPRESS function that I want to discuss. Occasionally, you might have a text file in ASCII or EBCDIC that contains non-printing characters (usually placed there in error). Suppose you want to keep just the letters, digits, decimal points (periods), blanks, and commas. You need to read the original value as a text string. Let's call the original string Contains_Junk. All you need to clean up these values is one line of code like this:
Valid = compress(Contains_Junk,'.,','kdas');
In this example, you are using all three arguments of the COMPRESS function. Normally, as in pre-9 versions of SAS, the second argument is a list of characters that you want to remove. However, because the third argument (modifiers) contains a 'k', the second argument becomes a list of characters that you want to keep. In addition to periods and commas, you use modifiers to keep all digits ('d'), uppercase and lowercase letters (the 'a' modifier, 'a' for alpha), and space characters (the 's' modifier; these include spaces, tabs, and a few others such as carriage returns and linefeeds). If you did not want to keep tabs and other "white space" characters, you could rewrite this line as:
Valid = compress(Contains_Junk,'., ','kda');
Here you are including a blank in the second argument and omitting the 's' in the modifier list.
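Here is a quick sketch that compares the two versions on a string with a tab and a null character planted in it. The test string and the names WithS, WithoutS, LenS, and LenNoS are made up for illustration, and the hexadecimal literals assume an ASCII system (on EBCDIC, the tab has a different code).

```sas
data _null_;
   /* '09'x is a horizontal tab and '00'x is a null character in ASCII */
   Contains_Junk = 'A1' || '09'x || 'B 2,3.4' || '00'x;

   WithS    = compress(Contains_Junk,'.,','kdas');   /* keeps the tab and the blank      */
   WithoutS = compress(Contains_Junk,'., ','kda');   /* keeps the blank, removes the tab */

   /* Compare lengths rather than printing the non-printing characters */
   LenS   = length(WithS);
   LenNoS = length(WithoutS);
   put LenS= LenNoS=;   /* LenS=10 LenNoS=9 */
run;
```

Both versions remove the null character. Only the second one removes the tab, which accounts for the one-character difference in length.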
Questions and/or comments are welcome.
3 Comments
Ron,
Helpful article. Thank you!
Shouldn't the second code snippet include an "a" in the third argument - like so:
Valid = compress(Contains_Junk,'., ','kda');
Kathleen
Hi Kathleen. You got me! Yup, there should be an 'a' in the modifiers list.
Thanks to you, the blog has been corrected.