How many times have you entered a phone number on a web page, only to be told that you did not type it the "correct" form? I find that annoying. Don't you?
In my latest book, Cody's Data Cleaning Techniques, 3rd edition, I show how to convert a phone number in any form and convert it to a standard form. Because not everyone will buy and read this book (well, I could hope), I thought I would share this technique with you on this blog.
Let's start out with a small data set (Numbers) that contains standard 10 digit US phone numbers. You can run the DATA step yourself, if you wish to "follow along."
data Numbers; input Phone $15.; datalines; (908)123-4567 8007776666 888.555.8765 #(210) 567-9451 ; |
Yes, this looks like a mess. Look how easy it is to extract the digits from the number and, if you wish, convert the number into the common form, for example: (###)###-####.
data Convert; set Numbers; length Digits $ 10 Standard $ 14; Digits = compress(Phone,,'kd'); Standard = cats('(',substr(Digits,1,3),')',substr(Digits,4,3), '-',substr(Digits,7,4)); run; |
The key to this program is the COMPRESS function with the two modifiers 'k' (keep) and 'd' (digits) as the third argument to the function. SAS added a third argument to the older COMPRESS function starting with SAS 9. This argument allows you to specify modifiers such as 'a' (all upper- and lowercase letters – alpha), 'd' (all digits), and 'k' (keep the specified characters and remove everything else). The variable Digits is a character variable that contains only the digits in the variable Phone (see listing below). To create the variable Standard, you concatenate all the components necessary to create the required standard form. The CATS function concatenates all of its arguments after first stripping leading and trailing blanks. You could have used the standard || or !! to perform this operation but I like to show off some of the great SAS functions. By the way, be sure to specify the length of the variable Standard, because if you don't, the default length for the result of any of the CATS function is 200.
Here is a listing of data set Convert:
Keep the COMPRESS function in mind when you encounter problems similar to this one.
2 Comments
Hi Mike.
Cute, but I prefer simple solutions.
Hi. another idea (phone converted to a numeric variable) ...
proc format;
picture phone (default=13)
low-high = '999)999-9999' (prefix='(');
run;
data numbers;
set numbers (rename=(phone=temp));
phone=input(compress(temp,,'kd'),10.);
format phone phone.;
drop temp;
run;