Standardizing phone numbers using SAS

2

How many times have you entered a phone number on a web page, only to be told that you did not type it the "correct" form? I find that annoying. Don't you?

In my latest book, Cody's Data Cleaning Techniques, 3rd edition, I show how to convert a phone number in any form and convert it to a standard form. Because not everyone will buy and read this book (well, I could hope), I thought I would share this technique with you on this blog.

Let's start out with a small data set (Numbers) that contains standard 10 digit US phone numbers. You can run the DATA step yourself, if you wish to "follow along."

data Numbers;
   input Phone $15.;
datalines;
(908)123-4567
8007776666
888.555.8765
#(210) 567-9451
;

 

Yes, this looks like a mess. Look how easy it is to extract the digits from the number and, if you wish, convert the number into the common form, for example: (###)###-####.

data Convert;
   set Numbers;
   length Digits $ 10 Standard $ 14;
   Digits = compress(Phone,,'kd');
   Standard = cats('(',substr(Digits,1,3),')',substr(Digits,4,3),
                   '-',substr(Digits,7,4));
run;

 

The key to this program is the COMPRESS function with the two modifiers 'k' (keep) and 'd' (digits) as the third argument to the function.  SAS added a third argument to the older COMPRESS function starting with SAS 9.  This argument allows you to specify modifiers such as 'a' (all upper- and lowercase letters – alpha), 'd' (all digits), and 'k' (keep the specified characters and remove everything else).  The variable Digits is a character variable that contains only the digits in the variable Phone (see listing below).  To create the variable Standard, you concatenate all the components necessary to create the required standard form.  The CATS function concatenates all of its arguments after first stripping leading and trailing blanks.  You could have used the standard || or !! to perform this operation but I like to show off some of the great SAS functions.  By the way, be sure to specify the length of the variable Standard, because if you don't, the default length for the result of any of the CATS function is 200.

Here is a listing of data set Convert:

Keep the COMPRESS function in mind when you encounter problems similar to this one.

Share

About Author

Ron Cody

Private Consultant

Ron Cody, EdD is a retired professor from the Robert Wood Johnson Medical School. He now works as a private consultant and a national instructor for SAS Institute Inc. A SAS user since 1977, Ron's extensive knowledge and innovative style have made him a popular presenter at local, regional, and national SAS conferences. He has authored or co-authored numerous books, as well as countless articles in medical and scientific journals.

Related Posts

2 Comments

  1. Hi. another idea (phone converted to a numeric variable) ...

    proc format;
    picture phone (default=13)
    low-high = '999)999-9999' (prefix='(');
    run;

    data numbers;
    set numbers (rename=(phone=temp));
    phone=input(compress(temp,,'kd'),10.);
    format phone phone.;
    drop temp;
    run;

Leave A Reply

Back to Top