Standardizing phone numbers using SAS

2

How many times have you entered a phone number on a web page, only to be told that you did not type it the "correct" form? I find that annoying. Don't you?

In my latest book, Cody's Data Cleaning Techniques, 3rd edition, I show how to convert a phone number in any form and convert it to a standard form. Because not everyone will buy and read this book (well, I could hope), I thought I would share this technique with you on this blog.

Let's start out with a small data set (Numbers) that contains standard 10 digit US phone numbers. You can run the DATA step yourself, if you wish to "follow along."

data Numbers;
   input Phone $15.;
datalines;
(908)123-4567
8007776666
888.555.8765
#(210) 567-9451
;

 

Yes, this looks like a mess. Look how easy it is to extract the digits from the number and, if you wish, convert the number into the common form, for example: (###)###-####.

data Convert;
   set Numbers;
   length Digits $ 10 Standard $ 14;
   Digits = compress(Phone,,'kd');
   Standard = cats('(',substr(Digits,1,3),')',substr(Digits,4,3),
                   '-',substr(Digits,7,4));
run;

 

The key to this program is the COMPRESS function with the two modifiers 'k' (keep) and 'd' (digits) as the third argument to the function.  SAS added a third argument to the older COMPRESS function starting with SAS 9.  This argument allows you to specify modifiers such as 'a' (all upper- and lowercase letters – alpha), 'd' (all digits), and 'k' (keep the specified characters and remove everything else).  The variable Digits is a character variable that contains only the digits in the variable Phone (see listing below).  To create the variable Standard, you concatenate all the components necessary to create the required standard form.  The CATS function concatenates all of its arguments after first stripping leading and trailing blanks.  You could have used the standard || or !! to perform this operation but I like to show off some of the great SAS functions.  By the way, be sure to specify the length of the variable Standard, because if you don't, the default length for the result of any of the CATS function is 200.

Here is a listing of data set Convert:

Keep the COMPRESS function in mind when you encounter problems similar to this one.

Share

About Author

Ron Cody

Private Consultant

Dr. Ron Cody was a Professor of Biostatistics at the Rutgers Robert Wood Johnson Medical School in New Jersey for 26 years. During his tenure at the medical school, he taught biostatistics to medical students as well as students in the Rutgers School of Public Health. While on the faculty, he authored or co-authored over a hundred papers in scientific journals. His first book, Applied Statistics and the SAS Programming Language, was first published by Prentice Hall in 1985 and is now in its fifth edition. Since then, he has published over a dozen books on SAS programming and statistical analysis using SAS. His latest book, A Gentle Introduction to Statistics Using SAS Studio was published this year. Ron has presented numerous papers at SAS Global forums, regional conferences, as well as local user groups. He is presently a contract instructor for SAS Institute and continues to write books on SAS and statistical topics.

Related Posts

2 Comments

  1. Hi. another idea (phone converted to a numeric variable) ...

    proc format;
    picture phone (default=13)
    low-high = '999)999-9999' (prefix='(');
    run;

    data numbers;
    set numbers (rename=(phone=temp));
    phone=input(compress(temp,,'kd'),10.);
    format phone phone.;
    drop temp;
    run;

Back to Top