A new year’s resolution that sounds like more fun than a spinning class


Last December I taught a SAS Programming 1: Essentials class at Statistics Canada (Statcan). My class could barely contain their mirth while I valiantly struggled to find the semicolon on the French keyboard. It was a far cry from when I first moved to Canada (a bilingual country), excited about practicing the French I had learned at Alliance Française. Clearly my French has gotten rusty, as my recent keyboard experience proved. So when this January rolled along, I’m sure you can picture the resolution that first jumped to mind. Thanks to my humbling Statcan experience, my lofty new year’s work aspiration is to teach a SAS course in French, or maybe present to a user group in French!

Meanwhile, here is a SAS resolution to consider for an easier data life:

Scrub data with the SOUNDS-LIKE operator

I have to confess this is a personal favourite. Having worked at DeVry Institute of Technology, I’ve seen first-hand just how messy student registration data can get. So how did I go about finding all clients from the suburb of Mississauga without complex WHERE clauses? Take a look at my data with its many misspellings and the code I wrote to capture them.

The clever SOUNDS-LIKE operator (=*) uses the SOUNDEX algorithm to test whether a character variable contains a spelling variation of a word. It compares two character expressions, the search value and the matched value, and selects the rows where the two sound alike, so possible phonetic variations are picked up as well. So you too can find misspelled data easily with no complicated coding involved. You can find additional information on the SOUNDS-LIKE operator in this SUGI 29 paper.
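The original data and code are not reproduced here, so what follows is only a minimal sketch of how such a query might look, not the exact program described above. The data set WORK.STUDENTS, its variables NAME and CITY, and the particular misspellings are all hypothetical and made up for illustration.

```sas
/* Hypothetical sample data: data set name, variable names and  */
/* misspellings are assumptions, not the original registration  */
/* data.                                                        */
data work.students;
   length name $ 20 city $ 20;
   infile datalines dsd;          /* comma-delimited input lines */
   input name $ city $;
   datalines;
Ana,Mississauga
Ben,Missisauga
Cal,Mississagua
Dee,Toronto
;
run;

/* Keep only the rows whose CITY value sounds like 'Mississauga' */
/* according to the SOUNDEX algorithm.                           */
proc print data=work.students;
   where city =* 'Mississauga';
run;
```

One point worth keeping in mind: the =* operator belongs to WHERE expressions (the WHERE statement, the WHERE= data set option, and PROC SQL WHERE clauses). It is not available in a DATA step IF statement, where the SOUNDEX function would have to be called on both values instead.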

As a musician I am constantly listening to other forms and rhythms. I’d like to leave you with a catchy beat played at parties over the holidays. From the island of Mauritius in the Indian Ocean, here’s the Sega. Sounds like fun, doesn’t it? Did you find the SOUNDS-LIKE operator useful? Isn’t it more fun to keep than that spinning class resolution? I’d love to hear any resolutions, SAS or otherwise, that you might have made!

About Author

Charu Shankar

Technical Training Specialist

Charu Shankar has been a Technical Training Specialist with SAS since 2007. She started as a programmer, and has taught computer languages, business and English language skills. At SAS, Charu teaches the SAS language, SQL, SAS Enterprise Guide and Business Intelligence. She interviews clients to recommend the right SAS training to help them meet their needs. She is helping build a center for special needs kids in this project: http://www.handicareintl.org/pankaja/pankaja.swf

6 Comments

  1. I think the SOUNDS-LIKE operator and its underlying SOUNDEX algorithm are not very effective and often lead to improper results.
    I would caution against using it, especially on family (last) names that are transliterations into English from languages that do not use the Latin character set. Its limitations with non-English words are well documented in the literature.
    SAS has implemented much more powerful functions for computing measures of similarity/dissimilarity between two character strings that should be used instead of SOUNDEX.
    These include the SPEDIS (spelling distance) function added in SAS 8, and the COMPLEV and COMPGED functions added in SAS 9. COMPLEV calculates the Levenshtein edit distance and COMPGED the generalized edit distance between two strings. And the CALL COMPCOST routine allows the user to override the default penalty costs embedded in COMPGED, if needed.

  2. Thanks, Andrew, for pointing out other functions to deal with non-English words. This may be a topic for a future post. Appreciate your comments.

  3. Having worked with COBOL, I know the feeling. Luckily, there's SAS now to help out with quick & easy searches through reams & reams of data!
