SAS author's tip: the OTHERS group


Gerhard Svolba is one of our busiest authors. His new book Data Quality for Analytics Using SAS was just released, and next Monday, June 18, he's participating in a live e-chat at 11 a.m. ET. In the meantime, I decided to feature our weekly SAS tip from Gerhard's first, also highly regarded book, Data Preparation for Analytics Using SAS. If you're interested in analytics in general, visit Gerhard's author page to learn more about him and his work, as well as to read a free chapter from each book.

The following excerpt is from SAS author Gerhard Svolba and his book "Data Preparation for Analytics Using SAS" Copyright © 2006, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)

17.4.1 The OTHERS Group

A very common way to reduce the number of categories is to combine values into a so-called OTHERS group. In most cases, the categories assigned to the OTHERS group are those that have a low frequency.

PROC FREQ with the ORDER = FREQ option is an important tool for this task. Using the ProductMainGroup data from the preceding example we create the following output:

PROC FREQ DATA = codes ORDER = FREQ;
TABLE ProductMainGroup;
RUN;

We see that the ProductMainGroup values 1 and 4 each occur only once, and we want to assign them to the OTHERS group. This can be done with the following statements:

FORMAT ProductMainGroupNEW $6.;
IF ProductMainGroup IN ('1' '4') THEN ProductMainGroupNEW = 'OTHERS';
ELSE ProductMainGroupNEW = ProductMainGroup;

Note the following:

  • The FORMAT statement can be important here because otherwise the length of the newly created variable corresponds to the length of the first character value that is assigned to it. This might result in truncated category values such as “OTHE”.
  • The IN operator, as in IN ('1' '4'), is much more efficient than a long row of OR expressions such as ProductMainGroup = '1' OR ProductMainGroup = '4'.
  • In the case of many different categories the selection of the relevant groups can be difficult. Here the “matrix selection” of characters out of the Output window can be helpful.
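To run the statements above, they need to sit inside a DATA step. The following is a minimal sketch, assuming the input data set is the codes data set used in the PROC FREQ call and that ProductMainGroup is a character variable no longer than six characters. The output data set name codes_grouped is hypothetical. A LENGTH statement is used here as an alternative to FORMAT for fixing the width of the new variable.

```sas
/* Sketch only: codes_grouped is a hypothetical output name.           */
/* Assumes ProductMainGroup is character and at most 6 characters.     */
DATA codes_grouped;
   SET codes;
   /* Declare the width before the first assignment to avoid truncation */
   LENGTH ProductMainGroupNEW $6;
   IF ProductMainGroup IN ('1' '4') THEN ProductMainGroupNEW = 'OTHERS';
   ELSE ProductMainGroupNEW = ProductMainGroup;
RUN;

/* Check the result: the rare categories should now appear as OTHERS */
PROC FREQ DATA = codes_grouped ORDER = FREQ;
   TABLE ProductMainGroupNEW;
RUN;
```

If the original categories can be longer than six characters, the declared length would need to be increased accordingly.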

The advantage of “OTHERS groups” is that a large number of potentially low-frequency categories is combined into one group. This helps to speed up data preparation and analysis and also makes the interpretation of results easier.
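When there are many categories, picking the rare ones by hand becomes tedious. One possible way to automate the selection, which is not taken from the book, is to count the categories in PROC SQL and assign every category below a frequency threshold to OTHERS. The sketch below assumes the codes data set from above, a character variable ProductMainGroup of at most six characters, a hypothetical output name codes_grouped, and a threshold of two observations, chosen only to mirror the "occur only once" case.

```sas
/* Sketch only: the threshold (n < 2) and the name codes_grouped       */
/* are assumptions made for illustration.                              */
PROC SQL;
   CREATE TABLE codes_grouped AS
   SELECT a.*,
          /* Categories seen fewer than 2 times go to OTHERS */
          CASE WHEN b.n < 2 THEN 'OTHERS'
               ELSE a.ProductMainGroup
          END AS ProductMainGroupNEW LENGTH = 6
   FROM codes AS a
        LEFT JOIN
        /* One row per category with its frequency */
        (SELECT ProductMainGroup, COUNT(*) AS n
         FROM codes
         GROUP BY ProductMainGroup) AS b
        ON a.ProductMainGroup = b.ProductMainGroup;
QUIT;
```

Raising the threshold moves more categories into the OTHERS group, so it is worth checking the result with PROC FREQ before using the new variable in an analysis.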

 


About Author

Shelly Goodin

Social Media Specialist, SAS Publications

Shelly Goodin is SAS Publications' social media marketer and the editor of "SAS Publishing News". She’s worked in the publishing industry for over thirteen years, including seven years at SAS, and enjoys creating opportunities for fans of SAS and JMP software to get to know SAS Publications' many offerings and authors.
