How to perform an operation on all numeric or all character variables in a SAS data set

2
View Ron Cody's Author Page

In a previous blog, I demonstrated a program and macro that could identify all numeric variables set to a specific value, such as 999. This blog discusses an immensely useful technique that allows you to perform an operation on all numeric or all character variables in a SAS data set. For example, you could convert all numeric variables with a value of 999 to a SAS missing value and all character values to uppercase.

In order to demonstrate how to accomplish these two tasks, first run the program below to create a data set called Original. This data set contains numeric values of 999 and some character values in either lowercase or mixed case:

Creating a Data Set Called Original

***Create data set Original;
data Original;
   input X Y A $ X1-X3 Z $;
datalines;
1 2 x 3 4 5 Y
2 999 y 999 1 999 J
999 999 R 999 999 999 X
1 2 yes 4 5 6 No
;

 

Below is a listing of data set Original:

Listing of Data Set Original

The program below searches a SAS data set for all numeric values equal to 999 and converts these values to a SAS missing value.  The program also converts all character values to uppercase.

Converting all Values of 999 to Missing and converting all Character Values to Uppercase

data New;
   set Original;
   array Nums[*] _numeric_;
   array Chars[*] _character_;
   do i = 1 to dim(Nums);
      if Nums[i] = 999 then Nums[i] = .;
   end;
 
   do i = 1 to dim(Chars);
      Chars[i] = upcase(Chars[i]);
   end;
   drop i;
run;

 

You use the two keywords _NUMERIC_ and _CHARACTER_ in the ARRAY statements to start the ball rolling.  When these keywords are used in a DATA step, they refer to all of the numeric or character variables at that point in the DATA step.  This is an important fact.  If you had placed the two ARRAY statements before the SET statement, the two arrays would not contain any variables.

In defining the two arrays, you can use an asterisk (*) instead of the number of elements in the array.  But, because you didn't want to count how many numeric and character variables were in data set Original, you use the DIM (stands for dimension) function that returns the number of elements in an array   You then use this value as the upper limit of your DO loop.

The output is shown below:

Notice that the original values of 999 are now missing values and all the character values are in uppercase.  Please think of this technique any time you need to perform an operation on all numeric or character variables in a SAS data set.

Share

About Author

Ron Cody

Private Consultant

Dr. Ron Cody was a Professor of Biostatistics at the Rutgers Robert Wood Johnson Medical School in New Jersey for 26 years. During his tenure at the medical school, he taught biostatistics to medical students as well as students in the Rutgers School of Public Health. While on the faculty, he authored or co-authored over a hundred papers in scientific journals. His first book, Applied Statistics and the SAS Programming Language, was first published by Prentice Hall in 1985 and is now in its fifth edition. Since then, he has published over a dozen books on SAS programming and statistical analysis using SAS. His latest book, A Gentle Introduction to Statistics Using SAS Studio was published this year. Ron has presented numerous papers at SAS Global forums, regional conferences, as well as local user groups. He is presently a contract instructor for SAS Institute and continues to write books on SAS and statistical topics.

2 Comments

  1. HI,
    How can I do proc means in database in only a numeric variable and proc freq in only caractrial variable in big database (3000 variable).

    Thank you

  2. Thanks Ron. Nice and very useful but you could also, in this context, share the tip on how to use implicit arrays that are easier.
    data New;
    set Original;
    array _Nums _numeric_;
    array _Chars _character_;
    do over _Nums; if _Nums=999 then _Nums=.; end;
    do over _Chars; _Chars = upcase(_Chars); end;
    run;

Back to Top