If you use formatted variables in a CLASS statement in procedures such as MEANS or UNIVARIATE, SAS will use the formatted values of those variables and not the internal values. For example, consider you have a data set (Health) with variables Subj, Age, Weight, and Height. You want to see the mean Weight and Height for three age groups: 0 to 40, 41-70, and greater than 70. A program to create the Health data set and compute the means is shown below.
Data Health; input Subj $ Age Weight Height; datalines; 001 23 120 72 002 35 200 80 003 77 98 63 004 51 230 75 005 15 98 54 006 44 160 68 007 83 111 65 008 10 . 45 009 60 200 . ; proc format; value AgeGrp 0-40 = '0 to 40' 41-70 = '41 to 70' 71-high = '>70'; run; proc means data=Health n nmiss mean; class Age; format Age AgeGrp.; var Weight Height; run; |
Here is the output:
Notice that there was no need to run another DATA step and create a variable representing age ranges. Not only is this an easier way to get these results—it is also much more efficient.
The other day, I received an email from a person and he posed the following question. How can I see the mean of the variable Sales (in the SASHELP data set Shoes) for three ranges of Sales: 0 to 100,000; 100,000 to 200,000; 200,000 and above?
My first thought was to create a variable (called Range) and then use that variable in a CLASS statement with PROC MEANS. However, after some thought, I wondered if I could use the Sales variable in the CLASS statement (and supply a format) and use the same variable in the VAR statement. As is often the case, the best way to answer a SAS programming question is to write a short program and see what happens. Here is what I suggested:
proc format; value SalesRange 0 - <100000 = 'Lower' 100000 - <200000 = 'Middle' 200000 - high = 'Upper'; run; proc means data=sashelp.Shoes n nmiss mean std; class Sales; format Sales SalesRange.; var Sales; run; |
Here is the output:
As you can see from the output, it worked just fine.
Before we leave this topic, let's use the first example, referencing the Health data set, to demonstrate a Multi-Label Format. Suppose you want to see means of Weight and Height for two different ranges of Age. One range is the original 0–40, 41–70, and 70+; the other is 0–50 and 51+. Here is a way to accomplish this using a single format:
proc format; value AgeGrp (multilabel) 0-40 = '0 to 40' 41-70 = '41 to 70' 71-high = '>70' 0-50 = '0 to 50' 51-high = '>50'; run; proc means data=Health n nmiss mean; class Age / MLF; format Age AgeGrp.; var Weight Height; run; |
You need to add the option "multilabel" following the format name on the VALUE statement. Without this option, you will get an error message if you have overlapping ranges. Next, use the option MLF (multi-label format) on the CLASS statement. Here is the output:
You now have the mean Weight and Height for two separate ranges of Age.
Consider using a formatted variables in CLASS statements for those procedures that allow this statement. It will save you time and reduce CPU time as well.
If you liked this blog, then you might like my latest book: Cody’s Collection of SAS Blogs. It gathers all of my best tips and tricks from my blogs in one place for easy reference.
2 Comments
The URL to my "Blog" book is: https://www.amazon.com/Codys-Collection-SAS-Blogs-Cody/dp/B09YR61K7X/
By the way, there are quite a few blogs in the book that have not been published on my SAS blog. I hope you enjoy it.
There was an agreement with SAS Press that SAS would help publicize the "Blog" book. Each of the blogs references one of my SAS Press books. I hope this gets approved soon.