It’s an understatement to say there are many Base SAS procedures!

Some procedures may be used for basic report writing. Other procedures may be used to perform statistical analysis. Some have similar functions. Others are unique in the output that they can produce. Which procedure you choose generally depends on the type of output you are trying to generate, with perhaps a bit of personal preference sprinkled into the mix.

I often get calls from SAS users who are trying to sort through the options, so I thought a blog post illustrating a few alternatives might help you choose the procedure that’s the best fit for your needs. Here are a few common choices for calculating frequency, percentages and a few other simple statistics, but you can certainly use other Base SAS procedures or DATA step processing to perform these calculations. I’ve also included a few notes on customizing calculations and output.

It’s helpful to note that Base procedures have specific keywords to refer to statistics. For future reference, you might want to bookmark this table of common procedures and the simple statistics.
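As a small illustration of statistic keywords, the sketch below asks PROC MEANS for several statistics by name. The particular keywords and the MAXDEC= setting are only one possible selection, chosen here for illustration.

```sas
/* Statistic keywords are listed directly on the PROC MEANS statement */
proc means data=sashelp.class n nmiss mean median std min max maxdec=2;
   var height weight;
run;
```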

### Calculating frequency

If you need to generate basic frequency (N) or sum reports, you can use a number of Base procedures.

**PROC PRINT**, with its N option, can report the number of observations within a BY group and across the entire data set. In addition, PROC PRINT can sum numeric variables with the SUM statement, again within a BY group and for the entire data set.

    proc sort data=sashelp.class out=class;
       by sex;
    run;

    proc print data=class noobs sumlabel='Subtotal' grandtotal_label='Grand Total';
       by sex;
       var name age height weight;
       sum height weight;
    run;

**PROC REPORT** allows for more customized grouping and display of variable values, and it supports the computation of new variables within COMPUTE blocks.

    proc report data=sashelp.class nowd;
       column sex age height weight bmi;
       define sex / group;
       define age / group;
       define height / sum;
       define weight / sum;
       define bmi / computed format=8.2;
       compute bmi;
          bmi=(weight.sum/(height.sum)**2)*703;
       endcomp;
    run;

**PROC FREQ** is another procedure that outputs basic frequency counts. This procedure will group like variable values together and return the frequency count for the grouping. PROC FREQ also has the ability to create an output data set.

    proc freq data=sashelp.class;
       tables age*sex / out=new outpct;
    run;

    proc print data=new;
    run;

If you want to get the distinct count of a variable’s values, you can use PROC FREQ with the NLEVELS option.

    proc freq data=sashelp.class nlevels;
       tables age;
    run;

### Calculating percentages

**PROC FREQ**, by default, outputs percentages for multi-way tables, representing overall, row, and column percents.

    proc freq data=sashelp.class;
       tables age*sex;
    run;

**PROC TABULATE** outputs comparable percentages using the following statistic keywords: PCTN, ROWPCTN, and COLPCTN.

    proc tabulate data=sashelp.class;
       class age sex;
       table age*(n pctn rowpctn colpctn) all*(n rowpctn), sex all;
    run;

PROC TABULATE has the added ability to generate more advanced denominator definitions. You will find the SAS Global Forum 2013 paper "Tips for Generating Percentages Using the SAS® TABULATE Procedure" helpful.

**PROC REPORT** uses the PCTN statistic to generate a column percentage. Other custom percentages can be computed in PROC REPORT using COMPUTE blocks.

    proc report data=sashelp.class nowd;
       column age sex,(n pctn);
       define age / group;
       define sex / across;
       define pctn / format=percent8.2 'Col %';
    run;
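For a custom percentage computed in a COMPUTE block, one possible approach is sketched below: capture the grand total at the top of the report, then divide each group's sum by it. The names `pct_weight` and `total_weight` are made up for this illustration, and the choice of WEIGHT as the analysis variable is arbitrary.

```sas
proc report data=sashelp.class nowd;
   column sex weight pct_weight;
   define sex        / group;
   define weight     / sum 'Total Weight';
   define pct_weight / computed format=percent8.1 'Share of Weight';

   /* At the top of the report, WEIGHT.SUM holds the grand total */
   compute before;
      total_weight = weight.sum;   /* temporary variable, kept across rows */
   endcomp;

   /* Each group's share of the grand total */
   compute pct_weight;
      pct_weight = weight.sum / total_weight;
   endcomp;
run;
```

The computed column is placed to the right of WEIGHT in the COLUMN statement so that `weight.sum` is available when the COMPUTE block runs.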

### Calculating other statistics

SAS Base procedures MEANS, SUMMARY, REPORT and TABULATE can calculate many statistics as highlighted in the table of common procedures and the simple statistics.

**PROC TABULATE** and **PROC REPORT** have a report-friendly tabular structure.

    proc tabulate data=sashelp.class;
       class age;
       var height weight;
       table age, (height weight)*(sum mean min max);
    run;

**PROC SUMMARY** or **PROC MEANS** are recommended if you need to create an output data set for your requested statistics. These two procedures are essentially the same except for a few defaults:

- PROC SUMMARY does not create printed output by default, but PROC MEANS does.
- Another difference is if you omit the VAR statement, PROC SUMMARY creates a simple frequency count of observations, but PROC MEANS analyzes all numeric variables that are not listed on other statements.

    proc summary data=sashelp.class;
       class age;
       var height weight;
       output out=stats sum= mean= min= max= / autoname;
    run;

    proc print data=stats;
    run;
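To see the VAR statement difference described in the list above, a minimal sketch is to run both procedures without a VAR statement. The PRINT option is added to PROC SUMMARY here only so that it displays something, since it produces no printed output by default.

```sas
/* No VAR statement: PROC SUMMARY reports only a count of observations */
proc summary data=sashelp.class print;
   class sex;
run;

/* No VAR statement: PROC MEANS analyzes the numeric variables
   that are not listed in other statements */
proc means data=sashelp.class;
   class sex;
run;
```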

### Customizing calculations, summaries and output

If your output needs to include customized summaries using IF/THEN logic, then PROC REPORT is the procedure to choose, with its COMPUTE blocks and LINE statements. The SAS Samples below illustrate how to:

- Flag a row and add a conditional footnote at the end of a page with PROC REPORT
- Create multiple summary rows without using a LINE statement
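The following is not one of the SAS Samples listed above. It is only a rough sketch of how IF/THEN logic and a LINE statement can work together in COMPUTE blocks. The thresholds (65 and 600), the `note` variable, and the highlight color are arbitrary choices for illustration.

```sas
proc report data=sashelp.class nowd;
   column sex name height;
   define sex    / order;
   define name   / display;
   define height / analysis sum format=6.1;

   /* IF/THEN logic: highlight rows where HEIGHT is above 65 */
   compute height;
      if height.sum > 65 then
         call define(_row_, 'style', 'style=[background=yellow]');
   endcomp;

   /* Build conditional text first, then write it with a LINE statement */
   compute after sex;
      length note $40;
      if height.sum > 600 then note = 'Group total height is above 600';
      else note = 'Group total height is 600 or less';
      line note $40.;
   endcomp;
run;
```

The conditional text is assigned to a variable before the LINE statement because LINE itself cannot be placed inside IF/THEN logic. The row highlighting appears only in ODS destinations that support styles, such as HTML or PDF.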

Finally, any of these procedures can be customized and output to any destination, including Excel, RTF, PDF and HTML, using the Output Delivery System (ODS). Here is an example that demonstrates the use of banding in PROC TABULATE.
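As a basic sketch of routing output through ODS (this is not the banding example mentioned above), the code below wraps a PROC TABULATE step in ODS PDF statements. The file name `class_summary.pdf` is a placeholder; adjust the path for your environment.

```sas
/* Open the PDF destination; the file name is a placeholder */
ods pdf file='class_summary.pdf';

proc tabulate data=sashelp.class;
   class age;
   var height weight;
   table age, (height weight)*(mean min max);
run;

/* Close the destination so the file is written */
ods pdf close;
```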

## 17 Comments

Hello,

thanks for this very useful comparison of the different procedures. It is easy to read and to understand.

Annette

Thanks for your comments. I am glad you found it helpful.

Kathryn

I have been using proc Freq for more than 30 plus years, and it never failed me until this past week.

I am dealing with a new project which requires me to create a Data profile for a variety of MS SQL tables creating such stats as nmiss, min, mean, max, range, etc for every variable in the SQL table.

Well, I found out that Proc Freq failed when I encountered a variable with more than 17 million unique values and I did not have enough memory on my SAS Grid environment to create "count and percent" for this variable.

It would be nice to have an option within Proc Freq to get these stats but in a slower manner when necessary. Instead, I had to do a Proc sort and then process the data twice to do my own count.

Then, because this was taking so long to sort the dataset when I had numerous variables, I implemented a conditional macro to check whether count(distinct &single_variable) was greater than 15 million values (I arbitrarily decided on 15 million instead of 17 million) and go back to sorting the dataset for any of those variables which had more levels than 15 million.

So, if you can persuade Proc FREQ R&D people to handle any size number of levels for a one-way FREQ table, I would appreciate that.

Just to let you know, my Data profile for each table and each variable looks like this - it would be nice to have a proc that would do this automatically:

Library Name Member Name Member Type DBMS Member Type Data Set Label Data Set Type Date Created Date Modified Number of Physical Observations Observation Length Number of Variables Type of Password Protection Compression Routine Encryption Number of Pages Size of File Percent Compression Reuse Space Bufsize Number of Deleted Observations Number of Logical Observations Longest variable name Longest label Maximum number of generations Generation number Data Set Attributes Type of Indexes Data Representation Name of Collating Sequence Sorting Type Charset Sorted By Requirements Vector Data Representation Name Data Encoding Audit Trail Active? Audit Before Image? Audit Admin Image? Audit Error Image? Audit Data Image? Number of Character Variables Number of Numeric Variables

JDE_PATH EDM_CLAIMS_2010_APLUSPROP DATA DATA 18SEP14:13:55:26 18SEP14:13:55:26 11928602 713 35 --- CHAR NO 398363 3263397888 63 no 8192 0 11928602 24 24 0 . ON NATIVE 181F101133220033330102310133012333001D0000200301 SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 latin1 Western (ISO) no no no no no 27 8

LibPath table_name Num_Obs Var_Name Var_Label Var_Type Var_Length Var_Format Var_InFmt _Var_Num_ nmiss distinct_cnt min mean max range sum _TOP_1 _TOP_2 _TOP_3 _TOP_4 _TOP_5 __RECORD__1 __RECORD__2 __RECORD__3 __RECORD__4 __RECORD__5 __RECORD__6 __RECORD__7 __RECORD__8 __RECORD__9 __RECORD__10

MS_SQL_TABLE 11928602 ALLCLAIMKEY ALLCLAIMKEY N 8 20 20 1 0 11928602 1 5964301.5 11928602 11928601 7.11458E+13 1 | 1 2 | 1 3 | 1 4 | 1 5 | 1 1 2 3 4 5 6 7 8 9 10

Regards,

A Long Time SAS User - like 37+ years

Charles Patridge

Charles,

Thanks for your comments. This type of question would be best addressed by SAS Technical Support where you could submit your specific problem and log. If you have a comment or question about the blog post, include that here.

The PROC FREQ documentation has a useful section on Computational Resources that discusses memory requirements.

Thanks,

Kathryn

thanks for that.

it would have been useful to have sample output.

Thanks for the suggestion to include output.

ah - good to know - thanks

Running SAS9.3 I'm getting an error on the first example - this appears to be the problem:

sumlabel='Subtotal' grandtotal_label='Grand Total';

The SUMLABEL= and GRANDTOTAL_LABEL= options are new for SAS 9.4. The code samples were written with SAS 9.4 since that is the most current release available. The other samples are generic enough to work in any of the SAS 9 releases.

Thanks for posting this useful summary for getting simple statistics.

Thanks. I am glad you found it helpful.

Useful tips!

Do you know of a routine that will read in ANY data set, and summarise character and numeric data appropriately (without having to specify individual variable names)?

In PROC MEANS, if you do not specify a VAR statement, all numeric variables will be analyzed. In PROC FREQ, if you do not use a TABLES statement, you will get a one-way frequency table for every variable in the data set. In any procedure, you can use the _CHARACTER_ keyword to represent all character variables in a data set and the _NUMERIC_ keyword to represent all numeric variables.
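A minimal sketch of those keywords in use, with SASHELP.CLASS standing in for any data set and the statistics chosen only as an example:

```sas
/* Summarize every numeric variable without naming any of them */
proc means data=sashelp.class n nmiss mean min max;
   var _numeric_;
run;

/* Count distinct levels of every character variable;
   NOPRINT suppresses the individual frequency tables */
proc freq data=sashelp.class nlevels;
   tables _character_ / noprint;
run;
```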

Hope this helps,

Kathryn

Proc freq and proc univariate are common procedures which we use for statistics... Love this article reviewing the basic common statistics procedures 🙂

Thank you for the feedback, Ravi. I'm always glad to hear what kind of content SAS users find useful!

Christina

Editor, SAS Users blog

Nice/useful post!

I also like to use Proc Sql (which ships with Base SAS) to calculate basic statistics 🙂

Thanks. With all the procedures and DATA step, there are so many ways to accomplish the same thing!
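As one illustration of the PROC SQL route mentioned above, a sketch like the following computes a few grouped statistics. The column aliases are made up for this example.

```sas
proc sql;
   select sex,
          count(*)     as n,
          mean(height) as mean_height format=6.2,
          min(height)  as min_height,
          max(height)  as max_height
   from sashelp.class
   group by sex;
quit;
```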