Which Base procedure is best for simple statistics?


It’s an understatement to say there are many Base SAS procedures!

Some procedures may be used for basic report writing. Others may be used to perform statistical analysis. Some have similar functions; others are unique in the output they can produce. Which procedure you choose generally depends on the type of output you are trying to generate, with perhaps a bit of personal preference sprinkled into the mix.

I often get calls from SAS users who are trying to sort through the options, so I thought a blog post illustrating a few alternatives might help you choose the procedure that’s the best fit for your needs. Here are a few common choices for calculating frequencies, percentages, and a few other simple statistics, but you can certainly use other Base SAS procedures or DATA step processing to perform these calculations. I’ve also included a few notes on customizing calculations and output.

It’s helpful to note that Base procedures use specific keywords (such as N, SUM, MEAN, and PCTN) to request statistics. For future reference, you might want to bookmark this table of common procedures and the simple statistics they compute.

Calculating frequency

If you need to generate basic frequency (N) or sum reports, you can use a number of Base procedures.

PROC PRINT can report a frequency count (with the N option) within each BY group and across the entire data set. With the SUM statement, PROC PRINT can also total numeric variables, again within each BY group and for the entire data set.

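/* BY-group processing requires data sorted by the BY variable */
proc sort data=sashelp.class out=class;
by sex;
run;

/* N prints the number of observations in each BY group.     */
/* SUMLABEL= and GRANDTOTAL_LABEL= require SAS 9.4 or later.  */
proc print data=class noobs n
     sumlabel='Subtotal' grandtotal_label='Grand Total';
by sex;
var name age height weight;
sum height weight;
run;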

PROC REPORT allows for more customized grouping and display of variable values, and it supports the computation of new variables within COMPUTE blocks.

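proc report data=sashelp.class nowd;
column sex age n height weight bmi;
define sex / group;
define age / group;
define n / 'Count';
define height / sum;
define weight / sum;
define bmi / computed format=8.2 'BMI';

/* BMI is based on the group means (SUM divided by N). SASHELP.CLASS */
/* stores height in inches and weight in pounds, hence the 703.      */
compute bmi;
bmi=((weight.sum/n)/((height.sum/n)**2))*703;
endcomp;
run;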

PROC FREQ is another procedure that outputs basic frequency counts. It groups identical values together and returns the frequency count for each group. PROC FREQ can also create an output data set; with the OUTPCT option, that data set includes row and column percentages as well.

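/* OUT= saves the two-way counts and percentages in a data set; */
/* OUTPCT adds row and column percentages (PCT_ROW, PCT_COL).   */
proc freq data=sashelp.class;
tables age*sex / out=new outpct;
run;

proc print data=new;
run;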

If you want to get the distinct count of a variable’s values, you can use PROC FREQ with the NLEVELS option.

proc freq data=sashelp.class nlevels;
tables age;
run;
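If you want only the distinct counts, and want them in a data set, one approach is sketched below (the data set name LEVELS is arbitrary). NOPRINT on the TABLES statement suppresses the one-way tables while the NLEVELS table is still produced, and ODS OUTPUT captures that table.

```sas
/* Capture the "Number of Variable Levels" table in a data set.        */
/* NOPRINT on TABLES suppresses the frequency tables themselves; the   */
/* NLevels table is still produced because of the NLEVELS option.      */
ods output nlevels=levels;
proc freq data=sashelp.class nlevels;
  tables age sex / noprint;
run;

proc print data=levels noobs;
run;
```

LEVELS should contain one row per variable listed on the TABLES statement.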

Calculating percentages

For two-way and multi-way tables, PROC FREQ displays overall, row, and column percentages by default.

proc freq data=sashelp.class;
tables age*sex;
run;
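If you don't need all three percentages, TABLES statement options can turn them off. This small sketch keeps only the counts and row percentages:

```sas
/* NOPERCENT drops the overall (table) percentages and NOCOL drops */
/* the column percentages, so each cell shows the count and row %. */
proc freq data=sashelp.class;
  tables age*sex / nopercent nocol;
run;
```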

PROC TABULATE produces comparable percentages with the statistic keywords PCTN, ROWPCTN, and COLPCTN.

proc tabulate data=sashelp.class;
class age sex;
table age*(n pctn rowpctn colpctn) all*(n rowpctn), sex all;
run;

PROC TABULATE can also use more advanced denominator definitions for percentages. The SAS Global Forum 2013 paper Tips for Generating Percentages Using the SAS® TABULATE Procedure is a helpful guide.
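As a small illustration of a denominator definition (a sketch, not an example taken from that paper), PCTN&lt;age&gt; tells PROC TABULATE to use the total over all AGE values within the same SEX column as the denominator, which reproduces the column percentage:

```sas
/* PCTN<age>: the denominator sums N over all AGE rows that share */
/* the current SEX column, so each column's percentages total 100. */
proc tabulate data=sashelp.class;
  class age sex;
  table age, sex*(n pctn<age>);
run;
```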

PROC REPORT uses the PCTN statistic to generate a column percentage. Other custom percentages can be computed in PROC REPORT using COMPUTE blocks.

proc report data=sashelp.class nowd;
column age sex,(n pctn);
define age / group;
define sex / across;
define pctn / format=percent8.2 'Col %';
run;
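For example, a percentage within each SEX group is not a built-in PROC REPORT statistic, but a COMPUTE BEFORE block can capture each group's N to use as the denominator. This is a sketch; SEX_TOTAL is simply a temporary variable name chosen here.

```sas
proc report data=sashelp.class nowd;
  column sex age n pct;
  define sex / group;
  define age / group;
  define n   / 'Count';
  define pct / computed format=percent8.1 'Pct of Sex';

  /* At the start of each SEX group, N holds that group's total count. */
  compute before sex;
    sex_total = n;
  endcomp;

  /* Each row's count divided by its SEX group total */
  compute pct;
    pct = n / sex_total;
  endcomp;
run;
```

Variables such as SEX_TOTAL that are not in the COLUMN statement are temporary and keep their values from one row to the next, which is what makes this pattern work.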

Calculating other statistics

The Base SAS procedures MEANS, SUMMARY, REPORT, and TABULATE can calculate many statistics, as highlighted in the table of common procedures and the simple statistics.

PROC TABULATE and PROC REPORT have a report-friendly tabular structure.

proc tabulate data=sashelp.class;
class age;
var height weight;
table age, (height weight)*(sum mean min max);
run;
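A rough PROC REPORT counterpart of the same table might look like the following sketch (the layout differs slightly from TABULATE's). The comma in the COLUMN statement stacks the statistics under each analysis variable.

```sas
proc report data=sashelp.class nowd;
  column age (height weight),(sum mean min max);
  define age    / group;
  define height / analysis format=8.1;
  define weight / analysis format=8.1;
run;
```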

PROC SUMMARY and PROC MEANS are recommended if you need to create an output data set for your requested statistics. These two procedures are essentially the same except for a few defaults:

  • PROC SUMMARY does not create printed output by default, but PROC MEANS does.
  • If you omit the VAR statement, PROC SUMMARY produces a simple count of observations, but PROC MEANS analyzes all numeric variables that are not listed in other statements.

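/* AUTONAME names the output variables Height_Sum, Height_Mean, and so on. */
/* The output also contains an overall row (_TYPE_=0) for all ages.        */
proc summary data=sashelp.class;
class age;
var height weight;
output out=stats sum= mean= min= max= / autoname;
run;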
 
proc print data=stats;
run;
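Because the two procedures are essentially the same, the PROC SUMMARY step can be written with PROC MEANS by adding NOPRINT. This sketch also adds NWAY, which keeps only the rows for the individual AGE values and drops the overall (_TYPE_=0) row; the data set name STATS_MEANS is arbitrary.

```sas
/* NOPRINT suppresses the default printed output, matching PROC SUMMARY. */
/* NWAY keeps only the AGE-level rows (_TYPE_=1) in the output data set. */
proc means data=sashelp.class noprint nway;
  class age;
  var height weight;
  output out=stats_means sum= mean= min= max= / autoname;
run;

proc print data=stats_means;
run;
```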

Customizing calculations, summaries and output

If your output needs to include customized summaries using IF/THEN logic, then PROC REPORT, with its COMPUTE blocks and LINE statements, is the procedure to choose. Several SAS Samples illustrate these techniques.
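Here is a minimal sketch of that combination; the 62-inch cutoff and the NOTE variable are made up for illustration. PROC REPORT executes LINE statements only after all other statements in a compute block, so a LINE statement cannot be made conditional with IF/THEN directly. A common workaround is to assign a text variable conditionally and write it with a single LINE statement.

```sas
proc report data=sashelp.class nowd;
  column sex name height;
  define sex    / order 'Sex';
  define name   / display 'Name';
  define height / analysis mean format=6.1 'Height';

  /* At the break after each SEX group, HEIGHT.MEAN is the group mean. */
  compute after sex;
    length note $45;
    /* Hypothetical 62-inch cutoff, for illustration only */
    if height.mean >= 62 then note = 'Group mean height is 62 inches or more';
    else note = 'Group mean height is under 62 inches';
    line note $45.;
  endcomp;
run;
```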

Finally, any of these procedures can be customized and sent to any destination, including Excel, RTF, PDF, and HTML, using the Output Delivery System (ODS). For one example, see the SAS Sample "Demonstrate the use of banding in PROC TABULATE."
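As a minimal sketch, the following routes one PROC TABULATE table to RTF and PDF at the same time; the file names are placeholders.

```sas
/* Open RTF and PDF destinations; the file names are placeholders. */
ods rtf file='class_stats.rtf';
ods pdf file='class_stats.pdf';

proc tabulate data=sashelp.class;
  class age;
  var height weight;
  table age, (height weight)*(mean min max);
run;

/* Close the destinations so the files are written completely. */
ods pdf close;
ods rtf close;
```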


About Author

Kathryn McLawhorn

Principal Technical Support Analyst

Kathryn McLawhorn has worked in Technical Support at SAS since 1997. She started as a consultant in the Base Procedures and ODS group, and she is currently a consultant in the CAS/Open Source Languages and SAS Programming group. She primarily supports report writing, ODS, and Base summary procedures. Kathryn has her SAS Certification in both Base Programming for SAS 9 and Advanced Programming for SAS 9.

17 Comments

  1. Hello,
    thanks for this very useful comparison of the different procedures. It is easy to read and to understand.
    Annette

  2. I have been using Proc Freq for more than 30 years, and it never failed me until this past week.

    I am dealing with a new project which requires me to create a Data profile for a variety of MS SQL tables creating such stats as nmiss, min, mean, max, range, etc for every variable in the SQL table.

    Well, I found out that Proc Freq failed when I encountered a variable with more than 17 million unique values and I did not have enough memory on my SAS Grid environment to create "count and percent" for this variable.

    It would be nice to have an option within Proc Freq to get these stats but in a slower manner when necessary. Instead, I had to do a Proc sort and then process the data twice to do my own count.

    Then, because sorting the data set took so long when I had numerous variables, I implemented a conditional macro that checks whether count(distinct &single_variable) is greater than 15 million values (I arbitrarily decided on 15 instead of 17 million) and goes back to sorting the data set for any variable with more than 15 million levels.

    So, if you can persuade Proc FREQ R&D people to handle any size number of levels for a one-way FREQ table, I would appreciate that.

    Just to let you know, my Data profile for each table and each variable looks like this - it would be nice to have a proc that would do this automatically:

    Library Name Member Name Member Type DBMS Member Type Data Set Label Data Set Type Date Created Date Modified Number of Physical Observations Observation Length Number of Variables Type of Password Protection Compression Routine Encryption Number of Pages Size of File Percent Compression Reuse Space Bufsize Number of Deleted Observations Number of Logical Observations Longest variable name Longest label Maximum number of generations Generation number Data Set Attributes Type of Indexes Data Representation Name of Collating Sequence Sorting Type Charset Sorted By Requirements Vector Data Representation Name Data Encoding Audit Trail Active? Audit Before Image? Audit Admin Image? Audit Error Image? Audit Data Image? Number of Character Variables Number of Numeric Variables
    JDE_PATH EDM_CLAIMS_2010_APLUSPROP DATA DATA 18SEP14:13:55:26 18SEP14:13:55:26 11928602 713 35 --- CHAR NO 398363 3263397888 63 no 8192 0 11928602 24 24 0 . ON NATIVE 181F101133220033330102310133012333001D0000200301 SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 latin1 Western (ISO) no no no no no 27 8
    LibPath table_name Num_Obs Var_Name Var_Label Var_Type Var_Length Var_Format Var_InFmt _Var_Num_ nmiss distinct_cnt min mean max range sum _TOP_1 _TOP_2 _TOP_3 _TOP_4 _TOP_5 __RECORD__1 __RECORD__2 __RECORD__3 __RECORD__4 __RECORD__5 __RECORD__6 __RECORD__7 __RECORD__8 __RECORD__9 __RECORD__10
    MS_SQL_TABLE 11928602 ALLCLAIMKEY ALLCLAIMKEY N 8 20 20 1 0 11928602 1 5964301.5 11928602 11928601 7.11458E+13 1 | 1 2 | 1 3 | 1 4 | 1 5 | 1 1 2 3 4 5 6 7 8 9 10

    Regards,
    A Long Time SAS User - like 37+ years

    Charles Patridge

    • Kathryn McLawhorn

      Charles,

      Thanks for your comments. This type of question would be best addressed by SAS Technical Support, where you could submit your specific problem and log. If you have a comment or question about the blog post, include that here.

      The PROC FREQ documentation has a useful section on Computational Resources that discusses memory requirements.

      Thanks,
      Kathryn

  3. Running SAS 9.3, I'm getting an error on the first example. This appears to be the problem:

    sumlabel='Subtotal' grandtotal_label='Grand Total';

    • Kathryn McLawhorn

      The SUMLABEL= and GRANDTOTAL_LABEL= options are new for SAS 9.4. The code samples were written with SAS 9.4 since that is the most current release available. The other samples are generic enough to work in any of the SAS 9 releases.

  4. Useful tips!

    Do you know of a routine that will read in ANY data set, and summarise character and numeric data appropriately (without having to specify individual variable names)?

    • Kathryn McLawhorn

      In PROC MEANS, if you do not specify a VAR statement, all numeric variables are analyzed. In PROC FREQ, if you do not use a TABLES statement, you get a one-way frequency table for every variable in the data set. In any procedure, you can use the _CHARACTER_ keyword to represent all character variables in a data set and the _NUMERIC_ keyword to represent all numeric variables.

      Hope this helps,
      Kathryn

  5. Proc Freq and Proc Univariate are common procedures that we use for statistics... Love this article reviewing the basic common statistics procedures 🙂

    • Christina Harvey

      Thank you for the feedback, Ravi. I'm always glad to hear what kind of content SAS users find useful!

      Christina
      Editor, SAS Users blog

  6. Robert Allison

    Nice/useful post!

    I also like to use Proc Sql (which ships with Base SAS) to calculate basic statistics 🙂

    • Kathryn McLawhorn

      Thanks. With all the procedures and DATA step, there are so many ways to accomplish the same thing!
