Customizing output from PROC MEANS

17

ProblemSolversCustomizing the output data set created using the OUTPUT statement

When you request statistics on the PROC MEANS statement, the default printed output creates a nice table with the analysis variable names in the left-most column and the statistics forming the additional columns.  Even if you create an output data set with no statistics listed on the OUTPUT statement, the default statistics, N, MIN, MAX, MEAN, and STD, are output in a nice table format as values of the _STAT_ variable and the analysis variables form the other columns.  Once you start requesting specific statistics on the OUTPUT statement, the format of the data set changes.  A wide output data set is the result with only one observation and a variable for each analysis variable-statistic combination.  If you use a BY or CLASS statement, you will get multiple observations for each unique value of the BY or CLASS variable combinations, but the analysis variable structure is the same.  The structure of this data set can be hard to view.  It would be nice if the output data set could maintain the format of the printed output or the default output data set.  You can do that with a few simple code modifications using PROC TRANSPOSE and DATA Step logic.

If you want the output data set to look like the default printed output, use the steps below.

/* create a sample data set */
data test;
input x1 x2 x3;
cards;
1 2 3
4 5 6
7 8 9
;
run;
 
/* Request statistics on the OUTPUT statement. */
/* Use the AUTONAME option to create variable names in the form: */
/* variable_statistic */
 
proc means noprint;
var _numeric_;
output out=new(drop=_type_ _freq_) mean= median= std= q1= q3= / autoname;
run;
 
/* Transpose the data set so that each statistic becomes an observation. */
proc transpose data=new out=out;
run;
 
/* Create new variables that contain the statistic name and the */
/* original variable name. */
 
data out1;
set out;
varname=scan(_name_,1,'_');
stat=scan(_name_,2,'_');
drop _name_;
run;
 
proc sort data=out1;
by varname;
run;
 
/* Transpose the data to get one observation for each  */
/* original variable name and one variable for each    */
/* statistic.  This mimics the default printed output. */
 
proc transpose data=out1 out=out2(drop=_name_);
by varname;
id stat;
var col1;
run;
 
proc print data=out2;
title 'Looks like default printed output';
run;

 

Looks like default printed output

varname Mean Median StdDev Q1 Q3
x1 4 4 3 1 7
x2 5 5 3 2 8
x3 6 6 3 3 9

If you want the output data set to look like the default output data set, use the steps below in place of the last three steps above.

proc sort data=out1;
by stat;
run;
 
/* Transpose the data to get one observation for each */
/* statistic name and one variable for each           */
/* original variable.  This mimics the default        */
/* output data set.                                   */
 
proc transpose data=out1 out=out3(drop=_name_);
by stat;
id varname;
var col1;
run;
 
proc print data=out3;
title 'Looks like default output data set';
run;

Looks like default output data set

stat x1 x2 x3
Mean 4 5 6
Median 4 5 6
Q1 1 2 3
Q3 7 8 9
StdDev 3 3 3

 

If your variable names contain underscores (_) in the name, replace the DATA OUT1 step above with the following DATA OUT1 step:

/* Depending on your variable name, you may need to increase */
/* the length $11 assignment below.                          */
 
data out1;
set out;
length varname $11;
lenvar=length(_name_);
lenstat=index(reverse(trim(_name_)),'_')-1;
varname=substr(_name_,1,(lenvar-lenstat)-1);
stat=scan(_name_,-1,'_');
drop _name_ lenvar lenstat;
run;

Customizing the output using PROC TEMPLATE

For the default printed output from PROC MEANS, in the past we said that the only option for formatting the statistics was with the MAXDEC= option which controls the number of decimal places.  We are no longer limited to this level of formatting since we can use PROC TEMPLATE.  We can format the statistics in the printed output by modifying the Base.Summary table template.

/* create sample data */
data test;
input amount ;
cards;
1000
12345
8500
;
run;
 
/* Modify the base.summary table template using */
/* EDIT statements. */
ods path(prepend) work.templat(update);
proc template;
 edit base.summary;
  edit mean;
   format=dollar12.2;
  end;
  edit sum;
   format=dollar12.;
  end;
 end;
run;
 
proc means data=test n mean sum;
var amount;
run;
 
/* restore default template */
proc template;
 delete base.summary;
run;

Custom format for printed output

 

The MEANS Procedure
Analysis Variable : amount
N Mean Sum
3 $7,281.67 $21,845

Customizing the output data using STACKODSOUTPUT

In SAS 9.3, a new option was added that only affects the output data set created using the ODS OUTPUT statement.  This option is STACKODSOUTPUT (alias: STACKODS) and it is placed on the PROC MEANS statement.  This option allows the data set to resemble the default printed output.

/* Use the STACKODSOUTPUT option on the PROC MEANS statement   */
/* and use the ODS OUTPUT statement to create output data set. */
 
ods output summary=with_stackods(drop=_control_);
proc means data=sashelp.class stackodsoutput sum mean std nway;
 class sex;
 var height weight;
run;
 
proc print data=with_stackods noobs;
 title 'Output data set with STACKODSOUTPUT';
run;

Output data set with STACKODSOUTPUT

Sex NObs Variable Sum Mean StdDev
F 9 Height 545.300000 60.588889 5.018328
F 9 Weight 811.000000 90.111111 19.383914
M 10 Height 639.100000 63.910000 4.937937
M 10 Weight 1089.500000 108.950000 22.727186

If you are not satisfied with the output from PROC MEANS, either the default printed output or the output data set, you are just a few steps away from reshaping it into the format you desire.

WANT MORE GREAT INSIGHTS MONTHLY? | SUBSCRIBE TO THE SAS TECH REPORT
Share

About Author

Kathryn McLawhorn

Principal Technical Support Analyst

Kathryn McLawhorn has worked in Technical Support at SAS since 1997. She started as a consultant in the Base Procedures and ODS group, and she is currently a consultant in the CAS/Open Source Languages and SAS Programming group. She primarily supports report writing, ODS, and Base summary procedures. Kathryn has her SAS Certification in both Base Programming for SAS 9 and Advanced Programming for SAS 9.

17 Comments

  1. Hi Kathryn,
    Thanks for the post! I'm a beginner and still not quite understand.. 🙁
    I have a dataset with 15 variables in 8 quarters (2 years) and 19 locations. I want to list the means by each location and the means need to be arranged by quarter so that we can tell the change pattern across time. Could you please write a sample code for this purpose. Thank you so much!!
    Sophie

  2. Hello Kathryn McLawhorn,
    Greetings...!
    Hope you are doing great!

    Your post is much Informative, I just would like to know can we able to import Excel (.XLSX format) file to SAS 9.1?. I tried many codes but aren't working. So, I have changed excel file format XLSX to .CSV and then it's working. But I want .XLSX file as it is in SAS 9.1 without changing the Format. Is there any way to Import .XLSX format to SAS 9.1?

    Thanks in advance.
    Regards,
    Vijay

    • Kathryn McLawhorn on

      That's one of the great things about SAS - there are always multiple ways to approach a task.

      Kathryn

  3. Kathryn McLawhorn on

    The code runs correctly for me. If you are talking about the replacement DATA step if your variable names contain _, then that comes after the PROC TRANSPOSE in the prior code. If you are getting errors, I would suggest opening a tracking entry with Technical Support and include your log.
    Kathryn

  4. The example references _name_ in a data step before it actually exists (it is created by proc transpose). Could you please fix your example so that it works without errors? or, could a moderator delete or fix it?

  5. Kathryn,

    Your post is outstanding and easy to understand. I love it and I am really, but really brand new to SAS and I am learning it by myself with the help of various posts and excellent explanations like yours. However, I wonder if you could guide in the following situation.

    Data: 16 subjects divided in 2 groups ('exper' and 'control') with the same 2 var ('rest' and 'work').

    Proc needed: ttest for the 2 var ('rest' and 'work') in the 'exper' group only.

    I used Proc MEANS to obtained the 5-statistics for each group, and have obtained this table:

    -------------------------------------------------------------------------------------------------------
    Group----variable----label----5-statistics values
    control----rest---------rest-----
    ------------work--------work----
    exper-----rest---------rest-----
    ------------work--------work----

    I know that I can conduct the ttest manually by working out the statistics and obtaining the standard error of the mean, etc., etc.

    However, I would like to create a subset 'name' control' and use the proc ttest for this subset only and obtain the ttest for the variables 'rest' and 'work'.

    How can I do that?

    Thank you for your help.

    Mike

  6. Kathryn McLawhorn on

    Preeti,

    It is not clear to me what you are trying to do. When I look at the output data set created by the ODS OUTPUT statement, I see the Label variable; and I have been able to rename it in a subsequent DATA step. If you have a particular problem, I would suggest opening a tracking entry with Technical Support with your code and the particular issue you are having. I would be happy to help you address it there.
    Kathryn

  7. For some reason, when I use STACKODS, I am not able to use the variable for labels ("Label"). I want to be able to pull in a number of STACKODS datasets and have the variable names and labels show but I can't call the outputted variable for label despite removing variable labels using PROC DATASETS so that I could read the original variable names, and checking the DICTIONARY TABLES. I can see that the variable name is still "Label", it shows up when I do descriptives using _CHAR_, but I cannot directly call it. Do you have any thoughts as to why?

  8. Any reason why I'm getting an error on the yaxis ranges clause??

    31 proc sgplot data=new;
    32 vbar type / response=value;
    33 yaxis ranges=(0-20 200-230);
    ______
    22
    202
    ERROR 22-322: Syntax error, expecting one of the following: ;, DISCRETEORDER, DISPLAY, FITPOLICY,
    GRID, INTEGER, INTERVAL, LABEL, LABELATTRS, LOGBASE, LOGSTYLE, MAX, MIN, MINOR,
    NOTIMESPLIT, OFFSETMAX, OFFSETMIN, REFTICKS, REVERSE, THRESHOLDMAX, THRESHOLDMIN,
    TICKVALUEFORMAT, TYPE, VALUEATTRS, VALUES, VALUESHINT.
    ERROR 202-322: The option or parameter is not recognized and will be ignored.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top