When you’re making a report, how do you choose which procedure to use? The answer is – it depends.
It depends on:
- whether you are doing an ad hoc analysis or creating a final report that many people will see
- whether you will run statistical tests on your data or just want to look at it
- your comfort level with each procedure and whether you already have working code
Most of my work involves creating final reports that will be distributed to many users. I am a PROC REPORT specialist, so 90% of the time, I use PROC REPORT to create these reports.
Here is why:
PROC REPORT versus PROC MEANS/PROC SUMMARY
PROC MEANS is a wonderful, powerful procedure in its own right. It is an excellent tool for creating data sets of statistics that get fed into DATA steps or other procedures. I use PROC MEANS extensively, but not as my final reporting procedure.
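As a minimal sketch of that workflow, the step below writes statistics to a data set instead of printing them. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set name work.class_stats is made up for illustration.

```sas
/* Summarize height and weight by sex and save the results; print nothing. */
proc means data=sashelp.class noprint nway;
  class sex;
  var height weight;
  /* AUTONAME builds names such as Height_Mean and Weight_Median */
  output out=work.class_stats mean= median= / autoname;
run;

/* The statistics are now ordinary variables that a later step can use. */
proc print data=work.class_stats noobs;
run;
```

NWAY keeps only the rows for each value of SEX and drops the overall row, which is usually what a downstream DATA step expects.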
PROC MEANS can’t:
- give me percentages
- do traffic lighting
- add, subtract, multiply, or divide two variables
- create new variables
PROC MEANS can give you an overall total and a total for CLASS and BY variables, but so can PROC REPORT.
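The items in the list above can be sketched in a single PROC REPORT step. This is only an illustration, not the one right way to do it: it assumes the SASHELP.CLASS sample data set, the computed column name ratio is invented, and the cutoff of 1.6 is arbitrary.

```sas
proc report data=sashelp.class;
  column sex height weight ratio;
  define sex    / group 'Sex';
  define height / analysis mean format=6.1 'Mean Height';
  define weight / analysis mean format=6.1 'Mean Weight';
  define ratio  / computed format=6.2 'Weight / Height';

  compute ratio;
    /* Analysis variables are referenced as variable.statistic */
    ratio = weight.mean / height.mean;
    /* Traffic lighting: highlight the cell when the ratio is above the cutoff */
    if ratio > 1.6 then
      call define(_col_, 'style', 'style=[background=yellow]');
  endcomp;

  /* Overall summary row at the bottom of the report */
  rbreak after / summarize;
run;
```

The COMPUTE block is where the DATA step style logic lives: it creates the new column, divides two statistics, and applies the conditional style in one place.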
PROC REPORT versus PROC FREQ
PROC FREQ is really good at creating a small data set of counts that I can use later. It is a very quick way of calculating percentages, which might be easier than trying to calculate them in PROC REPORT. However, starting with PROC REPORT is better for me, just in case I am asked to make changes to the final output and the changes require something that PROC FREQ can’t do.
PROC FREQ does not:
- support the STYLE option, so I can’t apply traffic lighting
- calculate simple statistics, like median or mean
- add informative text
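One way to combine the two procedures is to let PROC FREQ do the counting and PROC REPORT do the presentation. The sketch below assumes the SASHELP.CLASS sample data set; the data set name work.sex_counts and the footnote text are made up for illustration.

```sas
/* PROC FREQ writes the counts and percentages to a small data set. */
proc freq data=sashelp.class noprint;
  tables sex / out=work.sex_counts;  /* creates the variables COUNT and PERCENT */
run;

/* PROC REPORT displays that data set and adds a line of text below the table. */
proc report data=work.sex_counts;
  column sex count percent;
  define sex     / display 'Sex';
  define count   / display 'N';
  define percent / display format=5.1 'Percent';

  compute after;
    line 'Percentages are based on all students in the class data set.';
  endcomp;
run;
```

If a later request calls for styling or extra columns, only the PROC REPORT step needs to change.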
PROC REPORT versus PROC PRINT
I like PROC PRINT because it provides a simple, clean way of displaying data. But, in my way of thinking, PROC REPORT can do everything PROC PRINT can do and more.
PROC PRINT can only provide the N statistic and a sum of the numeric variables, either for BY groups or the overall data set.
It cannot:
- calculate other statistics
- add, subtract, multiply, or divide two variables
- create new variables
- apply traffic lighting colors based on the value of another variable
- apply attributes at the row level (only the column level)
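The last two items can be illustrated with a short PROC REPORT step that prints a listing, much as PROC PRINT would, but shades an entire row based on the value of another variable. It assumes the SASHELP.CLASS sample data set; the age cutoff of 15 is arbitrary.

```sas
proc report data=sashelp.class;
  column name age height weight;
  define name   / display 'Name';
  define age    / display 'Age';
  define height / display 'Height';
  define weight / display 'Weight';

  /* Attach the block to the last column so every value in the row is available. */
  compute weight;
    if age >= 15 then
      call define(_row_, 'style', 'style=[background=yellow]');
  endcomp;
run;
```

Because AGE is defined as DISPLAY, it is referenced by name alone in the COMPUTE block; an ANALYSIS variable would need the variable.statistic form instead.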
PROC REPORT versus PROC TABULATE
For me, PROC TABULATE is the closest competitor to PROC REPORT. Both are powerful and produce excellent output. However, I think PROC REPORT has a slight advantage over PROC TABULATE because of its DATA step capabilities.
PROC TABULATE does:
- calculate row, group, and column percentages using a statistical keyword; PROC REPORT does not
- calculate totals for individual CLASS variables, as well as the combination of CLASS variables with very simple code
- stack variables in a column
- apply style changes such as traffic lighting, but not based on the value of another variable
PROC TABULATE does not:
- add, subtract, multiply, or divide two variables (Note: you can divide in some cases)
- have the ability to create new variables
- insert text in the middle of the table
- give much control over the headers
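To make the comparison concrete, here is one item from each list as a rough sketch. The first step uses the ROWPCTN keyword and the ALL class variable in PROC TABULATE; the second uses a LINE statement in PROC REPORT to insert text in the middle of a table. Both assume SASHELP sample data sets (CARS and CLASS), and the inserted text is invented.

```sas
/* PROC TABULATE: counts, row percentages, and totals with very little code */
proc tabulate data=sashelp.cars;
  class origin type;
  table origin all, (type all)*(n rowpctn*f=6.1);
run;

/* PROC REPORT: a line of text after each group of rows */
proc report data=sashelp.class;
  column sex name height;
  define sex    / order 'Sex';
  define name   / display 'Name';
  define height / display 'Height';

  compute after sex;
    line 'End of group';
  endcomp;
run;
```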
I have found that PROC REPORT offers me more flexibility, so I instinctively start with it. You might have a different opinion of which procedure is best to use. That is OK! If you have a PROC PRINT step (or a FREQ, MEANS, or TABULATE step) that does everything you need, then by all means keep using it. Don’t change your code for no reason!
But, if the procedure you are currently using does not give you everything you need, then change it. It might take less time to change the procedure than to write code to restructure the data to fit the current procedure.
Practice using other procedures to expand your programming skills. For example, if you have a PROC MEANS step you use a lot, see if you can create the same output using PROC TABULATE or PROC REPORT. When you become familiar with other procedures, you can switch between them. You will also internalize which procedure will be the best choice for certain tasks.
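As a starting point for that kind of practice, the three steps below request the same four statistics for one variable. They are a sketch that assumes the SASHELP.CLASS sample data set; the aliases h_n, h_mean, h_min, and h_max are made-up names, and the layouts will differ slightly between procedures.

```sas
/* PROC MEANS */
proc means data=sashelp.class n mean min max maxdec=1;
  class sex;
  var height;
run;

/* PROC TABULATE */
proc tabulate data=sashelp.class;
  class sex;
  var height;
  table sex, height*(n mean*f=8.1 min*f=8.1 max*f=8.1);
run;

/* PROC REPORT: one alias of HEIGHT per statistic */
proc report data=sashelp.class;
  column sex height=h_n height=h_mean height=h_min height=h_max;
  define sex    / group 'Sex';
  define h_n    / analysis n    format=8.  'N';
  define h_mean / analysis mean format=8.1 'Mean';
  define h_min  / analysis min  format=8.1 'Minimum';
  define h_max  / analysis max  format=8.1 'Maximum';
run;
```

Comparing the three outputs side by side is a quick way to see how much code each procedure needs for the same numbers.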
For more information, see "Which Base procedure is best for simple statistics."
For anyone attending this year’s SAS Global Forum, I will be in the Quad throughout the event. Feel free to find me. I would love to talk with you about this topic!
You can also check out my new book, The SAS® Programmer’s PROC REPORT Handbook: Basic to Advanced Reporting Techniques, which is available at the bookstore in the Quad.
7 Comments
This blog gives very important info Thanks for sharing. Very well-written information.
Thank you for your post sharing with us. Really it's a very helpful post. Hope everybody will be benefited from your post.
Thanks, this is excellent information to have! I have found that choosing which method/proc to use is the most challenging part of SAS programming.
Nico,
I appreciate you reading the blog and taking time to post a comment. I did not mention the Report Writing Interface, which uses the DATA _NULL_ step, but not because it is not powerful! I simply concentrated on procedures. The RWI is awesome and can be used to create table and output structures that the procedures couldn't dream of. However, in my opinion, the RWI has a steeper learning curve than the procedures. That shouldn't stop any programmers from learning it, though, because it is such a handy tool to have in your back pocket. For more information see the documentation page or this SAS Global Forum paper.
Regards,
Jane
Thank you, Jane, for this great post.
I would have liked to see another Base SAS tool that is useful for reporting but not mentioned in this article: the DATA _NULL_ step (Report Writing Interface). A comparison with PROC REPORT would have been a great added value, and not necessarily in favor of PROC REPORT ;-)
Best regards,
Nico
Great post! I often have to make the same decision, and/or advise others about which procedure to use.
I generally agree with everything you say, and I'd like to add some of my observations:
I work with a lot of novice SAS users; based on this, I tend to not suggest PROC REPORT because I find the learning curve to get "easy" results from it (in other words, results that I could easily get from one of the other three) is a lot higher for PROC REPORT.
Most of my environments are interactive and exploratory, so I don't tend to use or recommend PROC PRINT. Usually, just looking at the data on the screen is sufficient.
PROC TABULATE I strongly recommend for cases where someone wants to create a true tabulation output. I agree that PROC REPORT does a good job of this as well, but with the "dimension builder" tool in Enterprise Guide, it's a LOT easier to use TABULATE.
PROC FREQ is great IF you need the specialized statistics that it creates; otherwise, I shy away from it.
PROC MEANS is the "heavy lifter" in my toolkit. If I need a result set that's going to number in the millions or billions (obviously this implies producing a result dataset only, not a "print-friendly" image), PROC MEANS is the only tool that won't fail.
And, circling back to PROC REPORT, if the requirements fall into the large number of things that PROC REPORT does easily, and that are very difficult to do with the other tools, I enthusiastically endorse it (although with a few warnings about learning curve).
In terms of performance, I find that PROC FREQ fails first, then PROC TABULATE, and I've never seen PROC MEANS fail due to volume.
Thanks again for the great post!
Tom
Tom,
Thank you for the comment. I appreciate all the feedback!
Earlier in my career I used PROC MEANS more than any other procedure for the exact reasons you mentioned. It was my heavy lifter that gave me all of the statistics I needed and could add categories using the COMPLETETYPES option. As a matter of fact, PROC MEANS is used heavily in Chapter 5 of my book where I describe some of the report types that require multiple steps.
PROC REPORT does have a learning curve but I hope that even novice users can see how powerful it is and try it out. For some types of reports PROC REPORT can't be beat.
Thanks,
Jane Eslinger