Tried, tested, and true: I’m sure you already practice some, if not all, of these efficient techniques for saving resources. I recently shared these 10 techniques with the Wisconsin Illinois SAS Users Group (WIILSU) in Milwaukee. The conference ran smoothly under the incredibly able guidance of Dr. LeRoy Bessler. I’ll blog about my conference experience in a separate post.
In this post, I’ve tried to summarize the 10 efficiencies for a quick read. For more in-depth reading, check out the Top 10 best SAS programming practices PDF presentation.
CPU Saving Techniques - Make room in your brain for the important stuff
1. CPU Saving – Boiling down or reducing your data
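Where you place the subsetting IF can substantially reduce or increase CPU time. The earlier in the DATA step you discard the observations you don’t need, the fewer statements SAS executes on them.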
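As a minimal sketch, the two DATA steps below produce the same output from the SASHELP.CLASS sample table, but the second applies the subsetting IF before the calculations, so the assignments run only for the observations that are kept. The output data set names and the derived variables (bmi, height_cm) are hypothetical, and on a 19-row table the timings will not differ measurably; the FULLSTIMER option is there so you can compare the log statistics on your own, larger data.

```sas
options fullstimer;   /* write detailed resource statistics to the log */

/* Less efficient: the calculations run for every observation,
   including those the subsetting IF discards at the end of the step. */
data work.females_late;
   set sashelp.class;
   bmi       = 703 * weight / (height**2);   /* hypothetical derived variable */
   height_cm = height * 2.54;                /* hypothetical derived variable */
   if sex = 'F';                             /* subsetting IF placed last */
run;

/* More efficient: discard unwanted observations first, so the
   calculations run only for the observations that are kept. */
data work.females_early;
   set sashelp.class;
   if sex = 'F';                             /* subsetting IF placed first */
   bmi       = 703 * weight / (height**2);
   height_cm = height * 2.54;
run;
```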
2. CPU Saving – Conditional processing
Much like there are many ways to get results with SAS, there are several techniques for conditional processing: a series of IF-THEN statements, IF-THEN/ELSE, or SELECT-WHEN. Find out how the resource statistics differ for each technique.
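The sketch below assigns the same hypothetical age_group variable in SASHELP.CLASS three ways. With separate IF-THEN statements, every condition is evaluated for every observation; with IF-THEN/ELSE and SELECT-WHEN, evaluation stops at the first true condition, so as a general guideline, putting the most frequent condition first saves further comparisons. The age bands and output names are arbitrary and chosen only for illustration.

```sas
/* Separate IF-THEN statements: all three conditions are tested for
   every observation, even after one of them has matched. */
data work.age_if;
   set sashelp.class;
   length age_group $ 5;
   if age <= 12 then age_group = '11-12';
   if 13 <= age <= 14 then age_group = '13-14';
   if age >= 15 then age_group = '15+';
run;

/* IF-THEN/ELSE: testing stops at the first true condition. */
data work.age_else;
   set sashelp.class;
   length age_group $ 5;
   if age <= 12 then age_group = '11-12';
   else if age <= 14 then age_group = '13-14';
   else age_group = '15+';
run;

/* SELECT-WHEN: also stops at the first match, and reads well when
   there are many mutually exclusive conditions. */
data work.age_select;
   set sashelp.class;
   length age_group $ 5;
   select;
      when (age <= 12) age_group = '11-12';
      when (age <= 14) age_group = '13-14';
      otherwise        age_group = '15+';
   end;
run;
```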
3. CPU Saving – Do not reduce the length of numeric variables
Talk of numbers can start a war. I’m sure you know that reducing numeric length can reduce precision. But did you know that it also costs CPU time? Numeric variables are always 8 bytes in the program data vector (PDV), so extra CPU time is needed to expand each shortened number back to its full 8 bytes whenever it is read.
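A minimal sketch of the precision side of this trade-off: the hypothetical variable x_short is given a LENGTH of 3 bytes, so when the observation is written SAS keeps only part of the mantissa of 1/3. Reading it back, SAS expands it to 8 bytes in the PDV again, but the lost bits do not return, so x_short no longer equals x_full. The exact digits printed depend on the host’s floating-point representation, which is why no output is shown here.

```sas
/* x_short is stored in 3 bytes; x_full keeps the default 8 bytes. */
data work.short_num;
   length x_short 3;      /* reduced numeric length: avoid this */
   x_full  = 1/3;
   x_short = 1/3;         /* truncated when the observation is written */
run;

/* Reading the data back: both values are 8 bytes in the PDV again,
   but x_short has permanently lost precision. */
data _null_;
   set work.short_num;
   diff = x_full - x_short;
   put x_full= best32. x_short= best32. diff= best32.;
run;
```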
4. I/O Saving – Reduce multiple passes through data
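A minimal sketch, using SASHELP.CLASS and hypothetical output names: instead of one DATA step per subset, each of which reads the input table again, a single DATA step can read the table once and write to several output data sets with explicit OUTPUT statements.

```sas
/* Two passes: each DATA step reads SASHELP.CLASS again. */
data work.f_multi;
   set sashelp.class;
   where sex = 'F';
run;

data work.m_multi;
   set sashelp.class;
   where sex = 'M';
run;

/* One pass: read the input once and route each observation to the
   appropriate output data set. */
data work.f_single work.m_single;
   set sashelp.class;
   if sex = 'F' then output work.f_single;
   else if sex = 'M' then output work.m_single;
run;
```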
5. I/O Saving – Modify variable attributes
If all you need to do is change variable attributes, consider using PROC DATASETS instead of the DATA step. For any attribute change other than creating an index, PROC DATASETS saves you valuable I/O: it processes only the header (descriptor) portion of the SAS data set, while the DATA step processes the entire data set.
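A minimal sketch: a hypothetical WORK.CLASS copy of SASHELP.CLASS gets new labels, a format, and a renamed variable through the MODIFY statement of PROC DATASETS, which updates only the descriptor portion. The commented-out DATA step would produce the same attributes, but it reads and rewrites every observation.

```sas
/* Make a working copy so the example does not touch SASHELP. */
data work.class;
   set sashelp.class;
run;

/* Attribute changes only: the data values are not read or rewritten. */
proc datasets library=work nolist;
   modify class;
      label  height = 'Height (inches)'
             weight = 'Weight (pounds)';
      format weight 6.1;
      rename height = height_in;
quit;

/* Equivalent DATA step, which processes the entire data set:
data work.class;
   set work.class(rename=(height=height_in));
   label height_in = 'Height (inches)' weight = 'Weight (pounds)';
   format weight 6.1;
run;
*/
```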
6. I/O Saving – Process only necessary observations
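A minimal sketch using SASHELP.CLASS: a WHERE= data set option (or a WHERE statement) selects observations before they are loaded into the PDV, unlike a subsetting IF, which tests them after they have been read. WHERE also works directly in procedures, so no intermediate subset is needed, and OBS= limits how many observations are read, which is handy while testing. The output name WORK.TEENS is hypothetical.

```sas
/* WHERE= filters rows before they reach the PDV. */
data work.teens;
   set sashelp.class(where=(age >= 13));
run;

/* WHERE= inside a procedure: no intermediate data set is created. */
proc means data=sashelp.class(where=(age >= 13)) mean;
   var height;
run;

/* OBS= reads only the first 5 observations, useful when testing
   a program against a large table. */
proc print data=sashelp.class(obs=5);
run;
```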
7. I/O Saving – Process only necessary variables
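A minimal sketch: the KEEP= data set option on the input limits which variables are read into the PDV, and the DROP statement keeps helper variables out of the output. The output name WORK.BMI and the bmi variable are hypothetical.

```sas
data work.bmi;
   /* Read only the three variables the calculation needs. */
   set sashelp.class(keep=name height weight);
   bmi = 703 * weight / (height**2);
   /* Needed in the PDV for the calculation, but not written out. */
   drop height weight;
run;
```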
8. Space Saving – Store as character data
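A minimal sketch, assuming a hypothetical table of region codes 1 to 9 that are only ever used as labels, never in arithmetic: stored as numerics they take the default 8 bytes per value, while a character variable with LENGTH $1 takes 1 byte. PROC CONTENTS on both data sets lets you compare the sizes on your own system.

```sas
/* Numeric codes: 8 bytes per region value. */
data work.codes_num;
   do id = 1 to 1000;
      region = mod(id, 9) + 1;
      output;
   end;
run;

/* Character codes: 1 byte per region value. */
data work.codes_char;
   length region $ 1;
   do id = 1 to 1000;
      region = put(mod(id, 9) + 1, 1.);
      output;
   end;
run;
```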
9. Memory Saving – Use the BY statement instead of the CLASS statement
BY-group processing holds only one BY group in memory at a time. The CLASS statement accumulates aggregates for all CLASS groups simultaneously in memory.
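A minimal sketch using SASHELP.CLASS: both PROC MEANS steps compute the mean height and weight by sex. The BY version needs the PROC SORT step first (a CPU cost to weigh against the memory savings), but holds only one BY group in memory at a time. NWAY restricts the CLASS output to the sex-level rows so the two output data sets are comparable; the output names are hypothetical.

```sas
/* CLASS: no sort needed, but statistics for every class level are
   accumulated in memory at the same time. */
proc means data=sashelp.class noprint nway;
   class sex;
   var height weight;
   output out=work.stats_class mean=;
run;

/* BY: the data must be sorted by the BY variable first, but only the
   current BY group is held in memory. */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

proc means data=work.class_sorted noprint;
   by sex;
   var height weight;
   output out=work.stats_by mean=;
run;
```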
10. Programmer Time Saving
i. Making the log your active window
ii. Shortcut keys
iii. Using macros to highlight your recent log
To wrap up, I’m sure you already employ several of these practices. I do hope you find this post useful and a helpful reminder for the next time you’re considering resource efficiency in your SAS program. I look forward to hearing from you. What efficiencies do you practice?
photo credit: anna // attribution: creative commons
18 Comments
Pingback: Debugging the difference between WHERE and IF in SAS - The SAS Dummy
A comment to "3. CPU Saving – Do not reduce the length of numeric variables...i'm sure you know that reducing numeric length can reduce precision. But did you know that this also compromises CPU. More CPU time is needed to expand the reduced number to its full 8 bytes in the PDV."
Well, my opinion:
If you shorten variables (e.g. age from 8 bytes to 3 bytes), you may accidentally destroy part of the information in your project. Before you shorten them, you must be sure that you do not change the information stored in these variables at all.
The price (for you, the project, etc.) of destroying the information can be very high, since it can be very time consuming to recreate it. On the other hand, the price of using "extra" storage or computer resources is rather small. After all, extra CPU and disc can easily be bought.
I do NOT think that it takes any measurable CPU time to shorten the numeric variables so they can be written out to disc. Real numbers are stored as exponent and mantissa. What happens is that the last bits of the mantissa are simply ignored. This is an internal shift operation in the PDV.
It should be very fast compared with, e.g., reading a real number from character form into exponent-and-mantissa form. I have never heard this claim before.
/ Br Anders
Hi Anders,
Thanks for your comment. I also replied to your LinkedIn question.
Yes, you're right. The truncation time may not be much initially, but when the compressed numeric data hits the PDV, the data has to be expanded to its full form, because numerics are always 8 bytes in the PDV. Therefore CPU time is compromised.
And of course you are right: losing numeric precision is the biggest problem of all.
A late reply to Charu:
1) I am only discussing truncation - and I strongly suggest NOT to truncate.
2) Charu is discussing compression of the bytes that are stored on disc. That is partly an entirely different question. She is, however, correct in the sense that if you compress, CPU time is used for the expansion.
I am revisiting this thread, since I refer to this discussion when I teach SAS to students in Mathematical Statistics at Stockholm University.
Hi!
If you know that your program is entirely correct
AND
you know that the program is well documented
AND
the program shall be used several times more
AND
the program takes far too much resources to run
THEN
you should start to apply the tips above, carefully, considering all the side effects.
/ Br Anders (SAS user since 1981. Written papers with SAS Institute about NLS and MERGE. Worked with VERY big amounts of data).
Hi Charu,
I am a beginner who has just started working in SAS, and I would like some tips and materials on SAS from you. Can you please help me with this?
Thanks in advance.
Regards,
Ram
Glad you found these helpful, Richard. Yes, it's often a challenge to present these efficiencies, as they depend to a large degree on individual environments. Yet there are some valuable general practices that I wanted to share in this post. Do you have any that you use time and again?
I think using the BY statement for memory savings is simply a trade-off for CPU, because the data set needs to be sorted first. Otherwise very good tips, especially #3. I was not aware that reducing the size of numerical variables compromised CPU.
You're right, Marcel. There's always a fine balance of give and take when you consider efficiencies. Glad you found tip #3 helpful.
Thanks a lot for sharing your valuable views to enhance our capabilities.
Thanks for taking the time to comment, Vijay.
These are EXCELLENT!
It's not often that I see any sort of posting on the subject of how to make more efficient use of computing resources (or programmer's time!), something that becomes especially important when working with large datasets.
Many thanks, Charu, for passing these along and also for making your slides available.
This is a nice presentation for someone with a science background like me. It is really helpful to know how efficiently and quickly we can run a SAS program. Thank you, Charu :)
Thanks, Ravi, glad your scientific mind found this explanation helpful :) You'll find SAS is pretty rational as you step deeper & deeper. Thanks for taking the time to comment.
I question tip 3 in some circumstances. It seems that if the disk or the transmission system is slow and the CPU is fast, then overall throughput with the compressed variables could be higher than with expanded ones. I wonder whether OS-level disk compression might make an even bigger difference, so that smaller is also quicker. Unfortunately, I haven't seen different scenarios systematically tested.
Thanks, Doc, for taking the time to comment and share your experience. Yes, efficiency questions can usually be answered with "It depends." Benchmarking under various scenarios can produce different results for individual installations unless best practices are employed. That's an entire concept in the SAS Programming 3 class.
Great post, Charu! I enjoyed seeing you at WIILSU and I agree that LeRoy runs a first-class conference.
One of my favorite efficiencies is to use by-subject processing in mixed models analyses with random intercept terms. It is a poor man's sparse matrix technique. It can save lots of time! Maybe later this week I'll post a blog on how it's done.
I always enjoy reading your nuggets of knowledge!
Thanks, Cat! It was great catching up in person. Looking forward to your post about your favourite efficiency. I enjoyed what you shared in your WIILSU presentation about embedding DATA step-type constructs in a STAT procedure.