Top 10 SAS coding efficiencies

Tried, tested and true -- I’m sure you already practice some, if not all, of these efficient techniques to save resources. I recently shared these 10 techniques with the Wisconsin Illinois SAS Users Group (WIILSU) in Milwaukee. The conference ran smoothly under the incredibly able guidance of Dr. LeRoy Bessler. I’ll blog about my conference experience in a separate post.

In this post, I’ve tried to summarize the 10 efficiencies for a quick read. For more in-depth reading, here’s a link to the presentation with code examples and valuable resource statistics to compare and contrast.

CPU Saving Techniques - Make room in your brain for the important stuff

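1. CPU Saving – Boiling down or reducing your data
Where you place the subsetting IF can substantially reduce or increase CPU time. Placing it as early in the DATA step as possible means the statements after it run only for the observations you keep.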
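
Here is a minimal sketch of the idea, using the SASHELP.CARS sample data set. The derived variables price_gap and mpg_avg are made up for illustration and are not taken from the presentation. Both steps produce the same rows, but the second one tests each observation before doing any other work:

```sas
/* Less efficient: the assignments run for every observation, */
/* including the ones the subsetting IF then throws away.      */
data work.suv_late;
   set sashelp.cars;
   price_gap = msrp - invoice;
   mpg_avg   = mean(mpg_city, mpg_highway);
   if type = 'SUV';
run;

/* More efficient: the subsetting IF comes first, so the       */
/* assignments run only for the observations that are kept.    */
data work.suv_early;
   set sashelp.cars;
   if type = 'SUV';
   price_gap = msrp - invoice;
   mpg_avg   = mean(mpg_city, mpg_highway);
run;
```

Running both with OPTIONS FULLSTIMER lets you compare the CPU time reported in the log. On a table this small the difference will be negligible; it only starts to matter on large data.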

2. CPU Saving – Conditional processing
Just as there are many ways to get results with SAS, there are several techniques for conditional processing. You can use a series of IF statements, IF-THEN/ELSE, or SELECT-WHEN. Find out the difference in resource statistics for each technique.
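
To make the three options concrete, here is a rough sketch against SASHELP.CARS. The new variable region and its values are invented for illustration. All three steps assign the same values; what differs is how many conditions SAS has to evaluate per observation:

```sas
/* Independent IF statements: all three conditions are    */
/* evaluated for every observation.                       */
data work.region_if;
   set sashelp.cars;
   length region $ 13;
   if origin = 'Asia'   then region = 'Asia';
   if origin = 'Europe' then region = 'Europe';
   if origin = 'USA'    then region = 'North America';
run;

/* IF-THEN/ELSE: evaluation stops at the first true       */
/* condition, so later conditions are skipped.            */
data work.region_else;
   set sashelp.cars;
   length region $ 13;
   if origin = 'Asia' then region = 'Asia';
   else if origin = 'Europe' then region = 'Europe';
   else if origin = 'USA' then region = 'North America';
run;

/* SELECT-WHEN: also stops at the first matching WHEN.    */
data work.region_select;
   set sashelp.cars;
   length region $ 13;
   select (origin);
      when ('Asia')   region = 'Asia';
      when ('Europe') region = 'Europe';
      when ('USA')    region = 'North America';
      otherwise       region = 'Unknown';
   end;
run;
```

With IF-THEN/ELSE and SELECT-WHEN, a common rule of thumb is to list the most frequent condition first so that most observations exit the chain early.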

3. CPU Saving – Do not reduce the length of numeric variables
Talk of numbers has everyone breaking out into war. I’m sure you know that reducing numeric length can reduce precision. But did you know that it also costs CPU time? Extra CPU time is needed to expand the shortened number back to its full 8 bytes in the PDV (program data vector).
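
The sketch below shows what shortening a numeric looks like, so you can see the precision risk for yourself. The data set and variable names are hypothetical. The value 8193 is chosen on the assumption of a Windows or UNIX host, where SAS documentation lists 8,192 as the largest integer a 3-byte numeric can hold exactly:

```sas
/* n_short is declared with a stored length of 3 bytes.        */
/* Inside this step it still occupies 8 bytes in the PDV; the  */
/* truncation happens when the observation is written out.     */
data work.short_num;
   length n_short 3;
   n_full  = 8193;
   n_short = n_full;
run;

/* Reading the data set back expands n_short to 8 bytes again. */
/* Compare the two values in the log.                          */
data _null_;
   set work.short_num;
   put n_full= n_short=;
run;
```

PROC CONTENTS on work.short_num will show the stored length of each variable if you want to confirm what was written to disk.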

4. I/O Saving – Reduce multiple passes through data
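The summary does not spell this tip out, so treat the following as one common reading of it rather than the example from the presentation: when several subsets come from the same input, create them all in a single DATA step instead of reading the input once per subset. The output data set names are illustrative:

```sas
/* Three passes: SASHELP.CARS is read three times. */
data work.asia;
   set sashelp.cars;
   if origin = 'Asia';
run;
data work.europe;
   set sashelp.cars;
   if origin = 'Europe';
run;
data work.usa;
   set sashelp.cars;
   if origin = 'USA';
run;

/* One pass: the input is read once and each observation */
/* is routed to the matching output data set.            */
data work.asia work.europe work.usa;
   set sashelp.cars;
   select (origin);
      when ('Asia')   output work.asia;
      when ('Europe') output work.europe;
      when ('USA')    output work.usa;
      otherwise;      /* ignore any other value */
   end;
run;
```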

5. I/O Saving – Modify variable attributes
If all you need to do is change variable attributes, consider using PROC DATASETS instead of the DATA step. For attributes other than the variable index, PROC DATASETS saves you valuable I/O. It processes only the header portion of the SAS data set, while the DATA step processes the entire data set.
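
As a small illustration, the two approaches below make the same attribute changes to copies of SASHELP.CLASS. The copy names class_a and class_b, the new variable name and the label text are all made up for this sketch:

```sas
/* Work on copies so the SASHELP sample data is left untouched. */
data work.class_a work.class_b;
   set sashelp.class;
run;

/* DATA step: every observation is read and rewritten just to   */
/* change the attributes.                                       */
data work.class_a;
   set work.class_a;
   rename name   = student_name;
   label  height = 'Height (inches)';
   format weight 6.1;
run;

/* PROC DATASETS: the same changes are applied to the header    */
/* information in place, without rewriting the observations.    */
proc datasets library=work nolist;
   modify class_b;
   rename name   = student_name;
   label  height = 'Height (inches)';
   format weight 6.1;
quit;
```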

6. I/O Saving – Process only necessary observations
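No detail is given for this tip here, so the sketch below assumes it refers to the usual WHERE versus subsetting IF comparison. The output data set names are illustrative:

```sas
/* Subsetting IF: every observation is read into the PDV  */
/* and only then tested.                                  */
data work.teens_if;
   set sashelp.class;
   if age >= 13;
run;

/* WHERE: the condition is applied before an observation  */
/* is brought into the PDV, so rejected rows are never    */
/* processed by the step.                                 */
data work.teens_where;
   set sashelp.class;
   where age >= 13;
run;

/* WHERE also works inside procedures, which can remove   */
/* the need for an intermediate subset data set.          */
proc means data=sashelp.class mean;
   where sex = 'F';
   var height;
run;
```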

7. I/O Saving – Process only necessary variables
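Again as an assumption about what the tip covers, the KEEP= (or DROP=) data set option on the input is the usual way to do this. The variable price_gap and the output data set name are hypothetical:

```sas
/* KEEP= on the SET statement: only the four listed variables */
/* are read from SASHELP.CARS into the PDV.                   */
data work.cars_slim;
   set sashelp.cars(keep=make model msrp invoice);
   price_gap = msrp - invoice;
run;
```

Putting KEEP= on the input data set limits what is read, whereas a KEEP statement or a KEEP= option on the output data set only limits what is written.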

8. Space Saving – Store as character data

9. Memory Saving – Use the BY statement instead of the CLASS statement
BY-group processing holds only one BY group in memory at a time. The CLASS statement accumulates aggregates for all CLASS groups simultaneously in memory.
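
A quick sketch of the two forms with PROC MEANS on SASHELP.CARS; the sorted copy work.cars_sorted is an illustrative name. Note that BY-group processing expects the input to be sorted or indexed by the BY variable, so the sort is part of the cost:

```sas
/* CLASS: no sort needed, but statistics for every value  */
/* of ORIGIN are accumulated in memory at the same time.  */
proc means data=sashelp.cars mean max;
   class origin;
   var msrp;
run;

/* BY: the data must be in ORIGIN order, and each group   */
/* is processed and released before the next one starts.  */
proc sort data=sashelp.cars out=work.cars_sorted;
   by origin;
run;

proc means data=work.cars_sorted mean max;
   by origin;
   var msrp;
run;
```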

10. Programmer Time Saving

i. Making the log your active window
ii. Shortcut keys
iii. Using macros to highlight your recent log

To wrap up, I’m sure you already employ several of these practices. I do hope you find this post useful and a helpful reminder for the next time you’re considering resource efficiency in your SAS program. I look forward to hearing from you. What efficiencies do you practice?


16 Comments

  1. Cat Truxillo
    Posted July 8, 2013 at 10:47 am

    Great post, Charu! I enjoyed seeing you at WIILSU and I agree that LeRoy runs a first-class conference.

    One of my favorite efficiencies is to use by-subject processing in mixed models analyses with random intercept terms. It is a poor man's sparse matrix technique. It can save lots of time! Maybe later this week I'll post a blog on how it's done.

    I always enjoy reading your nuggets of knowledge!

    • Posted July 11, 2013 at 10:08 am

      Thanks Cat! It was great catching up in person. Looking forward to your post about your favourite efficiency. I enjoyed what you shared in your WIILSU presentation about how to embed DATA step-type constructions in a STAT proc.

  2. Doc Muhlbaier
    Posted July 8, 2013 at 11:39 am

    I question tip 3 in some circumstances. It seems that if the disk or the transmission system is slow and the CPU is fast, then the overall throughput with the compressed variables could be better than with expanded ones. I wonder whether even the use of OS-level disk compression might make a bigger difference, so that smaller turns out to be quicker. Unfortunately, I haven't seen different scenarios systematically tested.

    • Posted July 11, 2013 at 10:43 am

      Thanks, Doc, for taking the time to comment and share your experience. Yes, efficiency questions can usually be answered with an 'It depends'. Benchmarking under various scenarios can produce different results for individual installations unless best practices are employed. That's an entire concept in the SAS Programming 3 class.

  3. Ravi
    Posted July 8, 2013 at 11:30 pm

    This is a nice presentation for a science background guy like me. It is really helpful to know how efficiently and fast we can run a SAS program. Thank you, Charu :)

    • Posted July 11, 2013 at 10:45 am

      Thanks Ravi, glad your scientific mind found this explanation helpful :) You'll find SAS is pretty rational as you step deeper and deeper. Thanks for taking the time to comment.

  4. Richard Erickson
    Posted July 17, 2013 at 3:59 pm

    These are EXCELLENT!

    It's not often that I see any sort of posting on the subject of how to make more efficient use of computing resources (or programmer's time!), something that becomes especially important when working with large datasets.

    Many thanks, Charu, for passing these along and also for making your slides available.

  5. Vijay
    Posted July 30, 2013 at 1:36 am

    Thanks a lot for sharing your valuable views to enhance our capabilities.

    • Posted July 30, 2013 at 9:39 am

      Thanks for taking the time to comment, Vijay.

  6. Marcel
    Posted July 30, 2013 at 6:57 am

    I think using the BY statement for memory savings is simply a trade-off against CPU, because the data set needs to be sorted first. Otherwise very good tips, especially #3. I was not aware that reducing the size of numeric variables compromised CPU.

    • Posted July 30, 2013 at 9:40 am

      You're right Marcel. There's always a fine give and take balance that happens when you consider efficiencies. Glad you found tip #3 helpful.

  7. Posted July 30, 2013 at 8:44 am

    Glad you found these helpful, Richard. Yes, it's often a challenge to present these efficiencies, as they depend on individual environments to a large degree. Yet there are some valuable general practices that I wanted to share in this post. Do you have any that you use time and again?

  8. Ram
    Posted December 1, 2013 at 11:26 pm

    Hi Charu,

    I am a beginner who has just started working in SAS, and I would like to have some tips and some materials for SAS from you. Can you please help me in this regard?

    Thanks in advance.

    Regards,
    Ram

  9. Anders Sköllermo
    Posted February 11, 2014 at 3:17 pm

    Hi!
    If you know that your program is entirely correct
    AND
    you know that the program is well documented
    AND
    the program shall be used several times more
    AND
    the program takes far too much resources to run
    THEN
    you should start to apply the tips above - in a careful way, considering all the side effects.

    / Br Anders (SAS user since 1981. Written papers with SAS Institute about NLS and MERGE. Worked with VERY big amounts of data).

  10. Anders Sköllermo
    Posted February 12, 2014 at 5:47 pm

    A comment to "3. CPU Saving – Do not reduce the length of numeric variables...i'm sure you know that reducing numeric length can reduce precision. But did you know that this also compromises CPU. More CPU time is needed to expand the reduced number to its full 8 bytes in the PDV."

    Well, my opinion:
    If you shorten variables (e.g. age from 8 bytes to 3 bytes) you may accidentally destroy part of the information in your project. Before you do this shortening, you must be sure that you do not at all change the information, that is stored in these variables.
    The price (for you, the project, etc) of destroying the information can be Very high, since it can be very time consuming to recreate the information. On the other hand - the price when using "extra" storage or computer resources is rather small. After all, extra CPU and disc can easily be bought.

    I do NOT think that it takes any measurable CPU time to shorten the numeric variables so they can be written out to disc. The real numbers are stored as exponent and mantissa. What happens is that the last bits in the mantissa are just ignored. This is an internal shift operation in the PDV.
    It should be Very fast compared, e.g., with reading a real number from character form into the exponent + mantissa form. I have never heard this before.
    / Br Anders

    • Posted June 2, 2014 at 8:56 pm

      Hi Anders,
      Thanks for your comment. I also replied to your LinkedIn question.

      Yes, you're right. The truncation time may not be much initially, but when the compressed numeric data hits the PDV, the data has to be expanded to its full form, because numerics are always 8 bytes in the PDV. Therefore CPU time is compromised.
      And of course you are right: numeric precision being lost is the biggest problem of all.
