Putting the squeeze on your SAS data sets

10

Put your data on a dietI've known several people who were raised during the Great Depression, and I've observed that they are very mindful of waste. My wife's grandmother used to save plastic bags, twist ties, and relatively clean aluminum foil for potential reuse in the household -- because such materials were once scarce. The youth of today, to their credit, are also mindful of waste, but the concern is for the environment and recycling effort, not necessarily material shortages.

In a similar way, those of us who have been in the computer industry for a while can remember when it was critical to scrimp and save for every byte in memory and sector on disk, because storage was scarce in both mediums. But unlike the youth in the real world, the new recruits in the computer industry do not have the same frugality when it comes to use of system resources. Machines have fast-growing capacities for disk space and memory, and not everyone sees the incentive to optimize their use of these resources.

But SAS programmers do. I know this because this SAS note about "shrinking character variables to minimum length required" is popular and highly rated.

I decided to take the popular sample program and extend it into a custom task for SAS Enterprise Guide. The SAS program is a macro that examines each character variable in the data set, measures the length to the longest value within the data, and then adjusts the data set to "shrink" the length of each character variable to just the size that is needed to fit the data. For data sets with lots of observations and grossly overallocated storage, it can result in a significant reduction in the file size.

I also added an option to compress the data set using the COMPRESS= option, which can reduce the data set file size even further. (Because there is overhead associated with compression, it might not always make the data smaller; in fact, it could make it larger.)

Here is an example report that the task will generate as a result, so you can get an idea of the benefit.
Sample report

The task can work with SAS Enterprise Guide 4.2 or 4.3. To download the task, click here to save a ZIP file with the task and a README file. It's simple to deploy and use, even if you're not an administrator on your PC. See the README instructions for details.

If you find the task to be useful, let me know here in the comments section. And if you can suggest changes/improvements, let me know that too.

Share

About Author

Chris Hemedinger

Director, SAS User Engagement

+Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies

10 Comments

  1. Can you point us to where all the add-ons for EG 4.2/EG 4.3 reside so that we have them all
    Thank you.

  2. Very usefull add-in, thanks for sharing!
    After running the procedure I get twice the following errors in the log, but the output table and log are created and they are apparently fine.
    Could you please explain me what do the following errors mean and if they are relevant for the reliability of the result?
    Thanks
    Beppe

    ERROR: File CON_IN.ADDITIVES_L2.DATA does not exist.
    ERROR: File CON_IN.ADDITIVES_L2.DATA does not exist.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: File SASUSER.D_CONCISECATEGORY.DATA does not exist.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: File META.LK_CONCETRATIONUNIT.DATA does not exist.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON_STG is not assigned.
    3 The SAS System

    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: File SASUSER.LK_LANGUAGE.DATA does not exist.
    ERROR: File SASUSER.LK_LANGUAGE.DATA does not exist.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: The following columns were not found in the contributing tables: FILE.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON_STG is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.
    ERROR: Libname CON09_IN is not assigned.

  3. Pingback: Custom tasks for SAS Enterprise Guide: Q&A - The SAS Dummy

  4. Is there any way i can modify this and apply it to SAS EG 4.1??
    The place i work at still has this version.
    Thanks for the article.

  5. Pingback: 11 super-useful custom tasks for SAS Enterprise Guide - The SAS Dummy

  6. Is it possible that there is a length limit? The code is producing an error after it determines the length when trying to produce the out data set. I have a variable ~5000 characters long and it's trying to use a format where the number is transformed into a truncated notation with 'E'. See below. It's still useful because I can use the value I read in the log to set my length correctly in the previous steps (this long string is the result of concatenating many strings of varying lengths together so I set it to 10,000 to make sure it holds them all).

    MPRINT(SQUEEZECHARS): format polnum $10.;
    MLOGIC(SQUEEZECHARS): %DO loop index variable I is now 2; loop will iterate again.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Macro variable I resolves to 2
    SYMBOLGEN: Macro variable LLL2 resolves to length total_msg $ 4798 ;
    MPRINT(SQUEEZECHARS): length total_msg $ 4798 ;
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Macro variable I resolves to 2
    SYMBOLGEN: Macro variable FFF2 resolves to format total_msg $5E3.;

    91 format total_msg $5E3.;
    ___
    31
    _
    22
    200
    MPRINT(SQUEEZECHARS): format total_msg $5E3.;
    MLOGIC(SQUEEZECHARS): %DO loop index variable I is now 3; loop will not iterate again.
    SYMBOLGEN: Macro variable DSN resolves to WORK.ISSUING_TXN_ONLY
    MPRINT(SQUEEZECHARS): set WORK.ISSUING_TXN_ONLY;
    MPRINT(SQUEEZECHARS): run;
    ERROR 31-185: Format 5E3 is unknown.

    • Chris Hemedinger
      Chris Hemedinger on

      Yes, possibly the macro has a limit here. For now, I'll leave it as an exercise for the reader... :)

Back to Top