Ever had to work with data that had crazy column names, custom formats you didn’t have access to, or text columns sized way too big for the actual data they contained? Annoying, isn’t it?
Well, you’re not alone and, as it turns out, it’s not that hard to fix. Let’s take a peek at renaming, reformatting, relabeling and resizing columns in your data using PROC SQL and PROC DATASETS. PROC SQL could handle most of these tasks, but you can’t rename a column in SQL without creating a new copy of the table. PROC DATASETS can do it all, but the syntax can be trickier. When we resize a column, we're going to have to re-write the data anyway, so I'll use PROC SQL for resizing, and PROC DATASETS for everything else. For example, we have this existing table:
Year | Make | Model | MSRP | Invoice | Engine Size (L) |
2004 | Acura | MDX | $36,945 | $33,337 | 3.5 |
2004 | Audi | A4 1.8T 4dr | $25,940 | $23,508 | 1.8 |
2004 | Buick | Rainier | $37,895 | $34,357 | 4.2 |
With this descriptor portion:
Column Name | Column Label | Column Type | Column Length | Column Format |
Year | num | 8 | ||
Make | char | 13 | ||
Model | char | 40 | ||
MSRP | num | 8 | DOLLAR8. | |
Invoice | num | 8 | DOLLAR8. | |
EngineSize | Engine Size (L) | num | 8 |
I want to modify the table to use European formats for the numeric values, to rename the EngineSize column as Size leaving its label unchanged, and to add descriptive labels for MSRP and Invoice.
The general syntax for this in PROC DATASETS is:
PROC DATASETS LIBRARY=;
MODIFY sas-dataset <(options)>;
FORMAT variable-1 <format-1> <variable-2 <format-2> ...>;
LABEL variable-1=<'label-1' | ' '> <variable-2=<'label-2' | ' '> ...>;
RENAME variable-1=new-name-1 <variable-2=new-name-2 ...>;
After running this code:
proc datasets library=work nolist nodetails; modify cars; format MSRP eurox8. Invoice eurox8. EngineSize commax5.1; label MSRP="Sticker Price" Invoice="Wholesale Price" ; rename EngineSize=Size; run; quit; |
The data now looks like this:
Year | Make | Model | Sticker Price | Wholesale Price | Engine Size (L) |
2004 | Acura | MDX | €36.945 | €33.337 | 3,5 |
2004 | Audi | A4 1.8T 4dr | €25.940 | €23.508 | 1,8 |
2004 | Buick | Rainier | €37.895 | €34.357 | 4,2 |
With this descriptor portion:
Column Name | Column Label | Column Type | Column Length | Column Format |
Year | num | 8 | ||
Make | char | 13 | ||
Model | char | 40 | ||
MSRP | Sticker Price | num | 8 | EUROX8. |
Invoice | Wholesale Price | num | 8 | EUROX8. |
Size | Engine Size (L) | num | 8 | COMMAX5.1 |
In the report, the longest value in the Make column is 5 characters long but the table metadata shows a column width of 13 characters. I can shorten the Make column to 5 characters without truncating the values, making my table size smaller on disk and in memory.
PROC SQL with an ALTER TABLE statement and a MODIFY clause will change a table’s descriptor portion (metadata) for one or more columns. Changing the length of a column will automatically rewrite the data portion of the table, too. The syntax looks like this:
ALTER TABLE table-name MODIFY column-definition-1 <, column-definition-2, ..., column-definition-n>;
After running this code:
proc sql; alter table work.cars modify Make char(5) ; quit; |
The data is still correct:
Year | Make | Model | Sticker Price | Wholesale Price | Engine Size (L) |
2004 | Acura | MDX | €36.945 | €33.337 | 3,5 |
2004 | Audi | A4 1.8T 4dr | €25.940 | €23.508 | 1,8 |
2004 | Buick | Rainier | €37.895 | €34.357 | 4,2 |
And the column size is now smaller:
Column Name | Column Label | Column Type | Column Length | Column Format |
Year | num | 8 | ||
Make | char | 5 | ||
Model | char | 40 | ||
MSRP | Sticker Price | num | 8 | EUROX8. |
Invoice | Wholesale Price | num | 8 | EUROX8. |
Size | Engine Size (L) | num | 8 | COMMAX5.1 |
Now, unless you want to write this type of code by hand every time you need it, you’ll want macros to do the work for you. And macros we’ve got! Check out my SAS tutorial on YouTube titled “Resize, Rename and Reformat Data with SAS Macro” for more details on how this works. The video is also embedded below. Links below the embedded video lead to the data and programs used in the video and to my personal macro trove on GitHub, including a macro that quickly strips labels, formats and informats from a data set.
Until next time, may the SAS be with you!
Mark
DOWNLOAD NOW | VIDEO DATA PACKAGE ON GITHUB GET MACROS NOW | MY MACRO STASH ON GITHUB
6 Comments
how to resize table size in gb
The techniques shown here for reducing the size of character variables will usually provide a nice reduction in file size on disk. If your primary concern is file size, these papers provide some excellent resources:
Programming Tricks For Reducing Storage And Work Space by Curtis A. Smith (https://support.sas.com/resources/papers/proceedings/proceedings/sugi27/p023-27.pdf) |
How to Reduce the Disk Space Required by a SAS® Data Set by Selvaratnam Sridharma (https://www.lexjansen.com/nesug/nesug06/io/io18.pdf)
I love seeing Jedi tricks and tips pop up - makes me feel like I'm still just down the hall from my buddy Mark. Thanks for sharing!
Ah, Tim - good to "see" you, too! Those were some pretty awesome days...
Mark
Very informative SAS trick Mark @SASJedi !
Thank you, sir!