Recoding values is one of the most common data prep tasks that folks need to do before they can analyze and report on data. In SAS, the most elegant way to handle this is by applying a SAS format. A SAS format allows you to "bucket" a bunch of raw data values into more managable categories. For example, a professor can map numerical test scores into letter grades ("A", "B", down to "F"), or even further into simply "pass" or "fail".
You can create a SAS format from scratch, writing a SAS program to include name-value mappings for recoding values, or ranges of values to map to a particular formatted label. But often you already have this information in a data set or spreadsheet, and it would be easier if you could just read in those values and make a SAS format from those recoding rules.
There is plenty of documentation and also conference papers to describe how to use the FORMAT procedure to get this done with SAS programming. The SAS Education team considers it an important topic and they regularly teach it in their beginning SAS courses. The trainers who cover SAS Enterprise Guide have been asking, on your behalf, for an easy way to do this without having to dive into SAS programming.
The SAS educators commissioned this task, and we've built it. It runs as an add-in to SAS Enterprise Guide 4.2.
The remainder of this post describes the details of the task, where you can download it, and how you can use it.
Where to download
The task is a custom task built for use in SAS Enterprise Guide 4.2. (It does not work in SAS Enterprise Guide 4.1 - sorry!) I've posted a version of this on the SAS support site, here. You can download the ZIP file and save it to your machine. The instructions for installing the task for use are in the README.txt file within the ZIP file.
Scenarios supported
For example, imagine you want to categorize US states into their broader regions, like so:
NAME | LABEL |
California | West |
New York | East |
North Carolina | East |
Oregon | West |
Or perhaps recode ranges of values into discrete labels, like this:
LABEL | LOW | HIGH |
Infant | 0 | 2 |
Toddler | 3 | 4 |
Child | 5 | 9 |
PreTeen | 10 | 12 |
Teenager | 13 | 19 |
Screen shots
This is a picture of the task when used to create a "look up" format, where discrete values are mapped to formatted labels. In this example, the data is mapping product IDs to product names.
This is a picture of the task when used to create a range-based based format, where you select lower and upper values in a range to map to a formatted label.
14 Comments
Cool. I built a similar app a few years back for DSUG. It's nuance was that it could take any datasource (OleDb, Excel, etc.) and automatically generate the formats for you. Used drag-and-drop (Infragistics datagrid).
My point was to illustrate that SAS code could be generated outside of macros.
May be a feature to add in at a later point.
Alan
I frequently have layouts of text files or lists of variable values I need formatted in an Excel spreadsheet. I find that simply using the Excel CONCATENATE function that allows me to to write out the SAS code in Excel and then drag that formula down throughout the entire workbook is a very fast way to generate the SAS code I need. I just click on the Excel column in which I generated the SAS code and paste it into SAS.
This works for much more than format statements.
Inelegant perhaps -- but accurate and very fast.
You are my EG super-hero. If I had a nickle for every time I had an EG student ask me for this, I'd be a rich man!
Now I just have to wait until my students upgrade to 4.2
Andy
Credit where credit is due: the task was developed by my colleague in New Zealand:
http://blogs.sas.com/sascom/index.php?/archives/295-Five-questions-with-David-McNamara.html
Chris
I frequently have layouts of text files or lists of variable values I need formatted in an Excel spreadsheet. I find that simply using the Excel CONCATENATE function that allows me to to write out the SAS code in Excel and then drag that formula down throughout the entire workbook is a very fast way to generate the SAS code I need. I just click on the Excel column in which I generated the SAS code and paste it into SAS.
This works for much more than format statements.
What is the point of copying and re-posting verbatim a previous post (Ben Ben-Baruch Posted May 13, 2009 at 3:19 pm)???
Hmm, apparently some comments are worth repeating :)
Pingback: Custom tasks that do the job - The SAS Dummy
Pingback: Do you mind if we dance with your DATEs (or DATETIMEs)? - The SAS Dummy
C:\Users\\AppData\Roaming\SAS\EnterpriseGuide\4.3\Custom
I have copied the file to this location and then under AddIn Manager, I have selected the file and clicked okay.
When I go back to Tools --> AddIn, it is not visible.
Your another addin "Copy Files to/from SAS" is visible.
This task is built into SAS Enterprise Guide 4.3, under the Tasks->Data menu. You don't need to add this as a custom task.
Pingback: New and improved: Importing SPSS data files in SAS Enterprise Guide - The SAS Dummy
Hello Bossman Chris,
I have a question if I have a variable (ActiveAccount) with my custom Y/N format/informat (see below), why does PROC SQL not implicitly convert the value for me? Why must I use INPUT/PUT?
WHERE
t1.ActiveAccount eq "N"
ERROR: Expression using equals (=) has components that are of different data types.
/* works fine: t1.ActiveAccount eq INPUT("N", YesNoIndicator.) */
/* works fine: PUT(t1.ActiveAccount, YesNoIndicator.) eq "N" */
James
PROC DATASETS output
NAME: ActiveAccount
TYPE: 1 (Numeric)
LENGTH: 8
VARNUM: 30
LABEL: ActiveAccount
FORMAT: YESNOINDICATOR
FORMATL: 0
FORMATD: 0
INFORMAT: YESNOINDICATOR
INFORML: 0
INFORMD: 0
PROC FORMAT LIB=WORK;
VALUE YesNoIndicator ( DEFAULT = 1 )
1 = 'Y'
0 = 'N'
. = _SAME_
OTHER = '?';
INVALUE YesNoIndicator ( UPCASE DEFAULT = 1 )
'Y' = 1
'N' = 0
. = _SAME_
OTHER = _ERROR_;
RUN;
Hi James! PROC SQL doesn't filter or group values that have been recoded with SAS formats -- it works on the raw value only. Why? Maybe because it's "SQL first, SAS proc second" in terms of expected behavior.
Other SAS procs will use the formatted value for grouping/classification (BY and CLASS statements), but I think you'll find that filtering (WHERE) still uses the raw value. Consider SAS dates, which are just numbers. You can't compare with the formatted value (unless you use the literal syntax like 'mmddyyyy'd -- which is converted to a number before comparison). User formats work the same.