Creating a SAS format from a data set

14

Recoding values is one of the most common data prep tasks that folks need to do before they can analyze and report on data. In SAS, the most elegant way to handle this is by applying a SAS format. A SAS format allows you to "bucket" a bunch of raw data values into more managable categories. For example, a professor can map numerical test scores into letter grades ("A", "B", down to "F"), or even further into simply "pass" or "fail".

You can create a SAS format from scratch, writing a SAS program to include name-value mappings for recoding values, or ranges of values to map to a particular formatted label. But often you already have this information in a data set or spreadsheet, and it would be easier if you could just read in those values and make a SAS format from those recoding rules.

There is plenty of documentation and also conference papers to describe how to use the FORMAT procedure to get this done with SAS programming. The SAS Education team considers it an important topic and they regularly teach it in their beginning SAS courses. The trainers who cover SAS Enterprise Guide have been asking, on your behalf, for an easy way to do this without having to dive into SAS programming.

The SAS educators commissioned this task, and we've built it. It runs as an add-in to SAS Enterprise Guide 4.2.

The remainder of this post describes the details of the task, where you can download it, and how you can use it.

Where to download

The task is a custom task built for use in SAS Enterprise Guide 4.2. (It does not work in SAS Enterprise Guide 4.1 - sorry!) I've posted a version of this on the SAS support site, here. You can download the ZIP file and save it to your machine. The instructions for installing the task for use are in the README.txt file within the ZIP file.

Scenarios supported

For example, imagine you want to categorize US states into their broader regions, like so:

NAME LABEL
California West
New York East
North Carolina East
Oregon West

Or perhaps recode ranges of values into discrete labels, like this:

LABEL LOW HIGH
Infant 0 2
Toddler 3 4
Child 5 9
PreTeen 10 12
Teenager 13 19

Screen shots

This is a picture of the task when used to create a "look up" format, where discrete values are mapped to formatted labels. In this example, the data is mapping product IDs to product names.


This is a picture of the task when used to create a range-based based format, where you select lower and upper values in a range to map to a formatted label.

Share

About Author

Chris Hemedinger

Director, SAS User Engagement

+Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies

14 Comments

  1. Alan Churchill on

    Cool. I built a similar app a few years back for DSUG. It's nuance was that it could take any datasource (OleDb, Excel, etc.) and automatically generate the formats for you. Used drag-and-drop (Infragistics datagrid).

    My point was to illustrate that SAS code could be generated outside of macros.

    May be a feature to add in at a later point.

    Alan

  2. Ben Ben-Baruch on

    I frequently have layouts of text files or lists of variable values I need formatted in an Excel spreadsheet. I find that simply using the Excel CONCATENATE function that allows me to to write out the SAS code in Excel and then drag that formula down throughout the entire workbook is a very fast way to generate the SAS code I need. I just click on the Excel column in which I generated the SAS code and paste it into SAS.

    This works for much more than format statements.

    Inelegant perhaps -- but accurate and very fast.

  3. Andy.Ravenna on

    You are my EG super-hero. If I had a nickle for every time I had an EG student ask me for this, I'd be a rich man!

    Now I just have to wait until my students upgrade to 4.2

    Andy

  4. I frequently have layouts of text files or lists of variable values I need formatted in an Excel spreadsheet. I find that simply using the Excel CONCATENATE function that allows me to to write out the SAS code in Excel and then drag that formula down throughout the entire workbook is a very fast way to generate the SAS code I need. I just click on the Excel column in which I generated the SAS code and paste it into SAS.

    This works for much more than format statements.

    • What is the point of copying and re-posting verbatim a previous post (Ben Ben-Baruch Posted May 13, 2009 at 3:19 pm)???

  5. Pingback: Custom tasks that do the job - The SAS Dummy

  6. Pingback: Do you mind if we dance with your DATEs (or DATETIMEs)? - The SAS Dummy

  7. C:\Users\\AppData\Roaming\SAS\EnterpriseGuide\4.3\Custom

    I have copied the file to this location and then under AddIn Manager, I have selected the file and clicked okay.

    When I go back to Tools --> AddIn, it is not visible.

    Your another addin "Copy Files to/from SAS" is visible.

    • Chris Hemedinger
      Chris Hemedinger on

      This task is built into SAS Enterprise Guide 4.3, under the Tasks->Data menu. You don't need to add this as a custom task.

  8. Pingback: New and improved: Importing SPSS data files in SAS Enterprise Guide - The SAS Dummy

  9. James C Tetterton on

    Hello Bossman Chris,

    I have a question if I have a variable (ActiveAccount) with my custom Y/N format/informat (see below), why does PROC SQL not implicitly convert the value for me? Why must I use INPUT/PUT?

    WHERE
    t1.ActiveAccount eq "N"
    ERROR: Expression using equals (=) has components that are of different data types.
    /* works fine: t1.ActiveAccount eq INPUT("N", YesNoIndicator.) */
    /* works fine: PUT(t1.ActiveAccount, YesNoIndicator.) eq "N" */

    James

    PROC DATASETS output
    NAME: ActiveAccount
    TYPE: 1 (Numeric)
    LENGTH: 8
    VARNUM: 30
    LABEL: ActiveAccount
    FORMAT: YESNOINDICATOR
    FORMATL: 0
    FORMATD: 0
    INFORMAT: YESNOINDICATOR
    INFORML: 0
    INFORMD: 0

    PROC FORMAT LIB=WORK;
    VALUE YesNoIndicator ( DEFAULT = 1 )
    1 = 'Y'
    0 = 'N'
    . = _SAME_
    OTHER = '?';
    INVALUE YesNoIndicator ( UPCASE DEFAULT = 1 )
    'Y' = 1
    'N' = 0
    . = _SAME_
    OTHER = _ERROR_;
    RUN;

    • Chris Hemedinger
      Chris Hemedinger on

      Hi James! PROC SQL doesn't filter or group values that have been recoded with SAS formats -- it works on the raw value only. Why? Maybe because it's "SQL first, SAS proc second" in terms of expected behavior.

      Other SAS procs will use the formatted value for grouping/classification (BY and CLASS statements), but I think you'll find that filtering (WHERE) still uses the raw value. Consider SAS dates, which are just numbers. You can't compare with the formatted value (unless you use the literal syntax like 'mmddyyyy'd -- which is converted to a number before comparison). User formats work the same.

Back to Top