Turn your data set into a DATA step program

Sometimes I need to "disassemble" a SAS data set into a DATA step program. It's kind of like creating a "freeze-dried" version of the data that you can carry around and use anywhere, re-hydrating it in the SAS session where you next need it.

Some example uses for this:

  • Build standalone program examples that include the data "inline", so that you don't have to supply a SAS7BDAT file and a SAS program file together to make it work.
  • Create a portable version of the data that you can transport from one SAS environment and use easily on another environment, without having to worry about platform encoding differences. (Of course, you can also use PROC MIGRATE for this job.)
  • Capture the structure of the data, with the columns, formats, labels, and lengths -- even if you don't need all of the records.

There are various programming approaches that you can use to accomplish this in SAS, but to make it easier I built a custom task called Data set->DATA step. (The task runs in SAS Enterprise Guide 4.3 and later.) This task reads the contents of the active data set, generates a DATA step program with the appropriate ATTRIB statements for each column, and then includes all of the data records as DATALINES.

This screen shows the task user interface, which gives you a preview of the DATA step setup. If you want just the data structure portion of the program, you can click Copy to clipboard and thus capture just that piece.


When you click Run, the task builds a program that contains the DATA step along with all of the DATALINES records. It doesn't actually run the program, since that would only recreate a copy of the data set that you already have, but it does produce a short summary log for output. And the contents of the Code tab in the results will contain the complete DATA step program.


(If you're wondering how I'm "splitting" the code view in this picture, check out this blog post about an editor trick.)

You can download the task from the list of custom task examples that accompany my recent book. The "Data set->DATA step" example is featured in "Chapter 12: Abracadabra: Turn Your Data into a SAS Program". You are not required to own the book in order to use the examples (but I promise that you can learn a lot from the book).

If you're interested in how this task does its work, here are the techniques used in the source code and described within the chapter:

  • Use ADO.NET to connect to and read SAS data. (ADO.NET is the method for working with data sources in .NET programs.)
  • Use SAS DICTIONARY tables to discover data set and column attributes for the source data. Then, use that information to influence task behavior.
  • Use the SASTextEditorCtl control from the SAS.Tasks.Toolkit library to preview the SAS program in the SAS color-coded Program Editor.
  • Encapsulate the meat of the task -- the business logic that reads data and creates a SAS program -- into a separate .NET class. This makes the task more maintainable and enables you to reuse the business logic in other contexts.
  • Use a special task API interface, named ISASTaskExecution, to implement the work of the task. This enables SAS Enterprise Guide to delegate all task processing to your task so that it can perform work that can't be done easily in a SAS program.

More resources:

tags: data sets, data step, SAS custom tasks

11 Comments

  1. Frank Poppe
    Posted January 21, 2013 at 4:35 pm | Permalink

    It's a rather besides the main point of this blog, but I wanted to point at one small caveat. A short while ago I happened to stumble upon a slight flaw (IMHO) in the working of the DSD option. This surfaced when you have an observation that has a single double quote somewhere within a text field.
    The text fields are put quoted in the DATALINES section, and the DSD option removes them again upon reading. Say you have a double quote within the name Barbara, making dataline 24
    "Barb"ara","F",13,65.3,90

    You will find that the data set then will have the value

    "Barb"ar

    , so with the enclosing quotes. For the other observations the enclosing quotes are dropped.
    What made it even more puzzling when I saw this at first was that the last last

    a

    and the closing quote disappeared. After a while I realized that is logical in itself because the variable is 8 characters long.

    • Chris Hemedinger Chris Hemedinger
      Posted January 21, 2013 at 4:39 pm | Permalink

      Good points Frank - and I cannot promise that this task correctly encodes every permutation of quoted strings and special characters. I hope that it proves to be a useful utility for some, but I wouldn't sanction it as a production "data archive" tool...

  2. Jared Prins
    Posted January 30, 2013 at 2:57 pm | Permalink

    To *maybe* add to the list of solutions, if you have SAS/IML, I wonder if R's "dput" command would work?

    • Chris Hemedinger Chris Hemedinger
      Posted January 30, 2013 at 3:05 pm | Permalink

      Using SAS/IML to integrate with R to export SAS data as CSV? That's what we call "going around your elbow to get to your thumb." :)

      There are a number of native BASE SAS methods to achieve this. I like ODS CSV, but simple DATA step and PUT (to FILE) can do the trick as well, with plenty of control over the quoting rules. Here's a discussion forum thread on the topic.

  3. Gaël Hammer
    Posted February 13, 2013 at 2:35 pm | Permalink

    I like your idea! In my field (epidemiology), it is good practice to document and store study data for 10 years, so that results may be independently reproduced, maybe with a different software. This is why I personnaly prefer exporting the data to plain text files as well as a dump of PROC FORMAT CNTLOUT=. I do not yet generate an import program as well, like you do.

    Since study data are subject to change until the database is finally frozen, I append a timestamp to these files. Here is my (admittedly old) macro to store a timestamp into the macro variable DateTime:

    %MACRO DateTime;
    DATA _NULL_;
    CALL SYMPUT("DateTime", TRIM(LEFT(PUT(TODAY(), YYMMDD10.)))||"_"||TRANSLATE(TRIM(LEFT(PUT(TIME(), TIME8.))), "-", ":"));
    RUN;
    %MEND DateTime;

  4. Tom Kari
    Posted February 20, 2013 at 4:29 pm | Permalink

    Another small improvement...will the task as written result in small numeric discrepancies, at the least significant bit level?

    What about encoding and decoding the number to the RB standard, which I believe will result in an EXACTLY IDENTICAL set of numbers (assuming the same processor architecture)?

    For each number, encode it with:

    EncodedNumericVar = put(put(NumericVar, rb8.), $base64x12.);

    and then decode it with:

    NumericVar = input(input(trim(EncodedNumericVar), $base64x12.), rb8.);

    Possibly as an option?

    Tom

    • Chris Hemedinger Chris Hemedinger
      Posted February 20, 2013 at 4:58 pm | Permalink

      Tom,
      The task, as written, doesn't consider all of that. Its main purpose is to be an example custom task, and a convenient method for archiving simple data sets into a text-based program. I'd use it for teaching purposes, or sharing example data with colleagues or tech support. I wouldn't use it as a way to archive enterprise data with complete integrity.

      Your suggestion could be an improvement for integrity (have you been reading up on SAS numeric precision?), although it would detract from the readability of the program and DATALINES records.

      • Tom Kari
        Posted February 21, 2013 at 12:48 pm | Permalink

        I think you're underselling this task, Chris!

        People have asked me how to do this quite frequently, enough so that I think with a little more hardening it could be a real asset in the SAS toolbox. Is there anything left except numeric precision, handling odd character strings, and data representation and coding? (My view is that passwords, indexes, and compression aren't logically a part of this process).

        Just a thought...
        Tom

        P.S. Yes, I've been navigating the numeric precision swamps since System/370. Did you know the old mainframe OS had a sixteen byte floating point data type?

        • Chris Hemedinger Chris Hemedinger
          Posted February 21, 2013 at 1:12 pm | Permalink

          Yes, I think it needs a bit more "hardening". In addition to what you said, it would need to become more efficient for very large data, as it reads all records into memory and builds a character string (the program) all at once. It would need to change to page the records out as it reads them so that we don't hit a memory exception.

          And I think some of these things would need to be optional, as they are tradeoffs against code readability.

  5. Mark MacMullen
    Posted March 22, 2013 at 6:09 pm | Permalink

    I looked over your book and I'm considering it's purchase. However, I have one small question. I have never programmed in .NET and was wondering if that is a major obstacle to actually getting anything out of your book in terms of building my own custom tasks? I'm a SAS System Administrator and this would come in real handy to overcome perceived shortcomings of EG from our inhouse clients.

    • Chris Hemedinger Chris Hemedinger
      Posted March 23, 2013 at 8:58 am | Permalink

      Mark,

      You don't need to be an expert in .NET. It's simple to build a simple task with very little .NET programming. As you might expect, the more complex the task, the more .NET development you'll have to do, but you can learn that in increments as your needs dictate. And it's a valuable skill outside of the SAS world as well.

      If you have any experience at all with app development (in Java, C or C++, or even SAS/AF) you will find that those skills transfer over to the .NET world. Different syntax, but the concepts are the same. It's an object-oriented development framework, and the Microsoft Visual Studio tools are very good.

One Trackback

  1. [...] Try one of Chris Hemedinger’s EG plug-ins, “Data Set to DATA Step”. Chris also wrote a post for it, Turn your data set into a DATA step program. [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>