Do you mind if we dance with your DATEs (or DATETIMEs)?

11

I've been working with date-time data on a recent project, and I've come across a few SAS programs that have "opportunity for improvement" when it comes time to create reports.

(Or maybe I haven't, and I contrived this entire blog post so that I could reference one of my favorite quotes from the movie Animal House.)

Suppose we have a data set that contains a collection of measurements over a one-year period, each with a time stamp that shows when the measure was captured. Here is a DATA step program that creates a sample data set, generating just over 300,000 rows of random measurements (with a bit of variation built in for visual interest later):

/* Create a year's worth of timestamp data */
data measurements (keep=timestamp measure);
 now = datetime();
 /* go back one year - minus number of seconds in a year */
 /* remember, SAS datetime value is number of seconds */
 /* since 01JAN1960 */
 yearago = now-31556926;
 length timestamp 8;
 format timestamp datetime20.;
 do val = yearago to now;
   if (ranuni(0) < 0.01) then
    do;
     /* add a few more records on certain days of week */
     if (weekday(datepart(val)) in (1,2) and ranuni(0)<.5) then do;
      timestamp = val;
      measure = 5 + log(ranuni(0));
      output;
     end;
     timestamp = val;
     measure = 5 - log(ranuni(0));
     output;
   end;
 end;
run;

The "classic" method for reporting on this data is to use a line plot to show the time series, as shown by this program and plot:

ods graphics / width=800 height=300;
proc sgplot data=measurements;
 series x=timestamp y=measure ;
 xaxis minor label="Time of measure";
 yaxis min=-8 max=20 label="Measurement";
run;


But sometimes we want to report further on the characteristics of the data, to answer questions such as "how many measurements were captured for each month? for each quarter? or for each day of the week?"

Creating categories the hard way

Sometimes while creating these reports, programmers decide to prepare the data further by creating new categorical variables (such as a variable for the day of the week, or for the month and year). For example, I've seen snippets such as this:

  /* numeric day of week */
  day = weekday(datepart(timestamp)); 
  /* value such as "2011_3" for March 2011 */
  year_month = cats(year(datepart(timestamp)),'_',month(datepart(timestamp)));

But there are problems with this approach, especially with the "year_month" example. First, it requires another pass through the data and additional storage space for the output. And in the "year_month" case, the result is a character variable, which no longer retains the ability for sorting the output in chronological order. (Yes, with a more clever construction we could make sure that there is a leading zero for single-digit months, and that would help...but there is an easier way.)

Use SAS formats instead of creating new variables

The person who wrote the above code forgot about the power of SAS formats. Using SAS formats, you can "recast" your variable for analysis to create a category from values that are otherwise continuous. You don't have to modify the original data at all, and it doesn't require you to make an additional copy.

For example, to create summary statistics for the measurements classified into month "buckets", you can apply the DTMONYY5. format within a PROC MEANS step. The DTMONYYw. format takes a date-time value and converts it to a month-year appearance. SAS procedures such as MEANS, FREQ, SGPLOT, and most others will use the formatted value for classification. You can see how this program produces a report of statistics classified by month-year:

title "Measures by month";
proc means data=measurements min max mean n;
 /* tells the procedure to bucket values into monthly bins */
 format timestamp dtmonyy5.;
 label timestamp = "Month/Year";
 class timestamp;
 var measure;
run;


With a minor change to use DTYYQC4. instead, we can produce a report of stats by quarter:

title "Measures by quarter";
proc means data=measurements min max mean n;
 /* tells the procedure to bucket values into quarterly bins */
 format timestamp dtyyqc4.;
 label timestamp = "Quarter";
 class timestamp;
 var measure;
run;


This isn't limited to tabular output. Applying a format can also help to simplify plots as well. Here is an example of an SGPLOT step that creates a frequency report of the number of measures for each day of the week, thanks to the DTWKDATX8. format:

title "Number of Measures by Day of Week";
ods graphics / width=400 height=300;
proc sgplot data=measurements;
 /* tells the procedure to bucket values into day-of-week bins */
 format timestamp dtwkdatx8.;
 label timestamp = "Day";
 vbar timestamp;
run;

The next time that you find yourself creating a new variable to serve as a category, check whether there might be a SAS format that you can apply instead. Even if there isn't, it might still be more efficient to create a new format rather than calculate a new variable, especially if it's a category that you need to use for multiple data sources.

Share

About Author

Chris Hemedinger

Senior Manager, SAS Online Communities

+Chris Hemedinger is the manager of SAS Online Communities. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.  He also hosts the SAS Tech Talk webcasts each year from SAS Global Forum, connecting viewers with smart people from SAS R&D and the impressive work that they do.

Related Posts

11 Comments

  1. Rick Wicklin

    1) When I was learning SAS, I would sometimes confuse the date (a numeric variable) with it's formatted value (a character string). One place where this comes up is in computing statistics for date/time variables. For example, at first the following PROC MEANS output looks surprising:

    proc means data=measurements min median max;
     var timestamp;
     output out=stats min=min median=median max=max;
     run;

    I don't know how to get PROC MEANS to produce formatted output, do you? I always write to a data set and then use the format on PROC PRINT:

    proc print data=stats;
    var min median max;
    run;

    2) I included some references on SAS dates and times (SUGI papers, books,...) at the end of my data/time post from a few months ago. They are filled with lots of useful tidbits.

    • Chris Hemedinger
      Chris Hemedinger on

      Good comments Rick. I don't have a good answer, but I do have more information to throw at you and perhaps keep you busy for a while.
      You can also get data set output via ODS OUTPUT, with a statement like this:

      ods output summary=odsStats;
      proc means data=measurements min median max;
       var timestamp;
       run;

      You can control the layout of the ODS OUTPUT data a bit more in SAS 9.3 using the STACKODSOUTPUT option.

      ods output summary=odsStats;
      proc means data=measurements min median max STACKODSOUTPUT;
       var timestamp;
       output out=stats min=min median=median max=max  ;
       run;

      If you look at default data set layout when you using the OUTPUT statement in PROC MEANS, all measures are stacked in a single column. That may explain why the formats of the source variables are not inherited in the output data when it comes to printing them. I often use the technique you cite: save the output data and then PROC PRINT it.

  2. Totally agree with your comments on the use of date formats in reporting. In answer to Rick's question, I don't believe there is a way of returning the formatted value in Proc Means. You can, however, use Proc Tabulate to do the same calculations and show the formatted values.

    proc tabulate data=measurements;
    var timestamp;
    table timestamp*f=datetime20.*(min median max);
    run;

  3. Thanks for the valuable information.
    Looks like you left your comments the same when changing the size of the bins....

  4. Pingback: Just-in-time data prep in SAS Enterprise Guide - The SAS Dummy

  5. Allison George on

    I am not a blogger, but I have been using SAS for about six years. Let me know if there is a better forum for this question.

    Can you help me optimize these variables?

    I just recently took the Macro Language class and I am very familiar with SQL. I am converting others code and trying to optomize the macro variables used within the code.

    We do our reporting of data based on the previous month on data with effective dating. If I use SQL, the dates are in one format, if data step, they must be in the datetime, if used as part of a file name, they are in yet another format. How best can I use today's date to create all the variables I need for my process without rewriting all the code?

    The following code is used throughout the process. We are looking at last month as a report month and looking a one year's worth of data.

    /* ---------------------------------------------------------------------------------- */
    /*     January 2013                                                                   */
    /* ---------------------------------------------------------------------------------- */
    /* MC_RE_Passthrough */
    %LET BEGMonth = 01;
    %LET YEAR = 2013;
    
    LIBNAME ONE 'C:\TBD\&amp;year.&amp;BEGMonth';
    LIBNAME TWO 'C:\QR7\&amp;year.&amp;BEGMonth';
    
    /* Data Clean-up RE and Data Clean-up QR */ 
    %LET CURMONTH='01JAN2013'd; /* Report Month. */
    /* Web Report - RE and QR */
    %LET RMonth=January; /* Report Month in words */
    %let BACK6 = -6;
    
    /* All Graphs RE and All Graphs QR */
    %LET FUTR6MON='01FEB2013'd; /* Six months in the future. Remains at Feb 2013 until 6 months have passed.*/
    %LET FRST6MON='01AUG2012'd; /* First overall report month of six. Remains at Aug until Feb 2013 */
    %LET L12MONTH='01DEC2012'd; /* Ending of 12 months; current redetermination due month. */
    %LET B12MONTH='01JAN2012'd; /* Beginning of 12 months; first reporting month. */
    
    /* QR7_Submit_Status */
    %LET EFF_FDay           ='01Dec2012:0:0:0'dt;   /* Effective Month = Reports DUE in Month of */
    %LET EFF_FDaySQL        ='12/01/2012';          /* Report Month */
    
    /* Reporting Month */
    %LET FDaySQL            ='01/01/2013'; /* First Day of the Month in which you're reporting */
    %LET FDayFSQL           ='02/01/2013'; /* First Day FOLLOWING the Month in which you're reporting */
    %LET FDay               ='01Jan2013:0:0:0'dt;
    %LET FDayF              ='01Feb2013:0:0:0'dt;
    /*    Beginning Day of the Past 12 month period. This period of time is &gt; Begin Month and &lt; FDaySQL. */
    %LET BeginMonth         ='01Jan2012:0:0:0'dt;
    

  6. How to convert dd-mm-yyyy example , 26-11-2015 into 26112015 or the binary date format ??
    Thanks in advance and apologies for the naive question..

    • Chris Hemedinger
      Chris Hemedinger on

      Srikanth,

      Use the ANYDTDTEw. informat to interpret this date. The date's format is DAY-MONTH-YEAR, and you can force that interpretation with the DATESTYLE option. Example:

      options datestyle=dmy;
      data _null_;
       dt = input("26-11-2015",anydtdte10.);
       put dt date9.;
      run;

      Output will be: 26NOV2015

      Since date values can be sometimes ambiguous depending on the style (08-11-2015 can be August 11 in the USA, or November 8 in Europe), the DATESTYLE option is important. You can set the DATESTYLE option to LOCALE, which will use your system locale settings to drive the behavior. Use DATESTYLE=DMY when you want to ensure the DAY-MONTH-YEAR interpretation.

Leave A Reply

Back to Top