One of my earlier blogs on handling percent (%) values in SAS was very popular (it's been viewed over 34,000 times!), so I thought I'd write a similar blog on handling time (mm:ss.ss) values in SAS ...
This past weekend I was in a dragonboat race (that's me in the red hat)...
After the race, I decided to create some graphs with all the times from the race day, to see how our team compared to all the other teams. But I found that some of the average times I was calculating with SAS didn't always match the average times listed on the race website. I double- and triple-checked my calculations (even calculating a few by hand), and finally contacted the people in charge of the race. They sent me a copy of the spreadsheet they had used to calculate the averages, and I found that they were treating the digits of the time values as decimal digits rather than actual time values (the tricky thing is that most of the averages looked correct ... but that was just by luck!)
I thought this an interesting enough problem that I should create an example, to hopefully steer data analysts in the right direction when working with time data ... and avoid the data-integrity problems that result in (mis)handling time digits as decimal numbers.
For this demonstration, I have 4 teams, each running 3 races/heats. If a race time is 1 minute 21.85 seconds, then you might be tempted to (incorrectly) enter the data into a spreadsheet as the decimal digits 0121.85. Here's a table of four teams' data, stored in this (incorrect) way:
And if I plot these decimal values, and their average (in red), I get the following graph. It looks pretty nice, and the averages seem reasonable ... at first glance.
But upon closer examination, it seems odd that one of Team 4's races is so much slower than their other two races. And if you look at Team 4's average - what is 1:71.70 (how can you have a '71' in the minutes place?!?) And if you look at the horizontal/time axis, you'll notice that it shows a range for '1:60' to '2:00' ... which are impossible times.
Now that you've seen the problem with handling the time values the wrong way, let's do it the right way! First, I input the time values into a SAS dataset, as proper time values (behind the scenes, SAS is storing the values as the number of seconds, and you can use any of SAS' time formats to have the values print out in whatever time format you want, such as mm:ss.ss):
data my_data_time; informat race1 race2 race3 stimer8.; input team $ 1-6 race1 race2 race3; average=mean(race1, race2, race3); datalines; Team 1 01:21.85 01:23.57 01:27.63 Team 2 01:28.31 01:29.66 01:29.88 Team 3 01:40.06 01:39.25 01:45.16 Team 4 01:55.72 01:53.38 02:06.00 ; run;
And now the table of data looks like this (notice the average time for Team 4 is 1:58.37, rather than 1:71:70).
And when I plot the data, there is no bogus range of times, and Team 4's values are much closer together. (And, most importantly, all the values are correct!)
Here is a link to the full SAS code, if you'd like to download it and experiment with it.
(Note: I've updated this example to use the stimer8. informat, rather than the time11.2 informat, based on KSharp's suggestion - either of the informats would get the job done, but stimer seems a bit more convenient to use in this particular situation.)