Last week I attended a meeting of the Toronto Area SAS Society. (Okay, I didn't just attend; I was a presenter as well.) This user group meeting contained a feature that I had never seen before: "Solutions to the Posed Problem".
Weeks before the meeting, an "open problem" was posted to the TASS community group site. The problem included 134 records of time series data and an objective: create a data set that contains just the most recent record (according to the date and time values) for each unique identifier in the set.
Here's a snapshot of a data sample, with the desired output records highlighted.
It's a simple little puzzle similar to those that SAS programmers must deal with every day. But the instructions for this particular problem were to "use an interface product" to solve it. The implication was "use SAS Enterprise Guide", but the entrants got creative with that requirement. Some used SAS Enterprise Guide to import the data (which was provided as an Excel file), but then wrote a program to do the rest. Some simply opened a program node and wrote code to do the whole shebang.
Art Tabachneck, one of the TASS leaders and a frequent SAS-L contributor, tried to use just "point-and-click" to solve the problem without resorting to a program node, but he didn't manage it. He knew that the Query Builder was the best chance for an SQL solution, and that he would need a HAVING clause to make it work. He was on the right track! But in the limited time that he had to devote to the problem, he couldn't bend the Query Builder to his will. In the end, Art wound up working within the program node, just like most other participants.
When I returned home, I made it my mission to devise a "pure" task-based solution. And here it is:
Step 1: Import the Excel file
That's easy. Even as Art observed, you simply select File->Import Data, point to the XLS file on your local machine, and then click Finish when the wizard appears. The default behavior produces a perfect SAS data set.
Step 2: Start with the query
With the output from the import step, select the Query Builder task. We need the all of the columns represented, so drag all columns over to the Select Data tab.
Step 3: Create a computed column for a combined date-time value
All of the successful solutions did this step somehow. I borrowed from the most elegant of these and created an "Advanced expression"-based column named "Test_DateTime" as:
Step 4: Create a MAX aggregation version of that computed column
Create another column based on Test_DateTime column, and this time apply the MAX summarization. Name the new column MAX_of_Test_DateTime.
This is the trick! As soon as you have an aggregated measure as part of the query, the Filter tab will divide into two sections, revealing a "Filter the summarized data" section. That's the piece that allows you to generate a "Having" structure.
Step 5: Create a Having filter based on the summarized column
The filter is effectively:
CALCULATED MAX_of_Test_DateTime = CALCULATED Test_DateTime
The Query Builder generates more verbose SQL than the above, but you get the idea. Here's a screen shot that shows the Having filter in place.
Step 6: Change the Grouping to include just SAMPLE_ID
When you have a summarized measure in the query the Query Builder provides control over the grouping behavior. By default, the query is "grouped by" all non-summarized columns. But for this query, we want to group only by each value of the Sample_ID column. On the Select Data tab, uncheck "Automatically select groups". Then click the Edit Groups button and exclude all except for t1.Sample_ID.
Here's the Select Data tab with the proper grouping in place:
When you run the query, you should get the desired result. Here's the SQL that was generated:
PROC SQL; CREATE TABLE WORK.LATEST_TIME_PER_ID AS SELECT DISTINCT t1.Sample_id, t1.Test_Date, t1.test_time, t1.test_value FROM WORK.OPEN_QUESTIONS t1 GROUP BY t1.Sample_id HAVING (MAX(DHMS(t1.Test_Date, 0, 0, t1.test_time))) = (DHMS(t1.Test_Date, 0, 0, t1.test_time)); QUIT;
I want to thank the members of TASS for this little puzzle. I actually learned some new things by reviewing the various solutions that were entered -- and I think that's the point of the exercise. If you're curious, here's a link to the complete "report" of submitted solutions.