Getting things in order with PROC SQL, macro and a SAS function

Order must be the most frequent cry for help in the SAS classroom. “HELP,” said my student in the classroom. “I work with messy health data. My users want to see data in this order.”

T1.col1, t1.col2, t1.col3, t1.col4, t2.col5, t1.col6 and list the remaining columns in column position from table 1.

My 2 tables, t1 and t2, have hundreds of columns. They have a common column for me to join the tables on and I really don’t want to spend manually typing each and every column name in the SELECT statement. That’s way too much time on manual work, time that could be spent doing more productive work. Can you help make this process dynamic?

How was I going to help this customer?  I knew I could rely on PROC SQL dictionary tables to grab metadata, in this case the column names without hardcoding them. I also knew that the column names that dictionary tables return could be stored in a macro. Making it easy for me to write a SELECT.

Given that I only had the problem but no data to work with, I turned to the SASHELP library (PDF). It’s a great repository of over 200 sample datasets. I went ahead and submitted this code to query the dictionary tables called columns to find out which datasets I could use to answer the customer question. I wanted datasets that had the same name and type to perform the join.

*Step 1 : locating columns that have the same name & type to perform the equijoin;
proc sql;
select name, memname, type, length
from dictionary.columns
where libname ='SASHELP'
group by name
having count(name) > 1 AND COUNT(DISTINCT TYPE) =1
order by name;
quit;

date columns

I started writing code to store the column names from the 2 tables in the desired order in macro variables:
For example:

*Step 2 : storing column names in macro variables in the order specified. 
Macro variable first4 stores the first 4 column names from the STOCKS dataset
Macro variable fifth stores the 5th column name from the SNACKS dataset
Macro variable after5 stores column names after 5th position from the STOCKS dataset;
proc sql noprint;
select name into : first4 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum < 5;
 
select name into : fifth separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="SNACKS" and
varnum = 5;
 
select name into : after5 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum > 5
;
Quit;
 
*just confirming the order of variables;
%put &first4, &fifth, &after5;
Stock, Date, Open, High, Date, Close, Volume, AdjClose
 
*Step 3 : joining the 2 tables and selecting columns in the order defined in the macro variables. But why doesn't this work? Why the Error?
 
proc sql noprint;
select &first4, &fifth, &after5 from
sashelp.SNACKS as t1, sashelp.STOCKS AS t2
where t1.DATE=t2.DATE;
ERROR: Ambiguous reference, column Date is in more than one table.
ERROR: Ambiguous reference, column Date is in more than one table.

Well of course! The Date column appears twice, once from each table. It’s that beautifully flexible thing that SQL does. But how do you overcome this?

This is when I turned to our instructor world-wide group. Did you know that? When you sign up for SAS training you get not one, but hundreds of SAS instructors chipping in to help solve your business problems.

I got this amazing suggestion from one of the instructors.

cats('P.',name) into : first4 separated by ', '
cats('Z.',name) into : after5 separated by ', '

And decided to re-write my original code.

*Step 2 : storing column names in macro variables in the order specified. 
Macro variable first4 stores the first 4 column names from the STOCKS dataset
Macro variable fifth stores the 5th column name from the SNACKS dataset
Macro variable after5 stores column names after 5th position from the STOCKS dataset;
proc sql noprint;
select cats('t1.',name) into : first4 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum < 5;
 
select cats('t2.',name) into : fifth separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="SNACKS" and
varnum = 5;
 
select cats('t1.',name) into : after5 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum > 5
;
*just confirming the order of variables;
%put &first4, &fifth, &after5;
 
*Step 3 : joining the 2 tables and selecting columns in the order defined in the macro variables. And this works! YAY!
reset print;
select &first4, &fifth, &after5 from
sashelp.STOCKS as t1, sashelp.SNACKS AS t2
where t1.DATE=t2.DATE;

And it worked like a charm!

Customer has the columns in the order in which they want it. Bonus: No hard coding of column names.

Don’t you just love the marriage of PROC SQL, the macro language and the SAS CATS function to deliver just what was required.

There is so much more you can do with PROC SQL dictionary tables. Here’s a handy reference paper (PDF) that you might like to check out.

What other ideas do you have to solve this problem? Write to me. I’d love to hear from you.

tags: learn sas, proc sql, SAS macro

One Comment

  1. jaap karman
    Posted July 12, 2014 at 6:19 am | Permalink

    You solved it in the classic program coding way.
    Using DI and SAS metadata that coding has been made graphical (click your mouse).
    Using the query builder (eg EGuide) is another one that the approach is graphical
    There are a lot of SAS persons telling "they should not code, but use our menu-s"

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>