Getting things in order with PROC SQL, macro and a SAS function

5

Order must be the most frequent cry for help in the SAS classroom. “HELP,” said my student in the classroom. “I work with messy health data. My users want to see data in this order.”

T1.col1, t1.col2, t1.col3, t1.col4, t2.col5, t1.col6 and list the remaining columns in column position from table 1.

My 2 tables, t1 and t2, have hundreds of columns. They have a common column for me to join the tables on and I really don’t want to spend manually typing each and every column name in the SELECT statement. That’s way too much time on manual work, time that could be spent doing more productive work. Can you help make this process dynamic?

How was I going to help this customer?  I knew I could rely on PROC SQL dictionary tables to grab metadata, in this case the column names without hardcoding them. I also knew that the column names that dictionary tables return could be stored in a macro. Making it easy for me to write a SELECT.

Given that I only had the problem but no data to work with, I turned to the SASHELP library (PDF). It’s a great repository of over 200 sample datasets. I went ahead and submitted this code to query the dictionary tables called columns to find out which datasets I could use to answer the customer question. I wanted datasets that had the same name and type to perform the join.

*Step 1 : locating columns that have the same name & type to perform the equijoin;
proc sql;
select name, memname, type, length
from dictionary.columns
where libname ='SASHELP'
group by name
having count(name) > 1 AND COUNT(DISTINCT TYPE) =1
order by name;
quit;

date columns

I started writing code to store the column names from the 2 tables in the desired order in macro variables:
For example:

*Step 2 : storing column names in macro variables in the order specified. 
Macro variable first4 stores the first 4 column names from the STOCKS dataset
Macro variable fifth stores the 5th column name from the SNACKS dataset
Macro variable after5 stores column names after 5th position from the STOCKS dataset;
proc sql noprint;
select name into : first4 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum < 5;
 
select name into : fifth separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="SNACKS" and
varnum = 5;
 
select name into : after5 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum > 5
;
Quit;
 
*just confirming the order of variables;
%put &first4, &fifth, &after5;
Stock, Date, Open, High, Date, Close, Volume, AdjClose
 
*Step 3 : joining the 2 tables and selecting columns in the order defined in the macro variables. But why doesn't this work? Why the Error?
 
proc sql noprint;
select &first4, &fifth, &after5 from
sashelp.SNACKS as t1, sashelp.STOCKS AS t2
where t1.DATE=t2.DATE;
ERROR: Ambiguous reference, column Date is in more than one table.
ERROR: Ambiguous reference, column Date is in more than one table.

Well of course! The Date column appears twice, once from each table. It’s that beautifully flexible thing that SQL does. But how do you overcome this?

This is when I turned to our instructor world-wide group. Did you know that? When you sign up for SAS training you get not one, but hundreds of SAS instructors chipping in to help solve your business problems.

I got this amazing suggestion from one of the instructors.

cats('P.',name) into : first4 separated by ', '
cats('Z.',name) into : after5 separated by ', '

And decided to re-write my original code.

*Step 2 : storing column names in macro variables in the order specified. 
Macro variable first4 stores the first 4 column names from the STOCKS dataset
Macro variable fifth stores the 5th column name from the SNACKS dataset
Macro variable after5 stores column names after 5th position from the STOCKS dataset;
proc sql noprint;
select cats('t1.',name) into : first4 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum < 5;
 
select cats('t2.',name) into : fifth separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="SNACKS" and
varnum = 5;
 
select cats('t1.',name) into : after5 separated by ', '
from dictionary.columns
where libname="SASHELP" and
memname="STOCKS" and
varnum > 5
;
*just confirming the order of variables;
%put &first4, &fifth, &after5;
 
*Step 3 : joining the 2 tables and selecting columns in the order defined in the macro variables. And this works! YAY!
reset print;
select &first4, &fifth, &after5 from
sashelp.STOCKS as t1, sashelp.SNACKS AS t2
where t1.DATE=t2.DATE;

And it worked like a charm!

Customer has the columns in the order in which they want it. Bonus: No hard coding of column names.

Don’t you just love the marriage of PROC SQL, the macro language and the SAS CATS function to deliver just what was required.

There is so much more you can do with PROC SQL dictionary tables. Here’s a handy reference paper (PDF) that you might like to check out.

What other ideas do you have to solve this problem? Write to me. I’d love to hear from you.

Share

About Author

Charu Shankar

Technical Training Specialist

Charu Shankar has been a Technical Training Specialist with SAS since 2007. She started as a programmer, and has taught computer languages, business and English Language skills. At SAS, Charu teaches the SAS language, SQL, SAS Enterprise guide and Business Intelligence. She interviews clients to recommend the right SAS training to help them meet their needs. She is helping build a center for special needs kids in this project. http://www.handicareintl.org/pankaja/pankaja.swf

Related Posts

5 Comments

  1. donna thompson on

    Charu - I am from the SAS EG Kingston training with the Ministry of Health. You were on the last day trying to show me the statement for pull age out of the data set and not birthdate using the "function"

  2. thanks for your question Glen. Take a look at the Information Schema Views here http://msdn.microsoft.com/en-us/library/ms186778.aspx:

    An information schema view is one of several methods SQL Server provides for obtaining metadata. Information schema views provide an internal, system table-independent view of the SQL Server metadata. Information schema views enable applications to work correctly although significant changes have been made to the underlying system tables. The information schema views included in SQL Server comply with the ISO standard definition for the INFORMATION_SCHEMA.

    A jump start that gets everything would be...

    SELECT * FROM INFORMATION_SCHEMA.TABLES

    SELECT * FROM INFORMATION_SCHEMA.COLUMNS

    Then add your where clauses to narrow it down to tables and columns from tables you’re interested in.

  3. "dictionary.columns" provides a great way to get information from DB2. For example, I can conduct impact analysis after a change by storing the names of all tables and fields in a SAS dataset, then running a macro to determine which were affected. However, I was wondering if there is any equivalent for SQL/Server databases? "dictionary.columns" returns no results.

  4. You solved it in the classic program coding way.
    Using DI and SAS metadata that coding has been made graphical (click your mouse).
    Using the query builder (eg EGuide) is another one that the approach is graphical
    There are a lot of SAS persons telling "they should not code, but use our menu-s"

Back to Top