Over the holidays I was having a discussion with my cat, Ms. Trixie Lou. A question that often arises during the first programming class is the following: how do I find the variables that are in common to these two or three data sets? As it turns out, Ms. Trixie has just started programming in SAS and had some useful input. She suggested using the CONTENTS procedure to get a list of variables in each table.
As with most students, this was an excellent start. Except the CONTENTS procedure also gives a whole bunch of other information that you have to wade through before you can find just the variable names. Here was my suggestion:
In this example I am trying to determine if the titanic and german data sets (both in the sasuser library) have any variables in common. We use the short option with the CONTENTS procedure to get an alphabetical listing of just the variable names, nice right? But let's not stop there.
Let's add the null title and footnote statements to clear out any leftover titles and footnotes you may have from previous reports.
Also, set the noproctitle option through the output delivery system to suppress the default title of "The Contents Procedure" in our output report (see below.) We do not need to see it.
Finally, by using the two additional ODS statements we create a single nice HTML report so that all the variables appear in the same output window. By default, the variable lists can print to separate reports making them difficult to compare.
It is now very easy to spot that both data sets contain the variable Gender. By the way, if you ran this code in either SAS Enterprise Guide or the latest 9.3 version of SAS, you might not require the ODS statements. Both of those interfaces produce output where your reports could end up in the same window by default.
Now the truth of the matter is there are plenty of fancier ways of answering this same question of finding common variables in data sets. For example, there is a COMPARE procedure that compares data sets. There are also these really cool DICTIONARY tables that you can query with the SQL procedure. But unfortunately Ms. Trixie Lou is telling me that it is her nap time and is demanding a tummy rub. So I will have to see you next time.
Until then, keep on coding better than my kitty.
4 Comments
And if you want to generalize the problem, SAS supports several ways to find variables in common to mulitple data sets.
Thanks for your excellent blog, Rick. Did you know that I tested SAS/IML when I first started working at SAS over twenty years ago and it never even crossed my mind to solve this problem using it. Of course, the flexibilty of SAS is one of its strong points. I'm working on my next blog, otherwise I'd take one of these programs and throw them into a macro to generalize the solution for any number of data sets. Or is somebody else already cooking that up as a reply? :)
Here's one way to do it programmatically, using those fancy dictionary tables you mentioned! :)
proc sql;
create table info1 as select * from dictionary.columns
where libname=upcase('sashelp') and memname=upcase('class');
create table info2 as select * from dictionary.columns
where libname=upcase('SASHELP') and memname=upcase('fish');
create table in_both as select unique name from info1
where info1.name in (select unique name from info2);
quit;
title "The following variables are in both data sets:";
proc print data=in_both;
run;
Pingback: Find variables common to multiple data sets - The DO Loop