An easy way to find variables in common to data sets

4

My CatOver the holidays I was having a discussion with my cat, Ms. Trixie Lou. A question that often arises during the first programming class is the following: how do I find the variables that are in common to these two or three data sets? As it turns out, Ms. Trixie has just started programming in SAS and had some useful input. She suggested using the CONTENTS procedure to get a list of variables in each table.

As with most students, this was an excellent start.  Except the CONTENTS procedure also gives a whole bunch of other information that you have to wade through before you can find just the variable names.  Here was my suggestion:

SAS code

In this example I am trying to determine if the titanic and german data sets (both in the sasuser library) have any variables in common.  We use the short option with the CONTENTS procedure to get an alphabetical listing of just the variable names, nice right? But let's not stop there.

Let's add the null title and footnote statements to clear out any leftover titles and footnotes you may have from previous reports.

Also, set the noproctitle option through the output delivery system to suppress the default title of "The Contents Procedure" in our output report (see below.)  We do not need to see it.

Finally, by using the two additional ODS statements we create a single nice HTML report so that all the variables appear in the same output window.  By default, the variable lists can print to separate reports making them difficult to compare.

Variable Lists from two data sets

It is now very easy to spot that both data sets contain the variable Gender.  By the way, if you ran this code in either SAS Enterprise Guide or the latest 9.3 version of SAS, you might not require the ODS statements.  Both of those interfaces produce output where your reports could end up in the same window by default.

Now the truth of the matter is there are plenty of fancier ways of answering this same question of finding common variables in data sets.  For example, there is a COMPARE procedure that compares data sets.  There are also these really cool DICTIONARY tables that you can query with the SQL procedure.  But unfortunately Ms. Trixie Lou is telling me that it is her nap time and is demanding a tummy rub.  So I will have to see you next time.

Until then, keep on coding better than my kitty.

Share

About Author

Andy Ravenna

Principal Technical Training Consultant

Andy has worked at SAS for nearly thirty years. He has happily spent more than 15 of those years in the Education department training customers and writing courses. Over that time Andy has become certified to teach many programming courses in both the core and advanced curriculum; He now focuses in Advanced Analytics and Statistical Training. Interested in connecting with customers, Andy has written and delivered several papers at various conferences. Andy is not only fluent in SAS programming but is also fluent in English and Spanish. He holds a bachelor's degree in chemical engineering and a master's degree in applied mathematics (minor in statistics) from North Carolina State University, where SAS was born. While there, he taught Algebra and Calculus for 3 years and received the Maltbie Award for Superior Teaching Ability. Andy enjoys running marathons, photography, traveling for vacation, and taking advantage of everything New York City has to offer.

Related Posts

4 Comments

    • Andy Ravenna

      Thanks for your excellent blog, Rick. Did you know that I tested SAS/IML when I first started working at SAS over twenty years ago and it never even crossed my mind to solve this problem using it. Of course, the flexibilty of SAS is one of its strong points. I'm working on my next blog, otherwise I'd take one of these programs and throw them into a macro to generalize the solution for any number of data sets. Or is somebody else already cooking that up as a reply? :)

  1. Robert Allison
    Robert Allison on

    Here's one way to do it programmatically, using those fancy dictionary tables you mentioned! :)

    proc sql;

    create table info1 as select * from dictionary.columns
    where libname=upcase('sashelp') and memname=upcase('class');

    create table info2 as select * from dictionary.columns
    where libname=upcase('SASHELP') and memname=upcase('fish');

    create table in_both as select unique name from info1
    where info1.name in (select unique name from info2);

    quit;

    title "The following variables are in both data sets:";
    proc print data=in_both;
    run;

  2. Pingback: Find variables common to multiple data sets - The DO Loop

Back to Top