Multi-way lookup tables

2

A lookup table is a programming technique where one or more values can be used to retrieve another value. For example, many years ago, I had benzene exposure estimates for 10 years (1940 to 1949) for each of five locations in a factory. Given a year and a job location, I needed to know the benzene concentration.

I would be terribly embarrassed today if anyone saw the first program I wrote to solve the problem! This blog shows a better way that uses temporary arrays to create an n-way lookup table. To keep the example simple, let's use five years of data (1944 to 1948) and four locations (1 to 4).

Temporary arrays

Before we get into the program, let's discuss temporary arrays, one of my favorite SAS tools. Here is an example of a one-dimensional temporary array:

Data Pass_Fail;
   input ID $ Grade1 - Grade5;
   array PF[5] _temporary_ (65 70 55 65 55);
   array Grade[5]; *If you leave off the variable list SAS will use the
                    array name with numbers 1-5 added. In this example
                    the variables will be Grade1, Grade2, etc.;
   array Pass_or_Fail[5] $ 4;
   do i = 1 to 5;
      if Grade[i] ge PF[i] then Pass_or_Fail[i] = 'Pass';
      else if not missing(Grade[i]) then Pass_or_Fail[i] = 'Fail';
   end;
   drop i;
datalines;
001 90 68 52 70 72
002 56 69 72 75 88
;
Title "Listing of Data Set Pass_Fail";
Proc print data=Pass_Fail noobs;
Run;

In this example, the temporary array is called PF (pass fail values), and it has 5 elements. There are no actual variables PF1, PF2, and so on, only array elements PF[1], PF[2], and so on. The initial values of the five passing grades are placed in parentheses following the key word _temporary_. In many situations, you load the values of the temporary array from a data file.

To keep this first example easy to understand, we will put the initial values in the array statement. You can now compare each student's grade for every test and assign a value of "Pass" or "Fail."

Here is the output:

Note: You can read a blog that I wrote years ago on temporary arrays for another example. 

Example

Now for the two-way table lookup example.

*Two-dimensional table lookup using a temporary array;
data Lookup;
   array Benzene[1944:1948,4] _temporary_; ①
 
   /* Populate the array */
   if _n_ = 1 then do Year = 1944 to 1948; ②
       do Location = 1 to 4;
	      input Benzene[Year,Location] @; ③
	   end;
	end;
 
   input Subj $ Year Location;
   Benzene_Level = Benzene[Year, Location]; ④
datalines;
250 200 150 130
90 180 155 90
95 35 170 140 
80 50 45 100 
40 50 25 15
001 1944 3
002 1948 1
003 1945 4
;
title "Listing od Data Set Lookup";
proc print data=Lookup noobs;
run;

① This ARRAY statement creates an array with two dimensions (you use a comma to create multiple dimensions). To make programming easier to understand, the first dimension of the array uses subscripts 1944 to 1948, rather than 1 to 5 (the colon enables you to specify the lower and upper bounds of an array). Also notice that there are no initial values in this statement—they will be read from data.
② This section of code populates the values in the Benzene temporary array. You use the statement if _n_ = 1 to ensure that this section of code executes only once.
③ The INPUT statement reads in a value for Year and Location. The single trailing @ sign prevents SAS from going to a new line each time to DO loop iterates.
④ Notice how easy it is to retrieve an exposure value, given a value of Year and Location. The first five lines of data are the values used to populate the temporary array.

You can read more about temporary arrays in my book, Learning SAS by Example: A Programmers Guide, Second Edition.

Comments on this blog are welcome.

Share

About Author

Ron Cody

Private Consultant

Dr. Ron Cody was a Professor of Biostatistics at the Rutgers Robert Wood Johnson Medical School in New Jersey for 26 years. During his tenure at the medical school, he taught biostatistics to medical students as well as students in the Rutgers School of Public Health. While on the faculty, he authored or co-authored over a hundred papers in scientific journals. His first book, Applied Statistics and the SAS Programming Language, was first published by Prentice Hall in 1985 and is now in its fifth edition. Since then, he has published over a dozen books on SAS programming and statistical analysis using SAS. His latest book, A Gentle Introduction to Statistics Using SAS Studio was published this year. Ron has presented numerous papers at SAS Global forums, regional conferences, as well as local user groups. He is presently a contract instructor for SAS Institute and continues to write books on SAS and statistical topics.

2 Comments

  1. Hi Peter.
    Thanks for your comments. I learned SAS before SQL even existed and usually like DATA step solutions. I like my solution because the benzene values are stored in memory and the program is very efficient. For me, it is easier to understand. My SQL skills are pretty weak.

  2. Peter Lancashire on

    PROC SQL SELECT with a join to the lookup table is an alternative way to do this kind of thing. It is generally more reliable because it uses declarative statements with no looping. It is also more data-driven, so changes can often be made in the data alone: the SQL code remains the same. SQL is more portable and familiar to those new to SAS. I reduce my use of the DATA step to a minimum; mostly just complex within-row calculations or order-dependent between-row calculations.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top