The COALESCE function: PROC SQL compared with PROC IML

0

When Charlie H. posted an interesting article titled "Top 10 most powerful functions for PROC SQL," there was one item on his list that was unfamiliar: the COALESCE function. (Edit: Charlie's blog no longer exists. The article used to be available at http://www.sasanalysis.com/2011/01/top-10-most-powerful-functions-for-proc.html)

Ever since I posted my first response, "SAS/IML Operations Compared with PROC SQL: Part I," I had been dreading writing an article on the COALESCE function. Why? Because SAS/IML does not have a COALESCE function, so I assumed I'd have to write a SAS/IML module that mimics the functionality. Although writing a module is easy, I wasn't sure what the COALESCE function did, so I would have to do research.

However, a recent Tip of the Day by Mike Zdeb at SASCommunity.org reminded me of a programming tip that I always recommend to others: you can call Base SAS functions from the SAS/IML language. In particular, Zdeb showed me that the DATA step supports the COALESCE function. My problem was solved!

The COALESCE Function

The COALSCE function returns the first nonmissing value in a list of variables. Why this is useful is best illustrated with an example. Suppose that you have a data set that contains information about contact information for students. Each row represents a student. Variables include a column for the mother's cell-phone number, the father's cell-phone number, the phone number at home, and various work numbers. Some of these phone numbers might be missing. The school administrator wants you to write a program that creates a new column that contains one of the phone numbers, based on the following algorithm:

  1. If the mother's cell-phone number is not missing use it.
  2. Otherwise, if the father's cell-phone number is not missing use it.
  3. Otherwise, if the home-phone number not missing use it.
  4. ...(and so forth with the other phone numbers)...

The COALESCE function enables you to process these data with a single function call instead of writing multiple IF-THEN/ELSE statements.

The following program provides a small example of the COALESCE function. The data set B has 10 observations and two variables. The PROC SQL statements uses the COALESCE function to report the value of x1 if that value is not missing. Otherwise, it reports the value of x2 if x2 is nonmissing. If both values are missing, then the COALESCE function returns a missing value.

data B;
input x1 x2;
datalines;
 . 11  
 .  . 
 3 13
 4  . 
 . 15
 6 16
;
run;
 
proc sql; 
title 'Coalesce() -- Combine column values';
select Monotonic() as obs, 
     coalesce(x1, x2) 
from B;
quit;

Because the COALESCE function is part of Base SAS, you can call it from PROC IML:

proc iml;
use B;
read all var {x1 x2};
close B;
y = coalesce(x1, x2);

The vector y contains the same values computed by PROC SQL. Notice that it is not necessary to write a loop: the COALESCE function automatically applies to each row of the argument vectors. (If you have character data, use the COALESCEC function instead.)

The lesson to learn is this: although about 300 functions and subroutines appear in the SAS/IML User's Guide, you can also use the 500 or so functions that appear in Base SAS software. And use all the formats, informats, and global statements. This is one reason that the SAS/IML language is a natural progression for a SAS programmer who wants to learn matrix programming: a SAS programmer can leverage existing DATA step knowledge when writing an analysis in PROC IML.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Leave A Reply

Back to Top