The COALESCE function: PROC SQL compared with PROC IML

When Charlie H. posted an interesting article titled Top 10 most powerful functions for PROC SQL, there was one item on his list that was unfamiliar: the COALESCE function.

Ever since I posted my first response, "SAS/IML Operations Compared with PROC SQL: Part I," I had been dreading writing an article on the COALESCE function. Why? Because SAS/IML does not have a COALESCE function, so I assumed I'd have to write a SAS/IML module that mimics the functionality. Although writing a module is easy, I wasn't sure what the COALESCE function did, so I would have to do research.

However, a recent Tip of the Day by Mike Zdeb at SASCommunity.org reminded me of a programming tip that I always recommend to others: you can call Base SAS functions from the SAS/IML language. In particular, Zdeb showed me that the DATA step supports the COALESCE function. My problem was solved!

The COALESCE Function

The COALSCE function returns the first nonmissing value in a list of variables. Why this is useful is best illustrated with an example. Suppose that you have a data set that contains information about contact information for students. Each row represents a student. Variables include a column for the mother's cell-phone number, the father's cell-phone number, the phone number at home, and various work numbers. Some of these phone numbers might be missing. The school administrator wants you to write a program that creates a new column that contains one of the phone numbers, based on the following algorithm:

  1. If the mother's cell-phone number is not missing use it.
  2. Otherwise, if the father's cell-phone number is not missing use it.
  3. Otherwise, if the home-phone number not missing use it.
  4. ...(and so forth with the other phone numbers)...

The COALESCE function enables you to process these data with a single function call instead of writing multiple IF-THEN/ELSE statements.

The following program provides a small example of the COALESCE function. The data set B has 10 observations and two variables. The PROC SQL statements uses the COALESCE function to report the value of x1 if that value is not missing. Otherwise, it reports the value of x2 if x2 is nonmissing. If both values are missing, then the COALESCE function returns a missing value.

data B;
input x1 x2;
datalines;
 . 11  
 .  . 
 3 13
 4  . 
 . 15
 6 16
;
run;
 
proc sql; 
title 'Coalesce() -- Combine column values';
select Monotonic() as obs, 
     coalesce(x1, x2) 
from B;
quit;

Because the COALESCE function is part of Base SAS, you can call it from PROC IML:

proc iml;
use B;
read all var {x1 x2};
close B;
y = coalesce(x1, x2);

The vector y contains the same values computed by PROC SQL. Notice that it is not necessary to write a loop: the COALESCE function automatically applies to each row of the argument vectors. (If you have character data, use the COALESCEC function instead.)

The lesson to learn is this: although about 300 functions and subroutines appear in the SAS/IML User's Guide, you can also use the 500 or so functions that appear in Base SAS software. And use all the formats, informats, and global statements. This is one reason that the SAS/IML language is a natural progression for a SAS programmer who wants to learn matrix programming: a SAS programmer can leverage existing DATA step knowledge when writing an analysis in PROC IML.

tags: Data Analysis, SAS Programming

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>