When you pass a matrix as a parameter (argument) to a SAS/IML module, the SAS/IML language does not create a copy of the matrix. That approach, known as "calling by value," is inefficient. It is well-known that languages that implement call-by-value semantics suffer performance penalties.
In the SAS/IML language, matrices are passed to modules "by reference." This has two important implications:
- You can pass large matrices into SAS/IML modules without hurting performance.
- If you change or redefine an argument inside the module, the corresponding matrix also changes outside the module.
The second point means that you should be careful: when you modify the value or shape of matrices in a SAS/IML module, the change percolates up to the calling environment.
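As a small illustration of the "shape" case, here is a minimal sketch in which a module transposes its argument. The module name MakeRow and the variables v, nr, and nc are hypothetical and chosen only for this example.

```sas
proc iml;
start MakeRow( x );          /* hypothetical module for illustration */
   x = t(x);                 /* redefines the argument as its transpose */
finish;

v = {1,2,3};                 /* 3 x 1 column vector */
run MakeRow( v );            /* v is passed by reference */
nr = nrow(v);                /* number of rows after the call */
nc = ncol(v);                /* number of columns after the call */
print nr nc;
```

Because v is passed by reference, it should be a 1 x 3 row vector after the call, even though the calling program never assigns to v directly.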
An Example of Call-By-Reference
A simple example can make these ideas clearer. Suppose that you write a module that sorts an input argument as part of a computation, such as returning a cumulative sum of sorted data:
proc iml;
start MyCumSum( x );
   call sort(x,1);
   /** compute with the sorted data **/
   return( cusum(x) );
finish;
When you call that module from a SAS/IML program, it actually sorts the matrix that is sent in:
y = {3,2,1,1,4,2};
cs = MyCumSum( y );
print y cs;
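One way to confirm the side effect is to keep a copy of the data before the call and compare it with the matrix afterward. The following is a minimal sketch that assumes the MyCumSum module above is already defined in the same PROC IML session; the names y0 and changed are hypothetical.

```sas
y  = {3,2,1,1,4,2};
y0 = y;                      /* copy that preserves the original order */
cs = MyCumSum( y );          /* the module sorts y as a side effect */
changed = any(y ^= y0);      /* 1 if any element of y differs from y0 */
print y0 y cs changed;
```

With this version of the module, y should come back sorted and changed should be 1, which shows that the caller's matrix was modified.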
The Advantages of Call-By-Reference Semantics
An advantage of call-by-reference semantics is that your program doesn't waste time and memory copying data from the global environment into the local variable. Instead, the local variable inside the module is just an alias to the same matrix that is passed into the module. (C programmers are familiar with the concept of a pointer. Same idea.)
Call-by-reference semantics also enables you to return multiple results from a single module call. For example, if you know that you need both the mean and the standard deviation of data, you might decide to write a module that computes both quantities at the same time:
start GetDescriptive(Mean, StdDev, /** output arguments **/
                     x);           /** input argument **/
   Mean = x[:];
   StdDev = sqrt( ssq(x-Mean)/(nrow(x)-1) );
finish;

run GetDescriptive(m, s, y);
print m s;
Notice that you don't have to allocate m or s before you call the module.
A Tip for Using Call-By-Reference Semantics
At the time that you write the MyCumSum module, it might not matter to you that the module changes the order of the data. Maybe the rest of the program never refers to the data again. Or maybe it only computes statistics that do not depend on the order of the data (such as the mean and standard deviation).
But what happens if you share the module with a colleague?
Murphy's Law predicts that your colleague will get burned.
Your colleague will insert your module call into a working program, and suddenly some computation that used to be correct will now give a wrong answer. He will spend hours debugging his program and will eventually discover that the first value in his y matrix no longer corresponds to the first observation in the data set. Your colleague might be tempted to angrily march into your office and shout, "Your $!@#$% module changed the #!%$@ order of one of my variables!"
In order to maintain a healthy working relationship with your colleagues, a good programming practice is to observe the following convention:
Programming Tip: Do not write a module that modifies an input argument.
When I write a module that needs to change the values or shape of an input argument, I make an explicit copy of it and then modify the copy. For example, I might rewrite the MyCumSum module as follows:
start MyCumSum( _x );
   x = _x; /** copy input argument **/
   call sort(x,1);
   return( cusum(x) );
finish;
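To check that the rewritten module leaves the caller's data alone, you can repeat the comparison against a saved copy. This is a minimal sketch that assumes the rewritten MyCumSum module is defined in the current PROC IML session; the names y0 and unchanged are hypothetical.

```sas
y  = {3,2,1,1,4,2};
y0 = y;                      /* copy that preserves the original order */
cs = MyCumSum( y );          /* the module sorts only its local copy */
unchanged = all(y = y0);     /* 1 if every element of y equals y0 */
print y cs unchanged;
```

Here unchanged should be 1, because the sort operates on the local matrix x rather than on the argument. The trade-off is one extra copy of the input, which is the cost the module pays for being safe to share.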
I discuss these issues and give additional examples in my book, Statistical Programming with SAS/IML Software.
[Editor's Note: This article was revised on 07NOV2012 to correct a confusing sentence. Thanks to Ross Bettinger for the sharp eyes and suggestion.]
11 Comments
Many thanks for this blog post; it explains the concepts very well, I think. Could this by any chance have been motivated by my recent blog comment?
Maybe it would be worth mentioning that the 'sort' call in SAS prior to 9.22 requires (at least) two arguments, so I had to write:
call sort(x, 1);
Yes, indeed, and I thank you for your comments. Keep them coming!
Also, thanks for pointing out the problem with the SORT call in the code that I originally posted. My code originally said CALL SORT(x). I changed it to CALL SORT(x,1) so that it works for SAS 9.2 and prior releases.
Hi, this is a nifty blog. I just thought I'd make a slight nitpick, regarding the statement
"That approach, known as "calling by value," is inefficient. It is well-known that languages that implement call-by-value semantics suffer performance penalties."
This is a bit misleading. Modern languages can implement a form of call-by-reference behind the scenes, where rather than make a new copy of an argument, they keep track of whether its value is modified. If the argument doesn't change, a new copy is not made. Semantically speaking this is equivalent to call-by-value, but without the performance hit of copying everything.
I agree. Modern languages that implement call-by-value usually try to be smart about when (and whether) the arguments are copied. The two links in my blog both point to a Wikipedia page where about a dozen evaluation strategies are discussed and particular languages (Java, Ruby, C, FORTRAN, ...) are mentioned.
http://en.wikipedia.org/wiki/Evaluation_strategy