Convenient functions vs. efficient subroutines: Your choice

2

I've pointed out in the past that in the SAS/IML language matrices are passed to modules "by reference." This means that large matrices are not copied in and out of modules but are updated "in place." As a result, the SAS/IML language can be very efficient when it computes with large matrices. For example, when running large simulations, the SAS/IML language is often much faster than competing software that supports only function calls.

However, subroutines (which are called by using the CALL or RUN statements) are sometimes less convenient for a programmer to use. Each call to a subroutine is a separate statement. In contrast, you can nest function calls and use several function calls within a single complicated expression.

The convenience of functions is evident in some simulation studies. For example, you can regard the negative binomial distribution as a Poisson(λ) distribution, where λ is a gamma-distributed random variable. A distribution that has a random parameter is called a compound distribution. In a previous blog post I discussed the following program, which calls the RANDGEN subroutine to simulate data from a compound distribution:

proc iml;
call randseed(12345);
N = 1000; a = 5; b = 2;
/* Simulate X from a compound distribution: 
   X ~ Poisson(lambda), where lambda ~ Gamma(a,b) */
lambda = j(N,1); 
call randgen(lambda, "Gamma", a, b); /* lambda ~ Gamma(a,b) */
x = j(N,1);
call randgen(x, "Poisson", lambda);  /* pass vector of parameters */

After setting up the parameters, it takes four statements to simulate the data: two to allocate vectors to hold the results and two calls to the RANDGEN subroutine to fill those vectors with random values. Fewer statements are needed if a language supports a function that returns N random values.

Well, as of SAS/IML 12.3 (shipped with SAS 9.4), you can use the RANDFUN function when you want the convenience of calling a function that returns random values from a distribution. The syntax is
   x = RANDFUN(N, "Distribution" <,param1> <,param2> <,param3>);
The following statements call the RANDFUN function to simulate data from a compound distribution:

lambda = randfun(N, "Gamma", a, b);    /* lambda ~ Gamma(a,b) */
x = randfun(N, "Poisson", lambda);     /* X ~ Poisson(lambda) */

If you want to nest the calls to the RANDFUN function, you can simulate the data with a single statement:

x = randfun(N, "Poisson", randfun(N, "Gamma", a, b)); /* compound distribution */

If you are simulating data inside a DO loop, I strongly suggest that you continue to use the more efficient RANDGEN subroutine. However, in applications where performance is not a factor, you can enjoy the convenience of simulating data with a function call by using the new RANDFUN function.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

2 Comments

  1. Pingback: Simulating data for a logistic regression model - The DO Loop

  2. Pingback: Overview of new features in SAS/IML 12.3 - The DO Loop

Leave A Reply

Back to Top