This article shows how to run a SAS program in batch mode and pass parameters into the program by specifying them on the command line when you invoke SAS. This technique has many uses, one of which is to split a long-running SAS computation into a series of less time-intensive computations. There is a large class of problems that naturally divide into smaller pieces: programs in which the same computation is repeated many times for different values of parameters. A computation such as this is an example of an "embarrassingly parallel" computation.
To give a concrete example, suppose that you are running a simulation that involves generating many samples from the Beta(a,b) distribution over a wide range of (a,b) values, such as 0 < a ≤ 4 and 0 < b ≤ 4. You could write a single program that loops over a regular grid of (a,b) values, but here are some reasons that you might want to divide the computation into smaller pieces:
- Some parameter values might require more iterations than others in order to obtain accurate estimates.
- Certain parameter values might require a different estimation technique or an asymptotic approximation.
- An error in one portion of your program (for example, when a parameter is zero) will not result in the loss of earlier computations.
- You can begin analyzing the first set of results even while you are computing more results.
- If you have access to multiple computers, you can submit different parameter values to each computer, thus achieving parallel processing with little effort and no cost.
With apologies to Neil Sedaka, breaking up (a program) is easy to do.
An example SAS program
A simple DATA step is sufficient to illustrate the process. The example is contrived (there is no real need to break up such a quick program), but it illustrates the main ideas in a simple way. Start the break-up process by using macro variables to replace the upper and lower limits of the parameters, as follows:
%let aMin = 0.1;      /* lower limit of parameter 'a' */
%let aMax = 4;        /* upper limit of parameter 'a' */
%let bMin = 0.1;      /* lower limit of parameter 'b' */
%let bMax = 4;        /* upper limit of parameter 'b' */
%let DSName = Out1;   /* name of data set that contains results for params */
libname dest ".";     /* put results in current working directory */

data dest.&DSName(keep = a b kurtosis);
do a = &aMin to &aMax by 0.1;        /* loop over a */
   do b = &bMin to &bMax by 0.1;     /* loop over b */
      /* compute kurtosis of Beta(a,b) distribution */
      numer = 6*((a-b)**2 * (a+b+1)-a*b*(a+b+2));
      denom = a*b*(a+b+2)*(a+b+3);
      kurtosis = numer / denom;
      output;
   end;
end;
run;
The program computes the kurtosis of the Beta distribution for each (a,b) value on a regular grid 0.1 ≤ a ≤ 4 and 0.1 ≤ b ≤ 4. Notice that the DEST libref ensures that the results are stored in the current directory.
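For reference, the NUMER and DENOM variables in the DATA step correspond to the standard closed-form expression for the excess kurtosis of the Beta(a,b) distribution. The second line is a quick sanity check: for a = b = 1 the Beta distribution is the uniform distribution on (0,1), whose excess kurtosis is -6/5.

```latex
\[
\text{excess kurtosis}(a,b) =
  \frac{6\left[(a-b)^2\,(a+b+1) \;-\; a b\,(a+b+2)\right]}
       {a b\,(a+b+2)\,(a+b+3)}
\]
\[
a = b = 1:\qquad
  \frac{6\,[\,0 - 1\cdot 4\,]}{1\cdot 4\cdot 5}
  = -\frac{24}{20} = -\frac{6}{5}
\]
```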
Breaking up the computation into smaller pieces
The %LET statements define the range of the a and b parameters and the name of the output data set. (You can also use this technique to write a simulation that accepts a seed value, which the program passes to the STREAMINIT subroutine to initialize the RAND function.) I can use the values of the macro variables to divide the computation into a series of smaller computations. For example, I could divide the computation into the following four smaller computations:
- Compute the results on 0.1 ≤ a ≤ 2 and 0.1 ≤ b ≤ 2. Store these results in the data set Out1.
- Compute the results on 0.1 ≤ a ≤ 2 and 2.1 ≤ b ≤ 4. Store these results in Out2.
- Compute the results on 2.1 ≤ a ≤ 4 and 0.1 ≤ b ≤ 2. Store these results in Out3.
- Compute the results on 2.1 ≤ a ≤ 4 and 2.1 ≤ b ≤ 4. Store these results in Out4.
You could change the parameter values manually, but SAS provides a feature that makes it simple to specify parameter values on the command line. At the top of the program, you can replace the simple %LET statements with calls to the %SYSGET function, as follows:
/* get values of environment variables from SAS command line */
%let aMin = %sysget(aMin);
%let aMax = %sysget(aMax);
%let bMin = %sysget(bMin);
%let bMax = %sysget(bMax);
%let DSName = %sysget(DSName);
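One caveat: if an environment variable is not defined, %SYSGET returns an empty value (and writes a warning to the log), so the DO loops would fail to compile. The following is a minimal sketch of one way to supply defaults so that the same file also runs interactively. The %GetParam macro is a hypothetical helper introduced here for illustration; it is not part of the original program.

```sas
/* Hypothetical helper: read a parameter from the environment and
   fall back to a default value when the variable is not defined. */
%macro GetParam(name, default);
   %global &name;
   %let &name = %sysget(&name);   /* empty if the variable is not defined */
   %if %length(&&&name) = 0 %then %let &name = &default;
%mend GetParam;

%GetParam(aMin, 0.1)
%GetParam(aMax, 4)
%GetParam(bMin, 0.1)
%GetParam(bMax, 4)
%GetParam(DSName, Out1)

/* echo the resolved values to the log for debugging */
%put NOTE: aMin=&aMin aMax=&aMax bMin=&bMin bMax=&bMax DSName=&DSName;
```

With this variation, values given by -SET take precedence, and the defaults reproduce the full-grid run from the first version of the program.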
With the %SYSGET calls in place, you can run the DATA step in batch mode and use the -SET option on the SAS command line to change the parameter values for each invocation. For example, if the program is stored in the file PassParams.sas, then the first invocation from a Windows command prompt could be
> "C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" PassParams.sas -set DSName "Out1" -set aMin 0.1 -set aMax 2 -set bMin 0.1 -set bMax 2
When the SAS program runs, the %SYSGET function gets the values that you specified by using the -SET command line option. Notice that the options are specified as keyword/value pairs such as -set aMin 0.1. SAS runs the program in batch mode and creates a SAS data set Out1.sas7bdat in the current directory. In a similar way, you can run the program three more times to create the data sets Out2, Out3, and Out4.
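Rather than typing the four command lines by hand, you could generate them. The following sketch is one possible way to do that: a DATA _NULL_ step that writes one command line per parameter block to the SAS log, in the same order as the list of four computations. It assumes that sas.exe is on the system PATH (an assumption; otherwise substitute the full path shown earlier).

```sas
/* Sketch: write the command line for each of the four parameter blocks.
   Assumes sas.exe is on the PATH; substitute the full path if it is not. */
data _null_;
   length cmd $ 200;
   array lo[2] _temporary_ (0.1 2.1);   /* lower limits of the two subranges */
   array hi[2] _temporary_ (2   4);     /* upper limits of the two subranges */
   k = 0;
   do i = 1 to 2;                       /* subrange index for a */
      do j = 1 to 2;                    /* subrange index for b */
         k = k + 1;                     /* block number: Out1, ..., Out4 */
         cmd = catx(' ', 'sas.exe PassParams.sas',
                    '-set DSName', cats('Out', k),
                    '-set aMin', lo[i], '-set aMax', hi[i],
                    '-set bMin', lo[j], '-set bMax', hi[j]);
         put cmd;                       /* one command line per block */
      end;
   end;
run;
```

You can paste the generated lines into a command prompt or a script. If you run several invocations from the same directory, note that each batch run writes its log to PassParams.log by default, so you might want to add the -LOG option to give each run its own log file.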
After all the data sets are created, you can concatenate them together to form the complete data set, which contains results for the complete range of parameter values:
data All;
set dest.Out:;    /* use colon to specify Out1, Out2, Out3, and Out4 */
run;

/* analyze the data and/or make plots */
proc sgrender data=All template=ContourPlotParm;
dynamic _TITLE="Kurtosis of Beta(a,b) Distribution"
        _X="a" _Y="b" _Z="kurtosis";
run;
The call to the SGRENDER procedure uses a GTL template from a previous article about how to create a contour plot in SAS. The graph shows that the kurtosis of the beta distribution is small when the values of a and b are approximately equal, but the kurtosis can be arbitrarily large when a or b is close to zero.
Although I demonstrated this technique for a simple DATA step, you can use the approach to control the parameters that are passed to any SAS procedure. For example, the following PROC IML statements might begin a long-running simulation in the SAS/IML language:
proc iml;
aGrid = do(&aMin, &aMax, 0.1);   /* evenly spaced [aMin, aMax] */
bGrid = do(&bMin, &bMax, 0.1);   /* evenly spaced [bMin, bMax] */