How to pass parameters to a SAS program


This article shows how to run a SAS program in batch mode and pass parameters into the program by specifying them when you run SAS from a command line interface. This technique has many uses, one of which is to split a long-running SAS computation into a series of less time-intensive computations. There is a large class of problems that naturally divide into smaller pieces: programs in which the same computation is repeated many times for different values of parameters. A computation such as this is an example of an "embarrassingly parallel" computation.

To give a concrete example, suppose that you are running a simulation that involves generating many samples from the Beta(a,b) distribution over a wide range of (a,b) values, such as 0 < a ≤ 4 and 0 < b ≤ 4. You could write a single program that loops over a regular grid of (a,b) values, but here are some reasons that you might want to divide the computation into smaller pieces:

  • Some parameter values might require more iterations than others in order to obtain accurate estimates.
  • Certain parameter values might require a different estimation technique or an asymptotic approximation.
  • An error in one portion of your program (for example, when a parameter is zero) will not result in the loss of earlier computations.
  • You can begin analyzing the first set of results even while you are computing more results.
  • If you have access to multiple computers, you can submit different parameter values to each computer, thus achieving parallel processing with little effort and no cost.

With apologies to Neil Sedaka, breaking up (a program) is easy to do.

An example SAS program

A simple DATA step is sufficient to illustrate the process. It lacks the motivation (there is no actual need to break up the program), but it illustrates the main ideas in a simple way. Start the break-up process by using macro variables to replace the upper and lower limits of the parameters, as follows:

%let aMin = 0.1;         /* lower limit of parameter 'a' */
%let aMax = 4;           /* upper limit of parameter 'a' */
%let bMin = 0.1;         /* lower limit of parameter 'b' */
%let bMax = 4;           /* upper limit of parameter 'b' */
%let DSName = Out1;      /* name of data set that contains results for params */
 
libname dest ".";        /* put results in current working directory */
data dest.&DSName(keep = a b kurtosis);
do a = &aMin to &aMax by 0.1;                 /* loop over a */
   do b = &bMin to &bMax by 0.1;              /* loop over b */
      /* compute kurtosis of Beta(a,b) distribution */
      numer = 6*((a-b)**2 * (a+b+1)-a*b*(a+b+2));
      denom = a*b*(a+b+2)*(a+b+3);
      kurtosis = numer / denom;
      output;
   end;
end;
run;

The program computes the kurtosis of the Beta distribution for each (a,b) value on a regular grid 0.1 ≤ a ≤ 4 and 0.1 ≤ b ≤ 4. Notice that the DEST libref ensures that the results are stored in the current directory.
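
As a quick sanity check of the formula (which is the excess kurtosis), you can evaluate it at a point where the answer is known. The following sketch is not part of the original program. It uses the fact that Beta(1,1) is the uniform distribution on (0,1), whose excess kurtosis is -6/5:

```sas
/* sanity check: Beta(1,1) is the uniform distribution on (0,1) */
data _null_;
a = 1;  b = 1;
numer = 6*((a-b)**2 * (a+b+1) - a*b*(a+b+2));   /* 6*(0 - 4) = -24 */
denom = a*b*(a+b+2)*(a+b+3);                    /* 1*1*4*5   =  20 */
kurtosis = numer / denom;
put kurtosis=;                                  /* writes kurtosis=-1.2 to the log */
run;
```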

Breaking up the computation into smaller pieces

The %LET statements define the range of the a and b parameters and the name of the output data set. (You can also use this technique to write a simulation that accepts a seed value as a parameter, for example a seed for the CALL STREAMINIT subroutine, which initializes the RAND function.) I can use the values of the macro variables to divide the computation into a series of smaller computations. For example, I could divide the computation into the following four smaller computations:

  1. Compute the results on 0.1 ≤ a ≤ 2 and 0.1 ≤ b ≤ 2. Store these results in the data set Out1.
  2. Compute the results on 0.1 ≤ a ≤ 2 and 2.1 ≤ b ≤ 4. Store these results in Out2.
  3. Compute the results on 2.1 ≤ a ≤ 4 and 0.1 ≤ b ≤ 2. Store these results in Out3.
  4. Compute the results on 2.1 ≤ a ≤ 4 and 2.1 ≤ b ≤ 4. Store these results in Out4.

You could change the parameter values manually, but SAS provides a feature that makes it simple to specify parameter values on the command line. At the top of the program, you can replace the simple %LET statements with calls to the %SYSGET function, as follows:

/* get values of environment variables from SAS command line */
%let aMin = %sysget(aMin);
%let aMax = %sysget(aMax);
%let bMin = %sysget(bMin);
%let bMax = %sysget(bMax);
%let DSName = %sysget(DSName);

With this change, you can run the DATA step in batch mode and use the -SET option on the SAS command line to change the parameter values for each invocation. For example, if the program is stored in the file PassParams.sas then the first invocation from a Windows command prompt could be

> "C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" PassParams.sas ^
    -set DSName "Out1" ^
    -set aMin 0.1 -set aMax 2 -set bMin 0.1 -set bMax 2

When the SAS program runs, the %SYSGET function gets the values that you specified by using the -SET command line option. Notice that the options are specified as keyword/value pairs such as -set aMin 0.1. SAS runs the program in batch mode and creates a SAS data set Out1.sas7bdat in the current directory. In a similar way, you can run the program three more times to create the data sets Out2, Out3, and Out4.
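
If you prefer not to type four commands, one option is a small driver program that launches the runs by using the SYSTASK statement. The following is a minimal sketch under several assumptions: the XCMD system option is enabled, the driver is run from the directory that contains PassParams.sas, and the path to sas.exe is the same as in the command above. The macro name RunPiece and the log file names are hypothetical. Each run writes to its own -log file so that the four batch sessions do not compete for the same log.

```sas
/* Hypothetical driver program; run it from the directory that contains
   PassParams.sas. Assumes that the XCMD system option is enabled and that
   the path to sas.exe is correct for your installation. */
%macro RunPiece(id, aMin, aMax, bMin, bMax);
   %global rc&id;      /* receives the completion status of this task */
   systask command
      """C:\Program Files\SASHome\SASFoundation\9.4\sas.exe"" -sysin PassParams.sas -log Out&id..log -set DSName Out&id -set aMin &aMin -set aMax &aMax -set bMin &bMin -set bMax &bMax"
      nowait taskname=run&id status=rc&id;
%mend RunPiece;

%RunPiece(1, 0.1, 2, 0.1, 2)
%RunPiece(2, 0.1, 2, 2.1, 4)
%RunPiece(3, 2.1, 4, 0.1, 2)
%RunPiece(4, 2.1, 4, 2.1, 4)

waitfor _all_ run1 run2 run3 run4;   /* block until all four runs finish */
%put NOTE: task status values: &rc1 &rc2 &rc3 &rc4;
```

Because each task is started with NOWAIT, the four batch sessions run concurrently on the same machine. The WAITFOR statement pauses the driver until every task has finished.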

After all the data sets are created, you can concatenate them together to form the complete data set, which contains results for the complete range of parameter values:

data All;
set dest.Out:;           /* use colon to specify Out1, Out2, Out3, and Out4 */
run;
 
/* analyze the data and/or make plots */
proc sgrender data=All template=ContourPlotParm;
dynamic _TITLE="Kurtosis of Beta(a,b) Distribution"
        _X="a" _Y="b" _Z="kurtosis";
run;

The call to the SGRENDER procedure uses a GTL template from a previous article about how to create a contour plot in SAS. The graph shows that the kurtosis of the beta distribution is small when the values of a and b are approximately equal, but the kurtosis can be arbitrarily large when either a or b is close to zero.

Although I demonstrated this technique for a simple DATA step, you can use the same approach to control the parameters that are passed to any SAS procedure. For example, the following PROC IML statements might begin a long-running simulation in the SAS/IML language:

proc iml;
aGrid = do(&aMin, &aMax, 0.1); /* evenly spaced [aMin, aMax] */
bGrid = do(&bMin, &bMax, 0.1); /* evenly spaced [bMin, bMax] */

So next time you have a long-running SAS program that you want to run during lunch (or overnight), think about using the -SET command line option to specify parameter values that are passed to your program. Inside your program, use the %SYSGET function to receive the values. This technique enables you to run SAS programs in batch mode and pass in program parameters. It also enables you to break up a long-running program into smaller, less time-intensive programs.
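
If a variable is not defined on the command line, %SYSGET returns a null value and writes a warning to the log, which leaves the DO loops with missing limits. One way to guard against that is to fall back to a default value. The following is a minimal sketch, not part of the original program. The macro name GetParam is hypothetical, and the defaults are the values from the %LET statements in the first example.

```sas
/* Hypothetical helper: read a parameter from the environment, but fall
   back to a default value when the variable was not set on the command line */
%macro GetParam(name, default);
   %global &name;
   %let &name = %sysget(&name);   /* null value (and a WARNING) if undefined */
   %if %length(&&&name) = 0 %then %let &name = &default;
%mend GetParam;

%GetParam(aMin, 0.1)
%GetParam(aMax, 4)
%GetParam(bMin, 0.1)
%GetParam(bMax, 4)
%GetParam(DSName, Out1)
%put NOTE: &=aMin &=aMax &=bMin &=bMax &=DSName;
```

With these calls in place of the five %LET statements, the same program can run interactively with the default values or in batch mode with values supplied by the -SET option.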

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

7 Comments

  1. Rick -- awesome tip! I've looked for this before and haven't found such a nice write-up. Have you used this with the systask command before? I run some similar jobs that I'd like to split and run in parallel and think a combination of systask + sysget will do the trick nicely.

  2. You learn a new thing every day!

    I have a number of SAS jobs that are set off as part of Powershell scripts. Currently I pass parameters to them via the sysparm option, using a delimiter within the (single) parameter, as in:

    # Powershell code
    $prog = "C:\Program Files\SASHome\SASFoundation\9.3\sas.exe"
    $arg = "-sysin $base_dir\ccjsu11r.sas -log $base_dir\ccjsu11r.log -sysparm $p_month~$p_year"
     
    $sasjob = New-Object System.Diagnostics.Process
    $sasjob.StartInfo = New-Object System.Diagnostics.ProcessStartInfo
    $retcode = $false
     
    $sasjob.StartInfo.FileName = $prog
    $sasjob.StartInfo.Arguments = $arg
    $sasjob.StartInfo.UseShellExecute = $false   # must be false when redirecting standard output
    $sasjob.StartInfo.WindowStyle = "Hidden"
    $sasjob.StartInfo.RedirectStandardOutput = $true
     
    $null = $sasjob.Start()
    $sasjob.WaitForExit()
    $RetCode=$sasjob.exitcode

    and

    /* SAS Code */
    %let p_month=%scan(&sysparm,1,~);
    %let p_year=%scan(&sysparm,2,~);

    -set and %sysget will do that in a much clearer way.

  3. I'm using multiple -SET options and invoking a SAS script. I'm getting an ERROR when one of the -SET values is NULL.

    Shell Script:::::
    FILE_PATH=/temp/12816049.csv
    ERROR_MSG=
    ID=
    /sasapp/uat/sas94/config/comp/Lev1/SASApp/sas.sh /temp/Metrics.sas -SET FILE_PATH "$FILE_PATH" -SET ERROR_MSG "$ERROR_MSG" -SET ID "$ID" -log /temp/Metrics.log

    ERROR:::::
    ERROR: INVALID OPTION VALUE, FOR SAS OPTION SET

  4. Rick,

    This is just absolutely brilliant (and just what I needed). I honestly had no idea that SAS could go beyond the 200 byte limit of SYSPARM. This is just what I needed to enable multi-threaded reads of large datasets. I've now got a read of a 17 million row dataset down from 25 minutes to under 10. Thank you so much for posting this.

    Jim
