SAS programmers use the SAS/IML language for many different tasks. One important task is computing an integral. Another is optimizing functions, such as maximizing a likelihood function to find parameters that best fit a set of data.

Last week I saw an interesting problem that combines these two important tasks. The problem involved optimizing a function, which in turn required evaluating an integral. There are two kinds of integrals to consider:

- The cumulative distribution function (CDF) is the integral of a probability density. In many applications you can numerically integrate a probability density by calling the CDF function in SAS.
- If the integral is not a CDF that is supported by SAS, you can use the QUAD subroutine to integrate an arbitrary user-defined function.

### Computing probability by integrating a density function

You can compute a probability by integrating a probability density function (PDF). However, for most common probability distributions, you don't need to compute the integral yourself because SAS provides the CDF function. The CDF is the anti-derivative of the PDF. For example, to integrate the standard normal PDF on the interval [-1,1], you can call the CDF function as follows:

prob = cdf("Normal", 1) - cdf("Normal", -1); /* integral of phi(x) on [-1, 1] */ |

The answer is 0.68, which is the probability that a random normal variable falls within one standard deviation of the mean.

Distributions often contain parameters, and a common task in statistical programming is finding parameter values that satisfy some condition (a root-finding problem) or that optimize some criterion. This article describes how to optimize an integral that depends on two parameters.

### The Beta(a,b) distribution

As an example, consider the Beta(a,b) distribution, which is a function of two shape parameters. Suppose that you want to find the parameter values *a* > 0 and 1 ≤ *b* ≤ 3 that maximize the integral of the Beta(a,b) PDF over the interval [0.6, 0.8].

Several Beta(a,b) curves are shown in the figure to the left. (Click to enlarge.) The Beta(4,2) curve (teal color) and the Beta(6,2) curve (magenta color) both have large areas on the interval [0.6, 0.8], so those parameter values would be good starting points for a search that attempts to find an optimal parameter value.

### Optimizing the parameters of the Beta(a,b) distribution

To maximize the Beta(a,b) density, I will call the SAS/IML NLPNRA subroutine, which performs Newton-Raphson optimization. The NLP subroutine requires a user-defined module that can evaluate the objective function. The following SAS/IML module evaluates the integral of the Beta(a,b) function on [0.6, 0.8]. The parameters (a,b) are the elements of the parameter vector.

proc iml; /* integral of the Beta(a,b) density on [0.6, 0.8] */ start BetaProb(parm); a = parm[,1]; b = parm[,2]; prob = cdf("beta", 0.8, a, b) - cdf("beta", 0.6, a, b); return( prob ); finish; val = BetaProb( {4 2} ); print val; |

The call to the BetaProb function for the parameters (a,b)=(4,2) results in an integral of 0.4, which looks correct from the figure. Because this is a constrained optimization problem, the next step is to specify the linear constraints for the parameters. You can then provide an initial guess and call the NLPNRA subroutine, as follows:

/* Set up the constraints a > 0 and 1 <= b <= 3 */ /* a b */ con = {0 1, /* lower bounds */ . 3}; /* upper bounds */ p = {4 2}; /* initial guess for solution */ opt = {1, /* find maximum of function */ 2}; /* print iteration history and optimal parameters */ call nlpnra(rc, OptParm, "BetaProb", p, opt) blc=con; /* OptParm is solution */ |

The NLPNRA subroutine computed that the parameters that maximize the integral of the Beta(a,b) density function over [0.6, 0.8] (subject to the constraints that 1 ≤ b ≤ 3) are (a,b)=(7.717, 3). The optimal Beta density is shown below.

For this problem, the fourth column of the "Optimization Results" table shows that the gradient is nonzero at the solution. This indicates that the solution is not a global maximum, but is only a maximum within the constrained parameter domain. The following contour plot shows contours of the integral as a function of the parameters. The optimal parameter value is displayed along the top edge of the graph.

In summary, this article demonstrates three things:

- You can use SAS/IML software to optimize a function that is defined by an integral.
- You can integrate a probability density by using the CDF function. There is no need to use the QUAD function.
- You can constrain the parameter values. As this example demonstrates, sometimes the optimal parameter value occurs on the boundary of the parameter domain.

Next week I will show an example of optimizing a function that requires calling the QUAD function to compute an integral.

## 3 Comments

Here's one way to solve the problem by using PROC OPTMODEL in SAS/OR:

Pingback: Optimizing a function of an integral - The DO Loop

Pingback: Create a density curve with shaded tails - The DO Loop