A previous article discusses a "Catch-22" paradox for fitting nonlinear regression models: You can't estimate the parameters until you fit the model, but you can't fit the model until you provide an initial guess for the parameters! If your initial guess for the parameters is not good enough, the nonlinear optimization algorithm that tries to maximize the loglikelihood might not converge. The previous article shows how to specify a grid of initial parameter values in PROC NLIN and PROC NLMIXED. The procedures evaluate the loglikelihood (LL) of the model at each tuple of parameters in the grid, then use the tuple that has the largest loglikelihood as the initial guess for the optimization. Although we often speak about MAXIMUM likelihood estimation, many routines minimize the NEGATIVE loglikelihood. Thus, in this article, the better parameters are those that have smaller values of the negative LL.
For example, suppose you have five parameters in the model. If you specify 10 possible values for each parameter, then the grid of all parameter combinations is the Cartesian product, which contains 10^5 = 100,000 tuples of parameters. The fitting algorithm must evaluate the LL at all these points before it starts the optimization process. If your data set is large and the model is complex, this can be an expensive computation.
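To see the combinatorial explosion concretely, the following SAS IML sketch (not part of the original program; the specific values are illustrative) uses the ExpandGrid function to build the full factorial grid for 10 values of each of the five parameters and counts its rows:

proc iml;
/* illustration only: build the dense 10x10x10x10x10 grid and count the tuples */
logsig = (0:9)/9;            /* 10 equally spaced values in [0,1]  */
beta1  = -1 + 2*(0:9)/9;     /* 10 equally spaced values in [-1,1] */
beta2  = beta1;
alpha1 = 1:10;               /* 10 integer values in [1,10]        */
alpha2 = 1 + 4*(0:9)/9;      /* 10 equally spaced values in [1,5]  */
grid = ExpandGrid(logsig, beta1, beta2, alpha1, alpha2);
print (nrow(grid))[label="Number of tuples in the dense grid"];   /* 10^5 = 100,000 */
quit;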
There is another option. The PARMS statement in PROC NLMIXED supports a DATA= option that enables you to specify a data set in which each row is a set of parameter values. (In PROC NLIN and some other SAS procedures, the analogous option is PDATA=.) You can use Latin Hypercube Sampling (LHS) in SAS to create a much smaller set of points that nevertheless explores the space of parameters. In theory, you can achieve a similar LL value by using many fewer tuples of parameters. This article shows how to use LHS to choose initial guesses for an optimization. This technique is often used in Machine Learning and Deep Learning during a hyperparameter-tuning ("hypertuning") step, which chooses the values of the parameters that control an algorithm.
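For comparison, here is a minimal, hypothetical sketch of the PDATA= option in PROC NLIN. The data set InitParms, the variables, and the model are placeholders; the point is only that each row of the PDATA= data set supplies one tuple of candidate starting values.

data InitParms;                 /* hypothetical tuples of starting values */
   input alpha beta;
   datalines;
0.5  1
1    2
2    0.5
;

proc nlin data=MyData;          /* MyData is a placeholder for your data */
   parms / pdata=InitParms;     /* read candidate starting values from the data set */
   model y = alpha*exp(beta*x); /* placeholder nonlinear model */
run;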
A nonlinear regression model
Let's use the same nonlinear regression problem from the previous article. The problem has five parameters. We want to choose an initial set of parameters in the following ranges: logsig is in [0, 1], beta1 and beta2 are in [-1, 1], alpha1 is in [1, 10], and alpha2 is in [1, 5]. The Appendix defines and stores the SAS IML modules for performing Latin hypercube sampling. You can load these modules and call them to create an LHS sample. Instead of hundreds or thousands of samples, let's generate 50 initial guesses, as follows:
proc iml;
/* load modules from Appendix. See also
   https://blogs.sas.com/content/iml/2024/12/09/latin-hypercube-sampling-sas.html */
load module=(UnifSampleSubIntervals PermuteRows LatinHyperSample);
call randseed(1234);
/*            min max      ParmName          */
intervals = {  0   1,   /* logsig in [0,1]   */
              -1   1,   /* beta1  in [-1,1]  */
              -1   1,   /* beta2  in [-1,1]  */
               1  10,   /* alpha1 in [1,10]  */
               1   5};  /* alpha2 in [1,5]   */
varNames = {'logsig' 'beta1' 'beta2' 'alpha1' 'alpha2'};

NumPts = 50;
LHS = LatinHyperSample(intervals, NumPts);
create ParamDataLHS from LHS[c=varNames];
append from LHS;
close;
quit;

title "Latin Hypercube Sampling in 5-D";
proc sgscatter data=ParamDataLHS;
   matrix logsig beta1 beta2 alpha1 alpha2 / markerattrs=(symbol=CircleFilled);
run;

The call to PROC SGSCATTER provides a visualization of the 50 five-dimensional points. Notice that the 2-D marginal distributions of the points are approximately uniform.
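As an optional check (not part of the original program), you can verify that each coordinate covers its specified range. The minimum and maximum of each variable should be close to the endpoints of the corresponding interval:

proc means data=ParamDataLHS min max mean;
   var logsig beta1 beta2 alpha1 alpha2;
run;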
The following DATA step defines the data for the model. The call to PROC NLMIXED fits a five-parameter model. The 50 points generated by the LHS design are used to obtain an initial guess for the nonlinear optimization.
/* Data and NLMIXED code for nonlinear regression. See
   https://blogs.sas.com/content/iml/2018/06/25/grid-search-for-parameters-sas.html */
data pump;
   input y t group;
   pump = _n_;
   logtstd = log(t) - 2.4564900;
   datalines;
 5   94.320  1
 1   15.720  2
 5   62.880  1
14  125.760  1
 3    5.240  2
19   31.440  1
 1    1.048  2
 1    1.048  2
 4    2.096  2
22   10.480  2
;

/* Use Latin hypercube sampling instead of a dense grid for the parameter search */
proc nlmixed data=pump;
   parms / DATA=ParamDataLHS;     /* read guesses for parameters from LHS data set */
   if (group = 1) then
      eta = alpha1 + beta1*logtstd + e;
   else
      eta = alpha2 + beta2*logtstd + e;
   lambda = exp(eta);
   model y ~ poisson(lambda);
   random e ~ normal(0, exp(2*logsig)) subject=pump;
   ods select ParameterEstimates;
   ods output Parameters=_LL;
run;

/* for visualization: add an ID variable */
data LL / view=LL;
   set _LL;
   ObsNum = _N_;
run;

/* Optional: examine min and mean of the negative LL */
proc means data=LL;
   var NegLogLike;
run;

title "Loglikelihood for 50 Points from LHS";
proc sgplot data=LL;
   scatter x=ObsNum y=NegLogLike;
   yaxis values=(0 to 250 by 50) grid;
run;

For the initial guesses that are generated by LHS, many have a negative LL value that is less than 50. The smallest negative LL value is 30.8. My previous article shows a similar graph for 300 points on a regular grid. In that article, the smallest negative LL was 29.8, which means that the best initial guess from the LHS method is comparable to the best guess from the regular-grid method, while requiring much less computation. Of course, LHS has an element of randomness, so there is no guarantee that it will provide a good initial guess. However, the method is frequently used in practice because it tends to perform well.
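If you also want to see which tuple produced the smallest negative LL, one option (assuming, as in the program above, that the Parameters table contains one row per candidate tuple with a column for each parameter and for NegLogLike) is to sort and print the best row:

proc sort data=_LL out=BestLL;
   by NegLogLike;                        /* smallest negative LL first */
run;

proc print data=BestLL(obs=1) noobs;     /* the best candidate tuple */
   var logsig beta1 beta2 alpha1 alpha2 NegLogLike;
run;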
Why not use uniform random parameters?
Before I learned about Latin hypercube sampling, I would often generate random parameters uniformly in a 5-D hypercube. This is known as simple random sampling, or SRS. The LHS method has a few advantages over SRS. The main advantage is that LHS is guaranteed to explore the full range of each parameter. In contrast, SRS might result in gaps and clusters for some parameters. By random chance, we might end up with values that are all small or that leave a gap in the middle of the range. The empirical marginal distributions of the LHS points, however, are guaranteed to be approximately uniform. By construction, if you use LHS to generate K samples, each of the K equal-length subintervals for each coordinate contains exactly one sample.
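The following sketch (which uses the modules from the Appendix and is not part of the original program) illustrates the claim for one coordinate. It generates 50 points in [0,1] by SRS and by LHS and counts how many of the 50 equal-length subintervals contain at least one point. The LHS count is always 50; the SRS count is typically smaller because of gaps and clusters.

proc iml;
load module=(UnifSampleSubIntervals PermuteRows LatinHyperSample);
call randseed(54321);
k = 50;
SRS = randfun(k, "Uniform");             /* simple random sample on [0,1]            */
LHS = LatinHyperSample({0 1}, k);        /* LHS: one point per subinterval           */
cutpts = (0:k)/k;                        /* endpoints of the k equal subintervals    */
nSRS = ncol(unique( bin(SRS, cutpts) )); /* # subintervals that contain an SRS point */
nLHS = ncol(unique( bin(LHS, cutpts) )); /* always k for LHS                         */
print nSRS nLHS;
quit;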
Summary
This article shows how to use Latin hypercube sampling (LHS) to generate initial guesses for an optimization that fits a nonlinear regression model in SAS. For this example, all parameters are continuous. However, the same technique can be used for hypertuning algorithms in machine learning when some parameters must take discrete values. For discrete parameters, you can apply the INT function to the LHS values to obtain integers.
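For example, the following sketch (the tuning parameters nTrees and learnRate are hypothetical) generates an LHS design in which the first coordinate is sampled on the interval [1, 11] and then truncated with the INT function so that it takes the integer values 1-10:

proc iml;
load module=(UnifSampleSubIntervals PermuteRows LatinHyperSample);
call randseed(1234);
intervals = {1 11,          /* nTrees: truncated to the integers 1-10      */
             0  1};         /* learnRate: continuous in [0,1]              */
LHS = LatinHyperSample(intervals, 20);
LHS[ ,1] = int(LHS[ ,1]);   /* truncate the first coordinate to an integer */
print (LHS[1:5, ])[colname={'nTrees' 'learnRate'} label="First 5 rows"];
quit;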
Appendix: SAS IML modules for Latin Hypercube Sampling
For your convenience, the following SAS IML modules implement Latin hypercube sampling. These modules are explained in the article "Latin Hypercube Sampling in SAS."
/* Latin Hypercube Sampling in SAS. See
   https://blogs.sas.com/content/iml/2024/12/09/latin-hypercube-sampling-sas.html */
proc iml;
/* UnifSampleSubIntervals: specify an interval [a,b] and a number of
   subintervals. Divide [a,b] into k subintervals and return a point
   uniformly at random within each subinterval. */
start UnifSampleSubIntervals(interval, k);
   a = interval[1];  b = interval[2];
   h = (b-a)/k;
   L = colvec( do(a, b-h/2, h) );   /* left-hand endpoints of k subintervals */
   u = randfun(k, "Uniform");       /* random proportion */
   eta = L + u*h;                   /* eta[i] is randomly located in the i_th subinterval */
   return( eta );
finish;

/* PermuteRows helper function: Use the SAMPLE function to permute the rows of a matrix */
start PermuteRows(M);
   k = nrow(M);
   p = sample(1:k, k, "WOR");       /* random permutation of 1:k */
   return( M[p, ] );
finish;

/* LatinHyperSample: Generate a random location in k subintervals for each of d intervals.
   intervals[i,] specifies the interval for the i_th coordinate
   k = (scalar) specifies the number of subintervals in each dimension */
start LatinHyperSample(intervals, k);
   d = nrow(intervals);
   LHS = j(k, d, .);
   do i = 1 to d;
      eta = UnifSampleSubIntervals(intervals[i,], k);
      LHS[, i] = PermuteRows(eta);
   end;
   return( LHS );
finish;

store module=(UnifSampleSubIntervals PermuteRows LatinHyperSample);
quit;