Maximum likelihood estimation in SAS/IML


A popular use of SAS/IML software is to optimize functions of several variables. One statistical application of optimization is estimating the parameters that maximize a likelihood function. This post gives a simple example of maximum likelihood estimation (MLE): fitting a parametric density estimate to data.

Which density curve fits the data?

If you plot a histogram for the SepalWidth variable in the famous Fisher's iris data, the data look normally distributed. The normal distribution has two parameters: μ is the mean and σ is the standard deviation. For each possible choice of (μ, σ), you can ask the question, "If the true population is N(μ, σ), how likely is it that I would get the SepalWidth sample?" Maximum likelihood estimation is a technique that enables you to estimate the "most likely" parameters. This is commonly referred to as fitting a parametric density estimate to data.

Visually, you can think of overlaying a bunch of normal curves on the histogram and choosing the parameters for the best-fitting curve. For example, the following graph shows four normal curves overlaid on the histogram of the SepalWidth variable:

proc sgplot data=Sashelp.Iris;
histogram SepalWidth;
density SepalWidth / type=normal(mu=35 sigma=5.5);
density SepalWidth / type=normal(mu=32.6 sigma=4.2);
density SepalWidth / type=normal(mu=30.1 sigma=3.8);
density SepalWidth / type=normal(mu=30.5 sigma=4.3);
run;

It is clear that the first curve, N(35, 5.5), does not fit the data as well as the other three. The second curve, N(32.6, 4.2), fits a little better, but mu=32.6 seems too large. The remaining curves fit the data better, but it is hard to determine which is the best fit. If I had to guess, I'd choose the brown curve, N(30.5, 4.3), as the best fit among the four.

The method of maximum likelihood provides an algorithm for choosing the best set of parameters: choose the parameters that maximize the likelihood function for the data. Parameters that maximize the log-likelihood also maximize the likelihood function (because the log function is monotone increasing), and the log-likelihood is easier to work with. (For the normal distribution, you can solve the likelihood equations explicitly. This provides an analytical solution against which to check the numerical solution.)

The likelihood and log-likelihood functions

The UNIVARIATE procedure uses maximum likelihood estimation to fit parametric distributions to data. It supports fitting about a dozen common distributions, but you can use SAS/IML software to fit any parametric density to data. There are three steps:

  1. Write a SAS/IML module that computes the log-likelihood function. Optionally, since many numerical optimization techniques use derivative information, you can provide a module for the gradient of the log-likelihood function. If you do not supply a gradient module, SAS/IML software automatically uses finite differences to approximate the derivatives.
  2. Set up any constraints for the parameters. For example, the scale parameters for many distributions are restricted to be positive values.
  3. Call one of the SAS/IML nonlinear optimization routines. For this example, I call the NLPNRA subroutine, which uses a Newton-Raphson algorithm to optimize the log-likelihood.

Step 1: Write a module that computes the log-likelihood

A general discussion of log-likelihood functions, including formulas for some common distributions, is available in the documentation for the GENMOD procedure. The following module computes the log-likelihood for the normal distribution:

proc iml;
/* log-likelihood function for the normal distribution,
   up to an additive constant that does not depend on the parameters */
start LogLik(param) global (x);
   mu = param[1];
   sigma2 = param[2]##2;
   n = nrow(x);
   c = x - mu;
   f = -n/2*log(sigma2) - 1/2/sigma2*sum(c##2);
   return ( f );
finish;

Notice that the argument to this module is a single vector, param, that contains the two parameters mu and sigma. The data vector (which is constant during the optimization) is specified as the global variable x. When you use an optimization method, SAS/IML software requires that the arguments to the objective function be the quantities to optimize. Pass in other quantities by using the GLOBAL parameter list.
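If you want to check the algebra outside of SAS, the same log-likelihood can be sketched in Python. This is an illustration, not part of the original post: the function below mirrors the LogLik module (it drops the constant term -n/2*log(2π), which does not affect the optimization) and is compared against the full log-likelihood from scipy.stats.norm on synthetic data.

```python
import numpy as np
from scipy.stats import norm

def loglik(param, x):
    """Normal log-likelihood, omitting the constant -n/2*log(2*pi),
    to mirror the SAS/IML LogLik module."""
    mu, sigma = param
    n = len(x)
    c = x - mu
    return -n/2 * np.log(sigma**2) - np.sum(c**2) / (2 * sigma**2)

# sanity check on synthetic data: the two versions differ only by the constant
rng = np.random.default_rng(1)
x = rng.normal(30, 4, size=150)
full = norm.logpdf(x, loc=30, scale=4).sum()   # includes the constant term
const = -len(x)/2 * np.log(2*np.pi)
print(np.isclose(loglik((30, 4), x), full - const))
```

Because the dropped term is constant in (mu, sigma), both versions have the same maximizer.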

It's always a good idea to test your module. Remember those four curves that were overlaid on the histogram of the SepalWidth data? Let's evaluate the log-likelihood function at those same parameters. We expect the log-likelihood to be low for parameter values that do not fit the data well and high for parameter values that do.

use Sashelp.Iris;
read all var {SepalWidth} into x;
close Sashelp.Iris;
/* optional: test the module */
params = {35   5.5,
          32.6 4.2,
          30.1 3.8,
          30.5 4.3};
LogLik = j(nrow(params),1);
do i = 1 to nrow(params);
   p = params[i,];
   LogLik[i] = LogLik(p);
end;
print params[c={"Mu" "Sigma"} label=""] LogLik;

The log-likelihood values confirm our expectations. A normal density curve with parameters (35, 5.5) does not fit the data as well as the other parameters, and the curve with parameters (30.5, 4.3) fits the data best because its log-likelihood is the largest.
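The same qualitative ranking can be reproduced outside SAS. The following Python sketch (an illustration; it uses synthetic N(30.5, 4.3) data as a stand-in for the actual SepalWidth sample) evaluates the log-likelihood at the same four parameter pairs and confirms that the poorly fitting N(35, 5.5) curve gets the smallest value:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(30.5, 4.3, size=150)   # synthetic stand-in for SepalWidth

def loglik(mu, sigma):
    # normal log-likelihood, up to an additive constant
    return -len(x)/2 * np.log(sigma**2) - np.sum((x - mu)**2) / (2 * sigma**2)

params = [(35, 5.5), (32.6, 4.2), (30.1, 3.8), (30.5, 4.3)]
ll = [loglik(mu, s) for mu, s in params]
# the badly fitting N(35, 5.5) curve has the smallest log-likelihood
print(int(np.argmin(ll)))   # 0
```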

Step 2: Set up constraints

The SAS/IML User's Guide describes how to specify constraints for nonlinear optimization. For this problem, specify a matrix with two rows and k columns, where k is the number of parameters in the problem. For this example, k=2.

  • The first row of the matrix specifies the lower bounds on the parameters. Use a missing value (.) if the parameter is not bounded from below.
  • The second row of the matrix specifies the upper bounds on the parameters. Use a missing value (.) if the parameter is not bounded from above.

The only constraint on the parameters for the normal distribution is that sigma is positive. Therefore, the following statement specifies the constraint matrix:

/*     mu-sigma constraint matrix */
con = { .   0,  /* lower bounds: -infty < mu; 0 < sigma */
        .   .}; /* upper bounds:  mu < infty; sigma < infty */

Step 3: Call an optimization routine

You can now call an optimization routine to find the maximum likelihood estimates for the data. You need to provide an initial guess to the optimization routine, and you need to tell it whether you are finding a maximum or a minimum. There are other options that you can specify, such as how much printed output you want.

p = {35 5.5};/* initial guess for solution */
opt = {1,    /* find maximum of function   */
       4};   /* print a LOT of output      */
call nlpnra(rc, result, "LogLik", p, opt, con);

The following table and graph summarize the optimization process. Starting from the initial guess, the Newton-Raphson algorithm takes six steps before it reaches a maximum of the log-likelihood function. (The exact numbers might vary in different SAS releases.) At each step of the process, the log-likelihood function increases. The process stops when the log-likelihood stops increasing, or when the gradient of the log-likelihood is zero, which indicates a maximum of the function.
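The three-step recipe (objective module, constraints, optimizer call) is not specific to SAS/IML. As a hedged cross-check, here is the same maximization sketched in Python with scipy.optimize.minimize on synthetic data (scipy minimizes, so we negate the log-likelihood; the bounds argument plays the role of the IML constraint matrix by keeping sigma positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(30.5, 4.3, size=150)   # synthetic stand-in for the data

def negloglik(param):
    # negative normal log-likelihood, up to an additive constant
    mu, sigma = param
    return len(x)/2 * np.log(sigma**2) + np.sum((x - mu)**2) / (2 * sigma**2)

# bounds mirror the IML constraint matrix: mu unbounded, sigma > 0
res = minimize(negloglik, x0=[35, 5.5],
               bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = res.x

# the numerical MLE should match the closed-form solution:
# sample mean and the unadjusted (n-denominator) standard deviation
print(np.isclose(mu_hat, x.mean(), atol=1e-3),
      np.isclose(sigma_hat, x.std(ddof=0), atol=1e-3))
```

As with NLPNRA, the optimizer travels "uphill" (here, downhill on the negated function) from the initial guess to the maximum likelihood solution.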

You can summarize the process graphically by plotting the path of the optimization on a contour plot of the log-likelihood function. I used SAS/IML Studio to visualize the path. You can see that the optimization travels "uphill" until it reaches a maximum.

Check the optimization results

As I mentioned earlier, you can explicitly solve for the parameters that maximize the likelihood equations for the normal distribution. The optimal value of the mu parameter is the sample mean of the data. The optimal value of the sigma parameter is the unadjusted standard deviation of the sample. The following statements compute these quantities for the SepalWidth data:

OptMu = x[:];
OptSigma = sqrt( ssq(x-OptMu)/nrow(x) );

The values found by the NLPNRA subroutine agree with these exact values to seven decimal places.
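The same closed-form check can be sketched in Python (an illustration on synthetic data, not the actual SepalWidth sample): scipy.stats.norm.fit also returns the maximum likelihood estimates, so it should agree with the explicit formulas.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(30.5, 4.3, size=150)

# closed-form MLE: sample mean and the unadjusted (n-denominator) std dev
opt_mu = x.mean()
opt_sigma = np.sqrt(np.sum((x - opt_mu)**2) / len(x))

# scipy's norm.fit returns the MLE (loc, scale), so the two should agree
loc, scale = norm.fit(x)
print(np.allclose([loc, scale], [opt_mu, opt_sigma]))
```

Note that the MLE of sigma divides by n, not n-1, so it differs slightly from the usual sample standard deviation.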


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.


    • Hi Rob, The code you provided was very helpful for me. I used it with my data and it worked perfectly. I am now trying to do the same thing but for a Poisson model instead of the normal. I wrote the following code, but I am not sure whether it is correct. What's the best way to validate my code? I also need to estimate the parameters of the lognormal model; do you have any code that would help me do so?

      proc optmodel;
      set OBS;
      num x {OBS};
      read data sashelp.iris into OBS=[_N_] x= SepalWidth;

      num n = card(OBS);

      var lambda>=0 init 0;

      max LogLikelihood = sum {i in OBS}(x[i] * log(lambda)) - n * lambda;

      /*max LogLikelihood1 = sum {i in OBS}(x[i] * log(lambda)) - log(x[i]) * lambda;
      max LogLikelihood2 = sum {i in OBS} log(PDF('poisson',x[i],lambda));
      LogLikelihood1 and LogLikelihood2 didn't work*/

      solve ;
      print lambda;
      print LogLikelihood;

      num optlambda = (sum {i in OBS} x[i]) / n;
      print optlambda;

  1. Pingback: The “power” of finite mixture models - The DO Loop

  2. Pingback: Modeling the distribution of data? Create a Q-Q plot - The DO Loop

  3. Hi Dr. Rick Wicklin, after running the MLE following the above code, if I want to make a data table with the estimated parameters (MU, SIGMA), how can I do that? Thank you very much. Regards, Yong

  4. Pingback: Understanding local and global variables in the SAS/IML language - The DO Loop

  5. Dr. Rick Wicklin.. When I have a likelihood function where I have to estimate 3 parameters mu, sigma, and Pc (probability), can I use the same code?

    For example: likelihood = Pc ∫ f (x) dx where f is a normal distribution with unknown mu and sigma

    Also, can the CDF function be used in the function f that you have defined?

  6. Hi Rick

    I am trying to create a vector in which each value is the result of an optimization relevant to the row. To be clearer:

    First, I have a column vector called n, composed of the numbers 50 to 40000 in increments of 1
    Second, I have a vector P defining the objective function, which is composed of two inputs: known n and unknown S where S needs to be optimized to maximize each row value of P. P is defined by:

    P = 50*(S##.45)#(n##1.1)-40000*n-S*n-S##1.35#n##.5

    Therefore, I need one optimized value in the S vector per row value of n.
    I have tried creating a do-loop in IML but as you know modules cannot be made to start on loops.
    I have tried to create a macro but had issues.

    Any suggestions on the best way forward?

    • PS I did manage to get it right outside of IML. However, this macro do-loop with Proc NLP takes a seriously long time once I loop between 50 and 40000, and I had to disable the log, which kept filling up. Still wondering if there's an easier way in IML :-)

      options nonotes nosource nosource2 errors=0;
      Data Estimates;
      Input S 12.5;
      %Macro simul;
      %do i = 50 %to 90;
      Proc NLP noprint outest=Parms(where=(_TYPE_='PARMS'));
      Max P0;
      Decvar S;
      P0 = 50*(S**.45)*(&i**1.1)-40000-(S*&i)-(S**1.35)*(&i**.5);
      Data Estimates;
      Set Estimates Parms(keep=S);
      %end;
      %Mend;
      %simul

    • Rick Wicklin

      This is a rational expression in the variable S. Use calculus. Take the derivative w/r/t S and set it equal to zero. Solve for the root numerically by using the FROOT function.

  7. Pingback: Optimizing a function that evaluates an integral - The DO Loop

  8. Hyung S. Hwang

    This blog's posts are really helpful to my job, thank you^^

    I'm working on a statistical MLE job using IML's NLPQN subroutine. The situation is that the initial values are very critical to the stability of the solution, and I need to run a simulation with more than 100 different sets of initial values. I have looked through the help docs and manuals for a way to store or extract the objective function values printed in the log window by setting the options vector, but I failed to find one. The simulation takes a long time to complete, so I could not watch the objective function value printed in the log window on each simulation path.

    So, I need to get the objective function value at the MLE solution point and store it in a "data set" and use it later, for example, the selection among the simulation results.

    If possible, can the Hessian matrix also be stored in a data set?

    Thank you in advance

  9. ASudipta Banerjee

    Hi Rick, can you please share some real-world problems where MLE is used? Is it only used to estimate the mean and sigma of the closest-fitting bell curve for a dataset? What happens if the dependent variable is driven by multiple factors?

  10. Hi again. Can you do this for the uniform distribution between 0 and 1 too? I don't understand how to incorporate the theta aspect.


  11. Hi Dr. Rick Wicklin, I've read this article and I saw that you used the iteration history (of the NLPNRA call) for the last graph. My question is trivial: how can I save the iteration history of the NLPNRA call in a data set, and what procedure did you use for the last graph? Thanks.

  12. Hi Mr Rick Wicklin,
    I am working on my statistics thesis about the max extreme value distribution.
    I am trying to estimate the parameters of the Type I (Gumbel) distribution via the maximum likelihood method in SAS. Is it possible to do this in SAS? If yes, could you please share the related code with me?


  13. Pingback: Ten tips before you run an optimization - The DO Loop

  14. Pingback: Two ways to compute maximum likelihood estimates in SAS - The DO Loop
