Compute the multivariate normal density in SAS

16

I've been working on a new book about Simulating Data with SAS. In researching the chapter on simulation of multivariate data, I've noticed that the probability density function (PDF) of multivariate distributions is often specified in a matrix form. Consequently, the multivariate density can usually be computed by using the SAS/IML matrix language. As an example, this article describes how to compute the multivariate normal probability density for an arbitrary number of variables.

Compute the multivariate normal PDF

The density for the multivariate distribution centered at μ with covariance Σ is given by the following formula, which I copied from a Wikipedia article:

The argument to the EXP function involves the expression d2=(x-μ)TΣ-1(x-μ), where d is the Mahalanobis distance between the multidimensional point x and the mean vector μ. I have previous written a SAS/IML function that computes the Mahalanobis distance in SAS, so it is simple to construct a SAS/IML function that computes the multivariate normal PDF for any dimension:

proc iml;
/* Cut and paste the definition of the Mahalanobis module. See
https://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ */
start MVNormalPDF(x, mu, cov);
   p = nrow(cov);
   const = (2*constant("PI"))##(p/2) * sqrt(det(cov));
   d = Mahalanobis(x, mu, cov);
   phi = exp( -0.5*d#d ) / const;
   return( phi );
finish;

The MVNormalPDF function is essentially an implementation of the PDF formula, except that the (efficient) Mahalanobis function is used instead of explicitly forming the expression (x-μ)TΣ-1(x-μ). Incidentally, the formula assumes that x and μ are a column vectors. However, the Mahalanobis and MVNormalPDF functions assume that mu is a row vector, and evaluates the PDF on each row of the x matrix.

Evaluating the multivariate normal PDF

Although it is not apparent at first glance, the MVNormalPDF function is vectorized, which means that the x argument can be an N x p matrix that represents N different p-dimensional points. The vectorization results from the fact that the Mahalanobis function is vectorized.

This means that you can evaluate the PDF at many points by using a single call, as shown below for a three-dimensional example:

Mu  = {1 2 3};
Cov = {3 2 1, 
       2 4 0,
       1 0 5};
x = {0 0 0,
     0 1 2,
     2 1 2,
     1 2 3};
pdf = MVNormalPDF(x, Mu, Cov);
print x[c=("x1":"x3")] pdf;

Visualize the bivariate normal PDF

In the case of two variables, you can visualize the bivariate normal density by creating a surface plot or contour plot. In either case, you need to evaluate the MVNormalPDF function at a grid of (x,y) values. You can use the Define2DGrid function to generate evenly spaced (x,y) values on a uniform grid. The following SAS/IML statements load the Define2DGrid function, evaluate the bivariate normal density at a grid of points, and write the data to a SAS data set:

/* define or load Define2DGrid module:
https://blogs.sas.com/content/iml/2011/01/21/how-to-create-a-grid-of-values/ */
load module=Define2DGrid;
Mu = {0 0};
Cov = {4 2, /* correlation coefficient is sqrt(2) */
       2 2};
x = Define2DGrid( -3, 3, 31, -3, 3, 31 ); /* grid is [-3,3]x[-3,3] */
density = MVNormalPDF(x, Mu, Cov);
 
x1 = x[,1]; x2 = x[,2];
create BivarDensity var {"x1" "x2" "Density"};
append;
close BivarDensity;
quit;

Earlier this week, I showed how to create a contour plot in SAS by using the Graph Template Language (GTL). Assuming that the ContourPlotParm template has been defined, the following call to the SGRENDER procedure creates a contour plot of the bivariate normal data:

proc sgrender data=BivarDensity template=ContourPlotParm;
dynamic _TITLE="Bivariate Normal Density, corr=sqrt(2) "
        _X="x1" _Y="x2" _Z="Density";
run;
Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

16 Comments

  1. amit bothra on

    Can we calculate Multivariate Normal Density in base SAS without using IML as I dont have access to it.

    • Rick Wicklin

      Wikipedia has a formula for the bivariate density that you can program in the DATA step. You can do an internet search for "trivariate normal density" to find the formula for three variables.

  2. Pingback: Visualize the bivariate normal cumulative distribution - The DO Loop

  3. Abdulbaki Bilgic on

    Rick,
    Thank you indeed for your answer. I simply tried to replicate your results with the program you posted.
    proc iml;
    /* define or load Mahalanobis module:
    http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ */
    load module=Mahalanobis;
    start MVNormlPDF(x, mu, cov);
    p = nrow(cov);
    const = (2*constant("PI"))##(p/2) * sqrt(det(cov));
    d = Mahalanobis(x, mu, cov);
    phi = exp( -0.5*d#d ) / const;
    return( phi );
    finish;
    Mu = {1 2 3};
    Cov = {3 2 1,
    2 4 0,
    1 0 5};
    x = {0 0 0,
    0 1 2,
    2 1 2,
    1 2 3};
    pdf = MVNormalPDF(x, Mu, Cov);
    print x[c=("x1":"x3")] pdf;
    run;

    but in the log window, unfortunately i am getting the same error as i had before.
    ERROR: Unable to load module MAHALANOBIS. [Entry MAHALANOBIS.IMOD not found in catalog WORK.IMLSTOR.]
    though Module MVNORMLPDF defined.
    i do not know that wrong is with the program or not able to load the module. Thanks.

    • Rick Wicklin

      Sorry for your confusion. Notice that the comment links to an article in which I define the Mahalanobis module. Go to that URL and paste in the definition of the Mahalanobis function. I have rewritten the comment to make it more clear.

  4. Abdulbaki Bilgic on

    Rick,
    proc iml;
    start MVNormalPDF(x, mu, cov);
    y = (x-mu)

    ;
       d     = y

    * inv(cov) * y;
    p = nrow(cov);
    const = (2*constant("PI"))##(p/2) * sqrt(det(cov));
    phi = exp( -0.5*d#d ) / const;
    return( phi );
    finish;
    Mu = {1 2 3};
    Cov = {3 2 1,
    2 4 0,
    1 0 5};
    x = {0 0 0,
    0 1 2,
    2 1 2,
    1 2 3};
    pdf = MVNormalPDF(x, Mu, Cov);
    print x[c=("x1":"x3")] pdf;
    this program works and produces pdf at each x.
    Also, in your post, there is a word error between MVNormlPDF in "start" function and pdf = MVNormalPDF(x, Mu, Cov). Thanks a lot.

  5. Pingback: Compute a contour (level curve) in SAS - The DO Loop

  6. Pingback: How to generate a grid of points in SAS - The DO Loop

  7. Varun Khandelwal on

    Dear Rick,

    I tried the corrected code given by Abdulbaki Bilgic. But still it shows error "Matrices do not conform to the operation". I followed your other post as well on the error part but still it is not resolved. Please help me with the code. I have gone through your other posts as well and they are very intuitive and very helpful.

    Regards,
    Varun

  8. I need to estimate the parameters of a set of bivariate gaussians from data. The data take the form of an n x n identification/confusion matrix where n is the the number of response alternatives. In its most basic form, the stimuli are formed from two dimensions (e.g., height, width), with two levels of each dimension, giving four response alternatives. It is also assumed that there is a decision bound for each dimension, giving a total of a mean vector and covariance matrix for each response alternative, and two decision bounds. Could this code be adapted to this problem, and if so, what would your recommendation be for proceeding. My thanks in advance!

Leave A Reply

Back to Top