How to generate multiple samples from the multivariate normal distribution in SAS


A SAS customer asks:

How do I use SAS to generate multiple samples of size N from a multivariate normal distribution?

Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the samples are drawn from the same distribution, one way to generate k samples is to generate a single "mega sample" with kN observations, and then use an index variable to indicate that the first N observations belong to the first sample, the next N observations belong to the second sample, and so forth. This article implements that technique.

I have previously shown how to use the RANDNORMAL function in SAS/IML to simulate multivariate normal data. Now suppose that you want to generate 10 samples, where each sample contains five observations from a trivariate normal distribution. You can generate 5 x 10 = 50 observations as follows:

proc iml;
N = 5;                              /* size of each sample */
NumSamples = 10;                    /* number of samples   */  
/* specify population mean and covariance */
Mean = {1, 2, 3};
Cov = {3 2 1, 
       2 4 0,
       1 0 5};
call randseed(4321);               
X = RandNormal(N*NumSamples, Mean, Cov);

Remark: If you intend to implement the rest of your algorithm in the SAS/IML language, you probably do not need to create an index variable. For example, if you want to compute correlation matrices for each sample, you can subset the samples as follows:

/* optional: do you want to analyze the data in SAS/IML? */
Obs1 = 1;                   /* beginning of i_th sample      */
do i = 1 to NumSamples;
   Obs2 = Obs1 + N -1;      /* end of i_th sample            */
   Y = X[Obs1:Obs2, ];      /* Y = i_th sample               */
   Obs1 = Obs2 + 1;         /* prepare for next sample       */
   c = corr(Y);             /* do something with i_th sample */

However, from other things that the SAS customer said, I think she wants the data in a SAS data set so that she can run a SAS procedure on the data. Therefore, construct an ID variable as follows:

ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */

The nested syntax looks complicated, but each function call is simple. The T function creates a column vector with elements {1,2,...,9,10}. The REPEAT function copies the column vector to create a matrix with five columns. The COLVEC function stacks the resulting elements in row-major order. Thus the first five elements of the ID vector are 1, the next five are 2, and so forth, for a total of 50 elements. You can write the data to a SAS data set as follows:

Z = ID || X;
create MVN from Z[c={"ID" "x1" "x2" "x3"}];
append from Z;
close MVN;

For completeness, the following call to the CORR procedure computes 10 correlation matrices, one for each value of the ID variable. Remember: use the BY statement to carry out an analysis on each sample. You do NOT want to break the data into 10 data sets and use a macro loop to analyze each sample because that approach is not efficient.

proc corr data=MVN noprint out=CorrOut(where=(_TYPE_="CORR"));
   by ID;
   var x1-x3;

So there you have it. To generate k samples from the same distribution, you do not need to call a function k times. Instead, it is often more efficient to make a single call that computes all of the random values in a single call. (In SAS/IML, this assumes that all of the simulated data can fit in memory.) You can then use index subscript or a separate index variable to analyze each of the k samples.

This tip also applies to simulating univariate random data. Generate all the data, then analyze the data by using BY groups.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.


    • Rick Wicklin

      Good question. The SAS PROBBNRM function computes the cumulative probability for the bivariate normal (2 dimensions). I am not aware of useful techniques for solving 6-d integrals other than Monte Carlo simulation.

    • Rick Wicklin

      There is no closed form solution, and six-dimensional numerical quadrature over a semi-infinite domain is not something that I would attempt. My best suggestion is the Monte Carlo approach: simulate N observations from the MVN distribution and count how many fall into the region of interest.

  1. Dear Rick, I need your help, this model combine Monte Carlo simulation, OLS regression in each simulation and generation of multiple samples from the multivariate normal distribution.
    I need to estimate the model
    Where (Ut,Vt)~ iid N(0,cov[(Ut,Vt)', (Ut,Vt)]) -covariance matrix
    the generation of multiple samples is from the multivariate normal distribution, and it's a part in thebsimulation, I have in each simulation to use the new generate samples.
    and the steps are
    1. Take the seven estimates(B0, B1, A0, A1, the covariace matrix variables) as given from the real sample, but assuming B1 = 0 and adjust B0 such that the B0 used in simulation isBˆ0 +Bˆ1*mean(x),
    simulate the model for 240 quarters. Using the simulated data, re-estimate the model.
    2. Repeat this a large number of times, saving the estimated B1 from each.
    3.Produce a density plot, Pvalue(B1>0) of the estimated B1 coefficients
    Thank you!!

  2. Dear Dr. Rick,
    I need your help to know how can I write multivariate model consists of three variables one continuous normal and second ordered catecorical and third dichotomous. Second state I want to know how can I simulate multivarite normal dist. Model and multivarite ordered model and multivarite dichotomous model. Regards

  3. dear dr. rick
    i want to know can i find simulation of multivariate normal,bernoulli,discrete uniform by using sas/iml and simulation of contamination multivariate normal,bernoulli,discrete uniform by using sas/iml in your book if available i will buy it .

  4. dear rick
    i bought u book and i dont see how can i simulate multivariate linear model like regression analysis i want to simulate all these var. y x1 x2 x3.....x10 . all theses var. normal and (all bernoulli and all discrete uniform except y ).
    my second question do you have a software copy of u book because i spend more time to write any command.
    thanks alot

    • Rick Wicklin

      1) Ch 11-12 deals with simulating regression models. In particular, see section, 11.3.3, and 11.3.4.
      2) Section 1.7 describes how to download the programs in this book.

  5. dear rick
    when i conduct the command untill end in next page in in page 206 i saw so many errors and one of these errors for example after this eq. Weight = eta + eps; ERROR: (execution) Matrix has not been set to a value.
    and others
    ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant,
    a missing value, (, _ALL_, _CHAR_, _NUM_, {.
    ERROR 200-322: The symbol is not recognized and will be ignored.
    ERROR 76-322: Syntax error, statement will be ignored.


  6. Pingback: Fun with ODS Graphics: Eclipse animation (part 2) - Graphically Speaking

  7. I have run the following code using two different datasets called "masterfile1" and "masterfile2". Masterfil1 is a small test dataset with only a 3x3 covar matrix. Masterfile2 is about 4500x4500.
    Masterfile 1 works fine. Masterfile2 gets an "invalid subscript or subscript out of range error".

    proc iml ;
    use modiml.masterfile1;
    read all var _num_ where (_TYPE_="COV") into x;
    call streaminit(0);
    N= 5000;
    Mean = j(1,&maxcount,0);
    COV = x;

    x = RANDNORMAL( N, Mean, Cov );

    x = (x[1:N,]);
    create modimout.singlesim from x;
    append from x;
    close modimout.singlesim;

    • Rick Wicklin

      I think you should post programming questions to the SAS Support Community for SAS/IML questions. Please include the relevant section of the SAS log.

      Notice that the Mean vector needs to be the same length as the number of columns in the covariance matrix. You might want to get rid of the macro variable and use
      read all var _num_ where (_TYPE_="COV") into COV;
      Mean = j(1,ncol(COV),0);

Leave A Reply

Back to Top