How do I use SAS to generate multiple samples of size N from a multivariate normal distribution?

Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the samples are drawn from the same distribution, one way to generate k samples is to generate a single "mega sample" with kN observations, and then use an index variable to indicate that the first N observations belong to the first sample, the next N observations belong to the second sample, and so forth. This article implements that technique.

I have previously shown how to use the RANDNORMAL function in SAS/IML to simulate multivariate normal data. Now suppose that you want to generate 10 samples, where each sample contains five observations from a trivariate normal distribution. You can generate 5 x 10 = 50 observations as follows:

```proc iml; N = 5; /* size of each sample */ NumSamples = 10; /* number of samples */   /* specify population mean and covariance */ Mean = {1, 2, 3}; Cov = {3 2 1, 2 4 0, 1 0 5}; call randseed(4321); X = RandNormal(N*NumSamples, Mean, Cov);```

Remark: If you intend to implement the rest of your algorithm in the SAS/IML language, you probably do not need to create an index variable. For example, if you want to compute correlation matrices for each sample, you can subset the samples as follows:

```/* optional: do you want to analyze the data in SAS/IML? */ Obs1 = 1; /* beginning of i_th sample */ do i = 1 to NumSamples; Obs2 = Obs1 + N -1; /* end of i_th sample */ Y = X[Obs1:Obs2, ]; /* Y = i_th sample */ Obs1 = Obs2 + 1; /* prepare for next sample */ c = corr(Y); /* do something with i_th sample */ end;```

However, from other things that the SAS customer said, I think she wants the data in a SAS data set so that she can run a SAS procedure on the data. Therefore, construct an ID variable as follows:

`ID = colvec(repeat(T(1:NumSamples), 1, N)); /* 1,1,1,...,2,2,2,...,3,3,3,... */`

The nested syntax looks complicated, but each function call is simple. The T function creates a column vector with elements {1,2,...,9,10}. The REPEAT function copies the column vector to create a matrix with five columns. The COLVEC function stacks the resulting elements in row-major order. Thus the first five elements of the ID vector are 1, the next five are 2, and so forth, for a total of 50 elements. You can write the data to a SAS data set as follows:

```Z = ID || X; create MVN from Z[c={"ID" "x1" "x2" "x3"}]; append from Z; close MVN; quit;```

For completeness, the following call to the CORR procedure computes 10 correlation matrices, one for each value of the ID variable. Remember: use the BY statement to carry out an analysis on each sample. You do NOT want to break the data into 10 data sets and use a macro loop to analyze each sample because that approach is not efficient.

```proc corr data=MVN noprint out=CorrOut(where=(_TYPE_="CORR")); by ID; var x1-x3; run;```

So there you have it. To generate k samples from the same distribution, you do not need to call a function k times. Instead, it is often more efficient to make a single call that computes all of the random values in a single call. (In SAS/IML, this assumes that all of the simulated data can fit in memory.) You can then use index subscript or a separate index variable to analyze each of the k samples.

This tip also applies to simulating univariate random data. Generate all the data, then analyze the data by using BY groups.

Share Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1. How do I calculate the cumulative probability of a multivariate normal distribution (6 dimensions)? Thank you!

• Good question. The SAS PROBBNRM function computes the cumulative probability for the bivariate normal (2 dimensions). I am not aware of useful techniques for solving 6-d integrals other than Monte Carlo simulation.

• There is no closed form solution, and six-dimensional numerical quadrature over a semi-infinite domain is not something that I would attempt. My best suggestion is the Monte Carlo approach: simulate N observations from the MVN distribution and count how many fall into the region of interest.

2. Dear Rick, I need your help, this model combine Monte Carlo simulation, OLS regression in each simulation and generation of multiple samples from the multivariate normal distribution.
I need to estimate the model
Rt=B0+B1X(t-1)+Ut
Xt=A0+A1X(t-1)+Vt
Where (Ut,Vt)~ iid N(0,cov[(Ut,Vt)', (Ut,Vt)]) -covariance matrix
the generation of multiple samples is from the multivariate normal distribution, and it's a part in thebsimulation, I have in each simulation to use the new generate samples.
and the steps are
1. Take the seven estimates(B0, B1, A0, A1, the covariace matrix variables) as given from the real sample, but assuming B1 = 0 and adjust B0 such that the B0 used in simulation isBˆ0 +Bˆ1*mean(x),
simulate the model for 240 quarters. Using the simulated data, re-estimate the model.
2. Repeat this a large number of times, saving the estimated B1 from each.
3.Produce a density plot, Pvalue(B1>0) of the estimated B1 coefficients
Thank you!!
Orit

• This question is too complicated to answer here. In my book Simulating Data with SAS, I show how to simulate data from regression models (Chap 11-12), including time series models (Chap 13). To generate these data, I suggest using the VARMASIM function in SAS/IML software.

3. Dear Dr. Rick,
I need your help to know how can I write multivariate model consists of three variables one continuous normal and second ordered catecorical and third dichotomous. Second state I want to know how can I simulate multivarite normal dist. Model and multivarite ordered model and multivarite dichotomous model. Regards

• You can ask SAS modeling questions at teh SAS Support Community for statistical procedures. (The GLM and GENMOD procedures are commonly used.) The first link in this article describes how to simulate from a multivariate normal distribution. My book Simulating Data with SAS describes how to simulate data from multivariate ordinal and multivariate dichotomous distributions.

4. dear dr. rick
i want to know can i find simulation of multivariate normal,bernoulli,discrete uniform by using sas/iml and simulation of contamination multivariate normal,bernoulli,discrete uniform by using sas/iml in your book if available i will buy it .
regards

• It includes MVN, Bernoulli, discreete uniform, and contaminated MVN. You can use the same technique to contaminate other distributions, as shown in Ch. 7.

• is every thing available by using sas/iml ???

• Yes, you can use SAS/IML to simulate all of those distributions. The RANDGEN function in SAS/IML supports all of the distributions that RAND supports, plus SAS/IML provides multivariate distributions.

5. dear rick
i bought u book and i dont see how can i simulate multivariate linear model like regression analysis i want to simulate all these var. y x1 x2 x3.....x10 . all theses var. normal and (all bernoulli and all discrete uniform except y ).
my second question do you have a software copy of u book because i spend more time to write any command.
thanks alot

• 1) Ch 11-12 deals with simulating regression models. In particular, see section 11.3.2.3, 11.3.3, and 11.3.4.
2) Section 1.7 describes how to download the programs in this book.

6. thank you so much i find it

7. dear rick
when i conduct the command untill end in next page in 11.3.2.3 in page 206 i saw so many errors and one of these errors for example after this eq. Weight = eta + eps; ERROR: (execution) Matrix has not been set to a value.
and others
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant,
a missing value, (, _ALL_, _CHAR_, _NUM_, {.
ERROR 200-322: The symbol is not recognized and will be ignored.
and
ERROR 76-322: Syntax error, statement will be ignored.

regards

• The code in the book does not contain any errors. If you are experiencing problems in SAS/IML, post your code and question to the SAS/IML Support Community or contact SAS Technical Support.

8. dear rick
i fond the all codes for sas book
thank you so much

9. I have run the following code using two different datasets called "masterfile1" and "masterfile2". Masterfil1 is a small test dataset with only a 3x3 covar matrix. Masterfile2 is about 4500x4500.
Masterfile 1 works fine. Masterfile2 gets an "invalid subscript or subscript out of range error".
Thoughts?

proc iml ;
use modiml.masterfile1;
read all var _num_ where (_TYPE_="COV") into x;
call streaminit(0);
N= 5000;
Mean = j(1,&maxcount,0);
COV = x;

x = RANDNORMAL( N, Mean, Cov );

x = (x[1:N,]);
create modimout.singlesim from x;
append from x;
close modimout.singlesim;
quit;
run;

• I think you should post programming questions to the SAS Support Community for SAS/IML questions. Please include the relevant section of the SAS log.

Notice that the Mean vector needs to be the same length as the number of columns in the covariance matrix. You might want to get rid of the macro variable and use
read all var _num_ where (_TYPE_="COV") into COV;
Mean = j(1,ncol(COV),0);