The other day I was constructing covariance matrices for simulating data for a mixed model with repeated measurements. I was using the SAS/IML BLOCK function to build up the "R-side" covariance matrix from smaller blocks. The matrix I was constructing was block-diagonal and looked like this:

The matrix represents a covariance matrix for four individuals where each individual has three repeated measurements and where the measurements for each individual exhibit a compound symmetric covariance structure.

To construct the block-diagonal matrix, the first step is to construct the little 3x3 compound symmetric matrix, as follows:

proc iml; k=3; /* number of measurements per individual */ b = j(k,k,1) + 2*I(k); /* residual cov=1; common cov=2 */ print b[label="cov"];

To simulate data for multiple subjects, it is common to build **R**, a big block-diagonal matrix such that each block equals **b**. This matrix is shown at the top of this post.
I initially used the BLOCK function to constructs the covariance matrix, as follows:

s=4; /* number of individuals */ R = b; /* block diagonal: one kxk block for each indiv */ do i = 2 to s; R = block(R,b); end; print R;

That works, and the BLOCK function is good to know about, but I recently realized that there is a more efficient way to construct a block-diagonal matrix. Calling the BLOCK function in a DO loop is unnecessary! The SAS/IML language supports the direct product (Kronecker product) operator, @. This is not an operator that I use every day (or even every year!), but it turns out that this operator makes it super-easy to construct block-diagonal matrices. The following statement computes exactly the same block-diagonal matrix:

/* block-diagonal matrix with s blocks, each block equals b */ R = I(s) @ b;

I always think it is cool to discover new ways to use the SAS/IML language to avoid writing a DO loop. In any matrix-vector language (SAS/IML, R, MATLAB,...) that is a key to efficient performance. This trick to construct a block-diagonal matrix efficiently is the latest entry on my ever-growing list!

## 2 Comments

I recently had a similar situation where I used the Kronecker product to generate some data. Specifically, I needed to generate a series of bivariate normal datasets. Rather than specifying them individually, I used the Kroenecker product to create a large variance-covariance matrix with the block diagonals corresponding to my bivariate matrix. However, some of this efficiency has been lost because I have to use the BY statement in Proc Mixed when I analyze the datasets. Could you recommend a way to specify the repeated statement that would allow me to process the matrix in a single step(i.e. without the BY statement)? My original code looked like this:

proc mixed data=SIMDATA method=reml ;

class SubjectID Visit Trt Sim_ID;

by Sim_ID;

model Mu = Visit Trt Visit*Trt / s ;

repeated / type=ar(1) sub=zSub_StudyID ;

lsmeans Visit*Tx/diff=control('2' '1') ;

run;

I've tried changing type=Un@ar(1) but this doesn't quite give me what I'm looking for.

That's an interesting application, but if you have a single covariance matrix, there is no need to use a block structure. Use the RANDNORMAL function as shown in this example of how to generate samples from the multivariate normal distribution.

Maybe I don't understand your model, but in general there is nothing inefficient about using the BY statement (although no need to list Sim_ID on the CLASS stmt). It sounds like you are following the advice I frequently give on how to simulate data efficiently in SAS.

For details and many examples of using simulation for regression modeling, see Chapters 11-12 of

Simulating Data with SAS.