I often use the SAS/IML language for simulating data with certain known properties. In fact, I'm writing a book called Simulating Data with SAS. When I simulate repeated measurements (sometimes called replicated data), I often want to generate an ID variable that identifies which measurement is associated with which subject in a simulated study.
Depending on the analysis that you are conducting, there are two ways to structure the values in the ID variable: sorted by subject or sorted by "time."
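To be specific, suppose that you have four patients in a study. Some measurement (for example, their weight) is taken every week for three weeks. You can order the data according to either of the following columns:

Figure 1: Patient ID values for four patients measured over three weeks

Sorted by patient    Sorted by week
        1                  1
        1                  2
        1                  3
        2                  4
        2                  1
        2                  2
        3                  3
        3                  4
        3                  1
        4                  2
        4                  3
        4                  4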
In the preceding table, the first column would be appropriate for data that are sorted by patient. The second column would be used for data that are sorted by week.
Create ID vectors in SAS/IML software
One way to create ID vectors in SAS/IML software is to use the REPEAT function. The REPEAT function creates a matrix from an input vector by repeating the vector a specified number of times horizontally and vertically. For example, the expression T(1:N) is a column vector with N elements. You can create a matrix with N rows and k columns as follows:
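proc iml;
N = 4;                       /* number of subjects */
k = 3;                       /* number of repeated measurements */
r = repeat(T(1:N), 1, k);    /* N x k matrix; every element of row i equals i */
print r;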
Because PROC IML stores matrices in row-major order, the COLVEC function reads the elements of r row by row (1,1,1,2,2,2,...). You can therefore call COLVEC to create a column vector that contains the ID values sorted by subject, as shown in the first column of Figure 1. The following SAS/IML function encapsulates this idea:
/* IDs sorted by subject: 1,1,...,1,2,2,...,2,...,N,N,...,N */
start ReplID(N, numRepl);
   return( colvec(repeat(T(1:N), 1, numRepl)) );
finish;
Subject = ReplID(4, 3);
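To show how such an ID vector might be used in a simulation, here is a minimal sketch that pairs the subject IDs with simulated weights and writes both to a data set. The seed, the mean and standard deviation of the weights, and the data set name SimLong are hypothetical choices made only for illustration.

```sas
proc iml;
/* same module as in the text: IDs sorted by subject */
start ReplID(N, numRepl);
   return( colvec(repeat(T(1:N), 1, numRepl)) );
finish;

call randseed(12345);                     /* arbitrary seed, for reproducibility */
N = 4;  k = 3;                            /* 4 subjects, 3 measurements each */
Subject = ReplID(N, k);                   /* 1,1,1,2,2,2,3,3,3,4,4,4 */

Weight = j(N*k, 1);                       /* allocate a vector for the N*k measurements */
call randgen(Weight, "Normal", 70, 10);   /* hypothetical mean and std dev */

create SimLong var {"Subject" "Weight"};  /* hypothetical data set name */
append;
close SimLong;
quit;
```

Each row of the resulting data set holds one measurement, and the Subject variable can serve as a BY-group or SUBJECT= variable in a later analysis.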
In a similar way, if you do NOT transpose the expression 1:N, then the REPEAT function produces a row vector that cycles through the values 1,2,...,N,1,2,...,N,.... Once again, you can use the COLVEC function to convert that sequence of values into a column vector, as shown in the second column of Figure 1. The following SAS/IML function encapsulates this idea:
/* IDs sorted by time: 1,2,...,N,1,2,...,N,... */
start ReplIDBlock(N, numRepl);
   return( colvec(repeat(1:N, 1, numRepl)) );
finish;
Time = ReplIDBlock(4, 3);
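The two modules are complementary: whichever one produces the patient ID, the other one (with its arguments swapped) produces the matching week index. The following sketch illustrates this for four patients and three weeks. The variable names Patient1, Week1, Patient2, and Week2 are hypothetical and are introduced only for this example.

```sas
proc iml;
start ReplID(N, numRepl);
   return( colvec(repeat(T(1:N), 1, numRepl)) );
finish;
start ReplIDBlock(N, numRepl);
   return( colvec(repeat(1:N, 1, numRepl)) );
finish;

N = 4;  k = 3;                   /* 4 patients, 3 weeks */

/* data sorted by patient */
Patient1 = ReplID(N, k);         /* 1,1,1,2,2,2,3,3,3,4,4,4 */
Week1    = ReplIDBlock(k, N);    /* 1,2,3,1,2,3,1,2,3,1,2,3 */

/* data sorted by week */
Patient2 = ReplIDBlock(N, k);    /* 1,2,3,4,1,2,3,4,1,2,3,4 */
Week2    = ReplID(k, N);         /* 1,1,1,1,2,2,2,2,3,3,3,3 */

print Patient1 Week1 Patient2 Week2;
quit;
```

In both orderings, every (patient, week) pair appears exactly once, so either pair of vectors indexes the same 12 measurements.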
I find that I use the first approach more often than the second.