Create an ID vector for repeated measurements

I often use the SAS/IML language for simulating data with certain known properties. In fact, I'm writing a book called Simulating Data with SAS. When I simulate repeated measurements (sometimes called replicated data), I often want to generate an ID variable that identifies which measurement is associated with which subject in a simulated study.

Depending on the analysis that you are conducting, there are two ways to structure the values in the ID variable: sorted by subject or sorted by "time."

To be specific, suppose that you have four patients in a study. Some measurement (for example, their weight) is taken every week for three weeks. You can order the data according to either of the following columns:

Figure 1: Two Ways to Encode ID Variables

Sorted by patient    Sorted by week
        1                  1
        1                  2
        1                  3
        2                  4
        2                  1
        2                  2
        3                  3
        3                  4
        3                  1
        4                  2
        4                  3
        4                  4

In the preceding table, the first column would be appropriate for data that are sorted by patient. The second column would be used for data that are sorted by week.

Create ID vectors in SAS/IML software

One way to create ID vectors in SAS/IML software is to use the REPEAT function. The REPEAT function creates a matrix by repeating its input a specified number of times vertically (the second argument) and horizontally (the third argument). For example, the expression T(1:N) is a column vector with N elements. You can repeat that column k times horizontally to create a matrix with N rows and k columns, in which every element of row i equals i, as follows:

proc iml;
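N=4;                      /* number of subjects */
k=3;                      /* number of repeated measurements per subject */
r = repeat(T(1:N),1,k);   /* N x k matrix; row i is i, i, ..., i */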
print r;
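To make the roles of the second and third arguments of REPEAT concrete, here is a small standalone sketch. The matrix names x, a, b, and c are arbitrary and chosen only for illustration; run it as a separate program, because it ends its own PROC IML session.

```sas
proc iml;
x = {1 2};                  /* 1 x 2 row vector */
a = repeat(x, 3, 1);        /* 3 x 2: x stacked vertically three times */
b = repeat(x, 1, 3);        /* 1 x 6: 1 2 1 2 1 2 */
c = repeat(T(1:4), 1, 3);   /* 4 x 3: row i contains the value i three times */
print a, b, c;
quit;
```

The matrix c is the same as the matrix r for N=4 and k=3.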

Because PROC IML stores matrices in row-major order, you can call the COLVEC function to create a column vector that contains the ID values sorted by subject, as shown in the first column of Figure 1. The following SAS/IML function encapsulates this idea:

/* Return the IDs 1..N, each repeated numRepl times in a row (sorted by subject) */
start ReplID(N, numRepl);
  return( colvec(repeat(T(1:N),1,numRepl)) );
finish;
 
Subject = ReplID(4, 3);
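If you want to convince yourself that COLVEC reads a matrix row by row, the following standalone sketch might help. The matrix m and the print label are made up for illustration; the last two statements repeat the computation inside ReplID so that the block runs on its own.

```sas
proc iml;
m = {1 2 3,
     4 5 6};                /* 2 x 3 matrix */
v = colvec(m);              /* elements are read row by row: 1,2,3,4,5,6 */
print v;

N = 4;  numRepl = 3;
Subject = colvec(repeat(T(1:N), 1, numRepl));   /* 1,1,1,2,2,2,3,3,3,4,4,4 */
print (T(Subject))[label="Subject (transposed for display)"];
quit;
```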

In a similar way, if you do NOT transpose the expression 1:N, then the REPEAT function repeats the row vector 1:N horizontally, which produces the sequence 1,2,...,N,1,2,...,N,.... Once again, you can use the COLVEC function to convert that sequence of values into a column vector, as shown in the second column of Figure 1. The following SAS/IML function encapsulates this idea:

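/* Return the block of IDs 1..N, repeated numRepl times (sorted by "time") */
start ReplIDBlock(N, numRepl);
  return( colvec(repeat(1:N,1,numRepl)) );
finish;

Time = ReplIDBlock(4, 3);   /* subject IDs ordered by week: 1,2,3,4,1,2,3,4,1,2,3,4 */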

I find that I use the first approach more often than the second.
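As a usage sketch that goes a step beyond the functions above, the two functions can be combined to build a long-format data set that is sorted by subject: ReplID supplies the subject ID, and ReplIDBlock, called with the arguments swapped, supplies a matching week index. Everything specific here is an assumption made for illustration: the names nSubj, nWeek, Week, Weight, and SimLong, the seed, and the normal distribution with mean 70 and standard deviation 5.

```sas
proc iml;
start ReplID(N, numRepl);
  return( colvec(repeat(T(1:N),1,numRepl)) );
finish;
start ReplIDBlock(N, numRepl);
  return( colvec(repeat(1:N,1,numRepl)) );
finish;

nSubj = 4;  nWeek = 3;                    /* hypothetical study size */
Subject = ReplID(nSubj, nWeek);           /* 1,1,1,2,2,2,3,3,3,4,4,4 */
Week    = ReplIDBlock(nWeek, nSubj);      /* 1,2,3,1,2,3,1,2,3,1,2,3 */

call randseed(12345);                     /* arbitrary seed */
Weight = j(nSubj*nWeek, 1);               /* allocate a 12 x 1 vector */
call randgen(Weight, "Normal", 70, 5);    /* arbitrary mean and std dev */

create SimLong var {"Subject" "Week" "Weight"};   /* write vectors to a data set */
append;
close SimLong;
quit;
```

Each row of the resulting data set pairs one subject with one week, so every subject appears with weeks 1, 2, and 3.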

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
