Specify the order of variables at run time in SAS


In SAS, the order of variables in a data set is usually unimportant. However, occasionally SAS programmers need to reorder the variables in order to make a special graph or to simplify a computation.

Reordering variables in the DATA step is slightly tricky. There are Knowledge Base articles about how to do it. There are papers written about how to do it. There is a page at sasCommunity.org dedicated to how to do it. Reordering variables when the order is not known until run time is even trickier. A typical solution is to construct a macro variable that contains the desired order and then use the LENGTH or RETAIN statement in a separate DATA step to set the order.

In contrast, when you read a SAS data set into a SAS/IML matrix, you can easily specify the order of the variables.

Here's a simple example. The following SAS/IML statements read numerical data from the Sashelp.Class data set into matrices. The columns of matrix X correspond to the variables Age, Height, and Weight (in that order), whereas the columns of matrix Y correspond to the variables Weight, Height, and Age:

proc iml;
use Sashelp.Class;
read all var {Age Height Weight} into X;
read all var {Weight Height Age} into Y;

If there are k numerical variables in a data set, there are k! different matrices that you can create. Each matrix has the same elements, but the columns are permuted. (Programming Challenge: Write a SAS/IML program that creates all k! matrices.)

In the previous example, the order of the variables were hard-coded by the programmer. But it is just as easy to set the order of the variables at run time. And no macro variable is required! You merely create a character vector that contains the variables names in the order that you want.

Here's an example that is similar to a recent question that was posted on the SAS Support Communities. The data set named VECTOR contains observations in that are not in alphabetical order. (Perhaps the data set was sorted by the X variable.) Another data set named MATRIX contains a matrix where the variables are in alphabetical order.

data vector; 
input ID $ x;
D 1
C 2
B 3
E 4
A 5
data matrix;
input A B C D E;
1.0 0.8 0.6 0.4 0.2
0.8 1.0 0.8 0.6 0.4
0.6 0.8 1.0 0.8 0.6
0.4 0.6 0.8 1.0 0.8
0.2 0.4 0.6 0.8 1.0

The programmer who posed this question wanted to ensure that the observations in the X variable and the variables in the matrix were in the same order so that he could correctly multiply the matrix and vector.

There are two ways to solve the problem: either reorder the observations of the vector or reorder the columns of the matrix. The latter option is the easiest. When you read the VECTOR data set, read the ID variable that specifies the ordering for the vector. Then use that character vector to specify the order of the columns of the matrix, as follows:

proc iml;
use vector;
  read all var {ID x};     /* ID contains the order */
close vector;
use matrix;   
  read all var ID into M;  /* read variables in the same order */
close matrix;
y = M*x;                   /* do the matrix multiplication */
print y[rowname=ID];

The vector y contains the product of M*x. The observations in y are in the same order as the observations in x.

The key takeaway is that you can easily read the variables in a data set into a SAS/IML matrix in any order. In particular, you can specify the order at run time based on the values of some other data set or computation.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Leave A Reply

Back to Top