Excluding variables: Read all but one variable into a matrix

6
Dear Rick,
I have a data set with 1,001 numerical variables. One variable is the response, the others are explanatory variable. How can I read the 1,000 explanatory variables into an IML matrix without typing every name?

That's a good question. You need to be able to perform two sub-tasks:

  1. Create a character vector that contains the names of all the variables. (If the data set contains both numeric and character variables, the character vector should contain the names of all numeric variables.)
  2. Exclude one or more elements from a character vector.

Discover the names data set variables

Just as you can use PROC CONTENTS to discover the names of variables in a data set, SAS/IML has the CONTENTS function, which returns a character vector that contains the variable names. The argument to the CONTENTS function can be the name of a data set. If you have already opened a data set you can skip the argument to obtain the variable names of the open data set, as follows:

proc iml;
use Sashelp.Heart;                  /* open data set */
varNames = contents();              /* get all variable names */

However, most of the time (as above) we do not have a data set that has only numerical variables. To obtain a vector of only the numeric variables, read one observation of the data into a matrix, and use the COLNAME= option to obtain the variable names:

read next var _NUM_ into X[colname=varNames];    /* read only numeric vars */
print varNames;

To save space, only the first few columns of the output are displayed below:

t_excludereadvar1

Exclude elements from a character vector

After you create a vector that contains variable names, you can use the SETDIF function to exclude certain variable. The SETDIF function also sorts the list of variable names, which can be useful:

YVar = "Weight";                    /* variable to exclude from the matrix */
XVarNames = setdif(varNames, YVar); /* exclude Y, sort remaining X */

If you want to preserve the order of the variables, use the REMOVE function and specify the indices of the elements that you want to remove. The LOC function enables you to find the indices of the elements that you want to remove, as follows:

XVarNames = remove(varNames, loc(varNames=YVar));
print XVarNames;
t_excludereadvar2

Putting it all together

For the Sashelp.Heart data set, here is how to read the variable Weight into a vector, and read all other numeric variables into a matrix X:

proc iml;
YVar = "Weight";                    /* var to exclude from the matrix */
dsName = "Sashelp.Heart";
 
use (dsName);                       /* open data set */
read next var _NUM_ into X[colname=varNames];     /* read only numeric vars */
XVarNames = remove(varNames, loc(varNames=YVar)); /* exclude; preserve order */
 
read all var YVar into Y;           /* Y is vector */
read all var XVarNames into X;      /* X is matrix */
close;

You can use the LOC-ELEMENT trick to exclude multiple variables. For example, you can use the following statements to exclude two variables:

YVar = {"Weight" "Height"};
XVarNames = remove(varNames, loc( element(varNames,YVar) ));
Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

6 Comments

  1. One more way is DROP option of dataset.

    use Sashelp.Heart(drop=Weight); /* open data set */
    read all var _num_ into X; /* X is matrix */
    close;

  2. How can I read the 500 explanatory variables into an IML matrix without typing every name? In this case, using DROP option is not a good idea.

  3. Peter Lancashire on

    Another way to deal with lists of variable names is to read them from the dictionary tables or from sashelp.vcolumn. The "type" column contains "char" or "num". You could also identify dates from the format. This method is more orthogonal in that you can use the same tools as for any dataset, unlike options and statements.

Leave A Reply

Back to Top