Dear Rick,
I have a data set with 1,001 numerical variables. One variable is the response, the others are explanatory variable. How can I read the 1,000 explanatory variables into an IML matrix without typing every name?
That's a good question. You need to be able to perform two sub-tasks:
- Create a character vector that contains the names of all the variables. (If the data set contains both numeric and character variables, the character vector should contain the names of all numeric variables.)
- Exclude one or more elements from a character vector.
Discover the names data set variables
Just as you can use PROC CONTENTS to discover the names of variables in a data set, SAS/IML has the CONTENTS function, which returns a character vector that contains the variable names. The argument to the CONTENTS function can be the name of a data set. If you have already opened a data set you can skip the argument to obtain the variable names of the open data set, as follows:
proc iml; use Sashelp.Heart; /* open data set */ varNames = contents(); /* get all variable names */ |
However, most of the time (as above) we do not have a data set that has only numerical variables. To obtain a vector of only the numeric variables, read one observation of the data into a matrix, and use the COLNAME= option to obtain the variable names:
read next var _NUM_ into X[colname=varNames]; /* read only numeric vars */ print varNames; |
To save space, only the first few columns of the output are displayed below:
Exclude elements from a character vector
After you create a vector that contains variable names, you can use the SETDIF function to exclude certain variable. The SETDIF function also sorts the list of variable names, which can be useful:
YVar = "Weight"; /* variable to exclude from the matrix */ XVarNames = setdif(varNames, YVar); /* exclude Y, sort remaining X */ |
If you want to preserve the order of the variables, use the REMOVE function and specify the indices of the elements that you want to remove. The LOC function enables you to find the indices of the elements that you want to remove, as follows:
XVarNames = remove(varNames, loc(varNames=YVar)); print XVarNames; |
Putting it all together
For the Sashelp.Heart data set, here is how to read the variable Weight into a vector, and read all other numeric variables into a matrix X:
proc iml; YVar = "Weight"; /* var to exclude from the matrix */ dsName = "Sashelp.Heart"; use (dsName); /* open data set */ read next var _NUM_ into X[colname=varNames]; /* read only numeric vars */ XVarNames = remove(varNames, loc(varNames=YVar)); /* exclude; preserve order */ read all var YVar into Y; /* Y is vector */ read all var XVarNames into X; /* X is matrix */ close; |
You can use the LOC-ELEMENT trick to exclude multiple variables. For example, you can use the following statements to exclude two variables:
YVar = {"Weight" "Height"}; XVarNames = remove(varNames, loc( element(varNames,YVar) )); |
6 Comments
One more way is DROP option of dataset.
use Sashelp.Heart(drop=Weight); /* open data set */
read all var _num_ into X; /* X is matrix */
close;
Yes, if you don't want to read the variable at all, you can use DROP. However, I assumed that the user wanted to read the variable into Y, but exclude it from X.
Rick,
I am a little confused. Can we just add one more USE statement ?
use Sashelp.Heart;
read all var {weight} into Y;
close;
Yes, you can read the data twice if you need to. That is a valid solution.
How can I read the 500 explanatory variables into an IML matrix without typing every name? In this case, using DROP option is not a good idea.
Another way to deal with lists of variable names is to read them from the dictionary tables or from sashelp.vcolumn. The "type" column contains "char" or "num". You could also identify dates from the format. This method is more orthogonal in that you can use the same tools as for any dataset, unlike options and statements.