In SAS, DATA step programmers use the IN operator to determine whether a value is contained in a set of target values. Did you know that there is similar functionality in the SAS IML language? The ELEMENT function in the SAS IML language is similar to the IN operator in the DATA step, except that it is vectorized, which enables you to determine which elements in a vector are contained in the set of target values. This article shows two applications of the ELEMENT function in SAS IML: assignment of observations to groups and validation of arguments to functions.
The IN operator in the SAS DATA step
You can use the IN operator to check whether the value of a variable matches any of a specified set of values. The value on the left side of the IN operator is the query value; the list on the right side of the IN operator contains the target values. The IN operator returns 1 (True) if the query value matches any of the target values. Otherwise, it returns 0.
For example, suppose you want to examine the age of some students and assign students that have similar ages into categories. One way to perform that assignment is to use the IN operator inside an IF-THEN/ELSE statement, as follows:
data Status;
   set Sashelp.Class;
   /* assign Status based on the Age variable */
   length Status $12;
   if Age in (10, 11, 12) then
      Status = "Pre-Teen";
   else if Age in (13, 14) then
      Status = "Teenager";
   else if Age in (15, 16, 17) then
      Status = "Pre-Adult";
   else
      Status = "Adult";
run;

proc print data=Status;
   var Name Age Status;
run;
The DATA step implicitly loops over each observation in the Sashelp.Class data set. For each observation, the program looks at the Age variable, which has values in the range [11,16]. It decides whether the query value is in the specified list of target values. Younger children are assigned to the "Pre-Teen" category. Older children are assigned to the "Teenager" or "Pre-Adult" categories. Notice a few interesting facts:
- The list of target values can include values that are never encountered in the data. For example, Age=10 is not in the data.
- The IN operator also works for character variables. For example, you can use it to find individuals who live in certain states: if State in ("CA", "TX", "FL") then ...
- The IN operator is usually used to determine membership for mutually exclusive groups (as above). However, you can omit the ELSE-IF clauses to process the same observation in several ways. For example, you can classify the same students into middle school or high school by using a separate set of IF-THEN statements and IN operators.
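The last two points can be sketched in a short DATA step. This is only an illustration: the School variable, the age cutoffs for each school level, and the IsSelected indicator are assumptions made for this example, and the names in the character list are taken from Sashelp.Class.

```sas
data School;
   set Sashelp.Class;
   length School $16;
   /* independent IF-THEN statements (no ELSE), so each one is evaluated;
      the age cutoffs are hypothetical */
   if Age in (11, 12, 13) then School = "Middle School";
   if Age in (14, 15, 16) then School = "High School";
   /* IN also works for character values; the result is 0 or 1 */
   IsSelected = (Name in ("Alfred", "Alice", "Barbara"));
run;

proc print data=School;
   var Name Age School IsSelected;
run;
```

Because the two IF-THEN statements are not linked by ELSE, you could add further statements that classify the same observation in other ways without disturbing the School assignment.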
The ELEMENT function in IML
You can use the ELEMENT function in SAS IML to perform similar computations. The main difference is that the query value that you are testing can be a vector.
The ELEMENT function returns a binary 0/1 vector that is the same size as the vector of query values. For the i-th query value, the ELEMENT function returns a 1 if the value is in the vector of target values. Otherwise, it returns a 0. If necessary, you can use the LOC function to convert the binary vector into a set of indices that you can use to assign categories, as follows:
proc iml;
use Sashelp.Class;
   read all var {"Name" "Age"};
close;

/* assign Status based on the Age variable */
Status = j(nrow(Age), 1, BlankStr(12));
idx = loc( element(Age, {10, 11, 12}));
if ncol(idx)>0 then    /* did we find any ages in the target set? */
   Status[idx] = "Pre-Teen";

idx = loc( element(Age, {13, 14}));
if ncol(idx)>0 then
   Status[idx] = "Teenager";

idx = loc( element(Age, {15, 16, 17}));
if ncol(idx)>0 then
   Status[idx] = "Pre-Adult";

print Name Age Status;
The result is the same as the DATA step result. Notice a few facts:
- If no elements in the query vector are contained in the target vector, the ELEMENT function returns a vector that contains all zeros.
- The LOC function returns an empty matrix when it encounters a matrix that contains only zeros. It is a good programming practice to "beware the naked loc," so always check that the indices are non-empty before you use them in an assignment statement.
- Notice that the program does not use any DO loops for this computation. The computation is vectorized, which means that each statement operates on many elements of a vector. This is the key to efficiency in matrix-vector languages such as SAS IML, MATLAB, and R.
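To see the first two facts in isolation, here is a minimal sketch that uses a small, made-up vector of ages (these values are hypothetical and are not read from Sashelp.Class). It prints the 0/1 vector that ELEMENT returns and then handles a target set that has no matches.

```sas
proc iml;
Age = {11, 12, 14, 16, 12};            /* hypothetical query values */

/* 0/1 indicator that has the same dimensions as Age */
b = element(Age, {10, 11, 12});
print Age b;

/* no query value is 18 or 19, so ELEMENT returns all zeros
   and LOC returns an empty matrix */
idx = loc( element(Age, {18, 19}) );
if ncol(idx)>0 then
   print idx;
else
   print "No ages are in the target set";
quit;
```

For these values, b is 1 in the first, second, and fifth rows. The second query takes the ELSE branch because idx has zero columns.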
Use the ELEMENT function for argument validation
An interesting application of the ELEMENT function is to perform argument validation. Suppose that you want to create a SAS IML function that has two arguments and the following properties:
- The first argument must be a character or numeric matrix. That is, you want to prevent lists, tables, and empty symbols.
- The second argument can be skipped, but it must be character if it is specified.
- If specified, the valid values for the second argument are "SQR", "SQRT", and "LOG".
Here is a program that uses the ELEMENT function to perform these argument checks:
/* use the ELEMENT function for argument validation */
proc iml;
/* In this function:
   x   : must be numeric or character matrix
   str : must be character or empty (if skipped).
         Valid values are 'SQR', 'SQRT', and 'LOG'.
*/
start MyFunc(x, str=);
   Arg1Valid = element(type(x), {'N' 'C'});      /* numeric or character */
   Arg2Valid = element(type(str), {'C' 'U'});    /* character or skipped */
   if ^Arg1Valid | ^Arg2Valid then
      STOP "ERROR: Invalid argument to function";
   if type(str)='C' then do;
      ValueValid = element(upcase(str), {'SQR' 'SQRT' 'LOG'});
      if ^ValueValid then
         STOP "ERROR: The value of the STR argument is invalid";
   end;
   /* implement the body of the function; this example simply returns x */
   return x;
finish;

/* create test cases that have errors */
z1 = MyFunc(4, -1);
List = [1:3, "A":"Z"];
z2 = MyFunc(List, "SQR");
z3 = MyFunc(4, "power");
I have called the function three times. Each call attempts to use invalid input arguments. Consequently, the log prints the following error messages:
ERROR: Invalid argument to function
ERROR: Invalid argument to function
ERROR: The value of the STR argument is invalid
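For contrast, here is a sketch of calls that should pass every check. It assumes that the statements run in the same PROC IML session in which MyFunc is defined; the variable names z4, z5, and z6 are made up for this illustration.

```sas
/* calls with valid arguments; assumes MyFunc is already defined */
z4 = MyFunc(4);                  /* second argument skipped: type(str) is 'U' */
z5 = MyFunc({1 2 3}, "sqrt");    /* lowercase is accepted because of UPCASE */
z6 = MyFunc("abc", "Log");       /* a character matrix is a valid first argument */
print z4, z5, z6;
```

Because this example function simply returns x, each result is a copy of the first argument.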
Summary
The SAS DATA step supports the IN operator, which determines whether a query value is in a set of target values. The ELEMENT function provides similar functionality in the SAS IML language. The ELEMENT function enables you to specify a vector of query values, and it tells you which values are in a vector of target values. This article shows how to use the ELEMENT function to assign observations to groups and to validate arguments to functions.
2 Comments
Rick,
ANY() can also do what ELEMENT() does, but it needs help from the OR operator or EXPANDGRID():
proc iml;
x= "A":"Z";
y={'0' 'S'};
test=expandgrid(x,y);
if any(test[,1]=test[,2]) then print 'At least one of Y values is in X';
else print 'All of Y values are not in X';
quit;
proc iml;
x= "A":"Z";
if any(x='0')|any(x='S') then print 'At least one of Y values is in X';
else print 'All of Y values are not in X';
quit;
Thanks for writing. It sounds like you are suggesting ANY for group membership, but I don't see how it can be used for identifying the elements.
For example, the following program finds the students who are 10, 11, or 12: