A frequently performed task in data analysis is identifying all the observations in a data set that satisfy certain conditions. For example, you might want to identify all of the female patients in your study or to identify all patients whose systolic blood pressure is greater than 140 mm Hg.

Novice programmers often use loops to find observations that satisfy a criterion. Don't do it! In SAS/IML software, there is almost never a good reason to loop over all observations. Instead, use the LOC function.

The following statements use the LOC function to identify the patients mentioned earlier:

f = loc(gender = "female"); highBP = loc(systolic > 140); |

The LOC function is the most useful function in the SAS/IML language that DATA step programmers have never heard of. The LOC function finds the location of nonzero elements in a vector or matrix. Using the LOC function is much faster than writing a loop.

The LOC function returns a row vector that contains indices that satisfy the specified condition. You can use the indices to subset the data.

### Forming Subsets

For example, suppose you have data for some famous witches and wizards in literature:

data MagicUsers; infile datalines dsd; length Name $11 Profession $7 Source $20; input Name Profession Power Source; datalines; Morgana, Witch, 7, Authurian Legend Merlin, Wizard, 10, Authurian Legend Gryffindor, Wizard, 8, Harry Potter Books Hufflepuff, Witch, 8, Harry Potter Books Ravenclaw, Witch, 8, Harry Potter Books Slytherin, Wizard, 8, Harry Potter Books Glinda, Witch, 5, Oz Books Elphaba, Witch, 6, Oz Books Diggs, Wizard, 1, Oz Books ; |

The LOC function can help you determine which witch is which. The following statements determine which names correspond to witches:

proc iml; use MagicUsers; read all var {Name Profession Power}; close MagicUsers; ndx = loc(Profession="Witch"); /** find the indices for witches **/ Names = Name[ndx]; /** subset the names of the witches **/ |

Similarly, if you want to compute the average power of these witches, you can form a subset of the `Power` variable and compute the mean power of the witches in the subset:

WitchPower = Power[ndx]; /** subset the power variable **/ AvgPower = WitchPower[:]; /** compute the subset's average **/ print ndx[colname=Names], AvgPower; |

## 5 Comments

l use loc a lot. However , can the argument for loc be a matrix? Suppose I want to identify multiple rows in one pass to create a new matrix i.e., return new matrix={1 2 3, 4 5 6, 7 8 9} This problem is killing me because it seems that I end up having to do nested loops which requires multiple passes.

Yes, the argument to LOC is a matrix. However, the return value is always a row vector (1 row, k columns), so you might need to use the SHAPE function to reshape the result. If you have a specific example in mind that is causing you problems, post it to the SAS/IML Support Community.

Thanks a lot for this post. It is very useful. Something new that I learnt today.

I am wondering whether you can create a condition that selects witches or wizards. Obviously, in the example above, it will select everything. But if we also had mages and sorcerers than the condition will subset on more than one profession.

Yes. For your simple example, use the "OR" operator (|):

ndx = loc(Profession="Witch" | Profession="Wizard");

For more complex queries, use the LOC-ELEMENT technique.

## 9 Trackbacks

[...] straightforward approach is to loop over all variables. For each variable, use the LOC function to find the observations that are missing: /** Find observations that have a missing value for any [...]

[...] DATA step, but the DO loop is completely unnecessary in PROC IML. It is more efficient to use the LOC function to assign LogY, as shown in the following [...]

[...] run much faster than equivalent statements that involve many scalar quantities. For example, in a previous post, I asserted that the LOC function is much faster than writing a loop, for finding observations that [...]

[...] For example, I think every SAS/IML programmer needs to know the following tip: Tip: Use the LOC function to identify observations that satisfy some criteria. A related technique shows you how to analyze [...]

[...] corresponds to a level of the categorical variable. Notice how the LOC function (also known as "the most useful function you've never heard of") is used to identify the observations for each category; the output from the LOC function is used [...]

[...] LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some [...]

[...] to print out the scores and inspect them. The following SAS/IML statements use the LOC function (the most useful function that you've never heard of!) to find all of the data for which the robust z-score exceeds 2.5, and prints only the outliers: [...]

[...] The following statements convert the names of distributions to uppercase for easy comparison, and use the LOC function to extract the parameters for the rows that correspond to the gamma distribution: /** simpler and [...]

[…] have called the LOC function the most useful function that most DATA step programmers have never heard of. Despite its relative obscurity, it is essential that SAS/IML programmers master the LOC function. […]