LOC: The most useful function you've never heard of


A frequently performed task in data analysis is identifying all the observations in a data set that satisfy certain conditions. For example, you might want to identify all of the female patients in your study or to identify all patients whose systolic blood pressure is greater than 140 mm Hg.

Novice programmers often use loops to find observations that satisfy a criterion. Don't do it! In SAS/IML software, there is almost never a good reason to loop over all observations. Instead, use the LOC function.

The following statements use the LOC function to identify the patients mentioned earlier:

f = loc(gender = "female");
highBP = loc(systolic > 140);

The LOC function is the most useful function in the SAS/IML language that DATA step programmers have never heard of. The LOC function finds the location of nonzero elements in a vector or matrix. Using the LOC function is much faster than writing a loop.

The LOC function returns a row vector that contains indices that satisfy the specified condition. You can use the indices to subset the data.

Forming Subsets

For example, suppose you have data for some famous witches and wizards in literature:

data MagicUsers;
infile datalines dsd;
length Name $11 Profession $7 Source $20;
input Name Profession Power Source;
Morgana,    Witch,   7, Authurian Legend
Merlin,     Wizard, 10, Authurian Legend
Gryffindor, Wizard,  8, Harry Potter Books
Hufflepuff, Witch,   8, Harry Potter Books
Ravenclaw,  Witch,   8, Harry Potter Books
Slytherin,  Wizard,  8, Harry Potter Books
Glinda,     Witch,   5, Oz Books
Elphaba,    Witch,   6, Oz Books
Diggs,      Wizard,  1, Oz Books

The LOC function can help you determine which witch is which. The following statements determine which names correspond to witches:

proc iml;
use MagicUsers;
  read all var {Name Profession Power};
close MagicUsers;
ndx = loc(Profession="Witch"); /** find the indices for witches    **/
Names = Name[ndx];             /** subset the names of the witches **/

Similarly, if you want to compute the average power of these witches, you can form a subset of the Power variable and compute the mean power of the witches in the subset:

WitchPower = Power[ndx];       /** subset the power variable    **/
AvgPower = WitchPower[:];      /** compute the subset's average **/
print ndx[colname=Names], AvgPower;

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.


  1. Pingback: Removing Observations with Missing Values - The DO Loop

  2. Pingback: Log transformations: How to handle negative data values? - The DO Loop

  3. Pingback: Timing Performance: Looping versus LOC-ing - The DO Loop

  4. Pingback: Tips and Techniques – What’s the Difference? - The DO Loop

  5. Pingback: Reshape data so that each category becomes a new variable - The DO Loop

  6. Pingback: Beware the naked LOC - The DO Loop

  7. l use loc a lot. However , can the argument for loc be a matrix? Suppose I want to identify multiple rows in one pass to create a new matrix i.e., return new matrix={1 2 3, 4 5 6, 7 8 9} This problem is killing me because it seems that I end up having to do nested loops which requires multiple passes.

  8. Pingback: Detecting outliers in SAS: Part 2: Estimating scale - The DO Loop

  9. Pingback: Need case-insensitive string comparisons? UPCASE! - The DO Loop

  10. Pingback: Finding data that satisfy a criterion

  11. I am wondering whether you can create a condition that selects witches or wizards. Obviously, in the example above, it will select everything. But if we also had mages and sorcerers than the condition will subset on more than one profession.

Leave A Reply

Back to Top