LOC: The most useful function you've never heard of

A frequently performed task in data analysis is identifying all the observations in a data set that satisfy certain conditions. For example, you might want to identify all of the female patients in your study or to identify all patients whose systolic blood pressure is greater than 140 mm Hg.

Novice programmers often use loops to find observations that satisfy a criterion. Don't do it! In SAS/IML software, there is almost never a good reason to loop over all observations. Instead, use the LOC function.

The following statements use the LOC function to identify the patients mentioned earlier:

f = loc(gender = "female");
highBP = loc(systolic > 140);

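The LOC function is the most useful function in the SAS/IML language that DATA step programmers have never heard of. The LOC function finds the locations of the nonzero elements in a vector or matrix. A logical expression such as systolic > 140 evaluates to a matrix of zeros and ones, so LOC returns the positions at which the condition is true. Because the comparison is applied to the whole vector in a single call, using the LOC function is much faster than writing a loop.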
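To see what the LOC call replaces, here is a minimal sketch that finds the same indices twice: once with an explicit loop and once with LOC. The systolic values are made up for illustration, and the names idxLoop, idxLoc, and count are hypothetical.

```sas
proc iml;
/* hypothetical systolic readings, for illustration only */
systolic = {118, 152, 139, 171, 140};

/* loop version: test one observation at a time */
n = nrow(systolic);
idxLoop = j(1, n, .);            /* preallocate room for up to n indices */
count = 0;
do i = 1 to n;
   if systolic[i] > 140 then do;
      count = count + 1;
      idxLoop[count] = i;        /* remember the position of this observation */
   end;
end;
if count > 0 then idxLoop = idxLoop[1:count];   /* keep only the filled cells */

/* vectorized version: one call, no loop */
idxLoc = loc(systolic > 140);

print idxLoop, idxLoc;           /* both contain the indices 2 and 4 */
quit;
```

The loop needs a counter, preallocation, and a final trim. The LOC version states the condition once and lets the language do the bookkeeping.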

The LOC function returns a row vector that contains the indices of the elements that satisfy the specified condition. You can use the indices to subset the data.
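The return value is easiest to understand on a small example. The following sketch uses a made-up 2 x 3 matrix m and shows that LOC returns a row vector of indices even when its argument is a matrix; the elements are counted in row-major order. The sketch assumes a release of SAS/IML that provides the NDX2SUB and DIMENSION functions, which convert the indices to (row, column) subscripts.

```sas
proc iml;
m = {1 9 3,
     8 5 7};                     /* hypothetical 2 x 3 matrix */

b = (m > 6);                     /* 0/1 indicator matrix, same shape as m */
idx = loc(b);                    /* 1 x 3 row vector of indices: {2 4 6} */
vals = m[idx];                   /* the elements that satisfy the condition */

/* optional: convert the row-major indices to (row, column) pairs */
s = ndx2sub(dimension(m), idx);

print b, idx, vals, s;
quit;
```

Here the values 9, 8, and 7 exceed 6, and they occupy positions 2, 4, and 6 when the matrix is read row by row.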

Forming Subsets

For example, suppose you have data for some famous witches and wizards in literature:

data MagicUsers;
infile datalines dsd;
length Name $11 Profession $7 Source $20;
input Name Profession Power Source;
datalines;
Morgana,    Witch,   7, Arthurian Legend
Merlin,     Wizard, 10, Arthurian Legend
Gryffindor, Wizard,  8, Harry Potter Books
Hufflepuff, Witch,   8, Harry Potter Books
Ravenclaw,  Witch,   8, Harry Potter Books
Slytherin,  Wizard,  8, Harry Potter Books
Glinda,     Witch,   5, Oz Books
Elphaba,    Witch,   6, Oz Books
Diggs,      Wizard,  1, Oz Books
;

The LOC function can help you determine which witch is which. The following statements determine which names correspond to witches:

proc iml;
use MagicUsers;
  read all var {Name Profession Power};
close MagicUsers;
 
ndx = loc(Profession="Witch"); /** find the indices for witches    **/
Names = Name[ndx];             /** subset the names of the witches **/

Similarly, to compute the average power of these witches, form a subset of the Power variable and compute the mean of that subset:

WitchPower = Power[ndx];       /** subset the power variable    **/
AvgPower = WitchPower[:];      /** compute the subset's average **/
print ndx[colname=Names], AvgPower;
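One caveat is worth a sketch. When no observation satisfies the condition, LOC returns an empty matrix, and using an empty matrix as a subscript is an error. A common guard is to check the number of columns with the NCOL function before subsetting. The data below are a made-up three-row stand-in for the MagicUsers variables, and "Warlock" is a hypothetical value chosen because it matches nothing.

```sas
proc iml;
/* hypothetical stand-in for the Profession and Power variables */
Profession = {"Witch", "Wizard", "Witch"};
Power      = {7, 10, 5};

idx = loc(Profession = "Warlock");   /* no match: idx is an empty matrix */

if ncol(idx) > 0 then do;            /* subset only when something was found */
   p = Power[idx];
   AvgPower = p[:];
   print AvgPower;
end;
else print "No observations satisfy the condition.";
quit;
```

The same check fits the witch example: test ncol(ndx) before you form Name[ndx] or Power[ndx] whenever the condition might match nothing.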

3 Comments

  1. Matt Fetter
    Posted February 26, 2013 at 6:42 pm

    I use LOC a lot. However, can the argument to LOC be a matrix? Suppose I want to identify multiple rows in one pass to create a new matrix, i.e., return a new matrix such as {1 2 3, 4 5 6, 7 8 9}. This problem is killing me because it seems that I end up writing nested loops, which require multiple passes.

    • Posted February 27, 2013 at 8:59 am

      Yes, the argument to LOC is a matrix. However, the return value is always a row vector (1 row, k columns), so you might need to use the SHAPE function to reshape the result. If you have a specific example in mind that is causing you problems, post it to the SAS/IML Support Community.

  2. Shyam
    Posted July 28, 2013 at 9:20 am

    Thanks a lot for this post. It is very useful. Something new that I learnt today.

