Compute the kth smallest data value in SAS


A SAS programmer recently posted a question to the SAS/IML Support Community about how to compute the kth smallest value in a vector of numbers. In the DATA step, you can use the SMALLEST function to find the smallest value in an array, but there is no equivalent built-in function in the SAS/IML language. However, it is easy to write a short module that performs the computation.

If the data include missing values, there is an ambiguity in the question. Do you want the kth smallest value after omitting the missing values? Or do you want the kth smallest value in the ordered list where missing values appear before nonmissing values? This article considers both cases.

The smallest value in an array

The DATA step has built-in functions for both cases. The SMALLEST function returns the kth smallest nonmissing value. The ORDINAL function sorts the missing and nonmissing values and returns the kth element in the ordered list. These functions act on elements of a DATA step array. Consequently, if you want to find the kth element in a column of a data set, you need to transpose the data by using PROC TRANSPOSE.

The following example creates a data set with 10 observations, two of which are missing. The data set is transposed and the data are read into an array. The small3 variable contains the third smallest nonmissing value, ignoring the missing values. The ord3 variable contains the third smallest ordinal value, where missing values are ordered before nonmissing values. The small7 and ord7 variables contain the seventh smallest values, excluding and including the missing values, respectively.

/* Base SAS: use SMALLEST or ORDINAL with array to compute k_th smallest value */
data A;
input x @@;
. -5 -2 0 2 5 2 2 4 .
proc transpose data=A out=B;
data C;
   set B;
   array x[*] _NUMERIC_;
   ord3   = ordinal(3, of x[*]);      /* include missing values */
   ord7   = ordinal(7, of x[*]);
   small3 = smallest(3, of x[*]);     /* ignore missing values */
   small7 = smallest(7, of x[*]);
proc print noobs data=C;
   var ord3 ord7 small3 small7;

The smallest value in a vector

The SAS/IML language does not have built-in functions that are equivalent to the SMALLEST and ORDINAL functions. However, you can easily sort the data and extract the kth smallest value. The following statements define a function named ORDINAL that takes a SAS/IML vector as an argument:

proc iml;
start ordinal(k, _x);
   x = colvec(_x);       /* make sure it is a column vector */
   call sort(x);
   if k<1 | k>nrow(x) then
      return (.);        /* return missing if index out of range */
   return( x[k] );       /* value of the k_th ordered element */
/* test the module by using the example data */
use A;  read all var {"x"};  close A;
ord1 = ordinal(1, x);
ord3 = ordinal(3, x);
ord7 = ordinal(7, x);
print ord1 ord3 ord7;

You can see that the ORDINAL module returns the kth value of the data, including missing values. In fact, the argument k can be a vector. The statement ord = ordinal({1 3 7}, x) returns a vector that contains the 1st, 3rd, and 7th ordered elements.

If you want to exclude missing values, you can use the LOC function to locate nonmissing values and use the subscript operator to extract them. You can then call the ORDINAL module on the vector of nonmissing values, as shown below:

start smallest(k, _x);
   idx = loc(_x^=.);               /* find nonmissing values */
   if IsEmpty(idx) then return(.); /* all values are missing */
   x = _x[idx];                    /* extract nonmissing */
   return ( ordinal(k, x) );       /* call the previous module */
small1 = smallest(1, x);
small3 = smallest(3, x);
small7 = smallest(7, x);
print small1 small3 small7;

I leave it as an exercise to the reader to modify the definition of the ORDINAL module so that the argument k can be a vector. In other words, modify the function so that the statement s = smallest({1 3 7}, x) returns a vector with three elements.

If x is a matrix and you want the smallest elements of each column, just call the modules in a DO loop.

Do you ever use the SMALLEST or ORDINAL functions in the DATA step? For what purpose?


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Leave A Reply

Back to Top