Compute maximum and minimum values for rows and columns in SAS

11

A common question on SAS discussion forums is how to compute the minimum and maximum values across several variables. It is easy to compute statistics across rows by using the DATA step. This article shows how to compute the minimum and maximum values for each observation (across variables) and, for completeness, for each variable. If you think of your numerical data as being in a matrix, the task is to compute the minimum and maximum values for each row or for each column. You can do this in Base SAS or by using a powerful subscript operator in the SAS/IML matrix language.

The data in this article are Fisher's famous iris data, which contains the widths and lengths of the petals and sepals of 150 iris flowers. This data is distributed in the Sashelp.Iris data set as part of SAS software.

The minimum and maximum values of variables: Base SAS

The MEANS procedure is the simplest way to compute the minimum and maximum values for each numeric variable in a data set. The following statements display the extreme values for the four numerical variables in the iris data:

proc means nolabels data=Sashelp.Iris Min Max;
   output out=MinMaxCols;
run;
t_minmax1

By not specifying the VAR statement, all numeric variables are used. (You could also specify var _numeric_;) The OUTPUT statement writes the minimum an maximum values to the MinMaxCols data set, along with a few other useful statistics.

The minimum and maximum values of observations: Base SAS

The SAS DATA step contains the MIN and MAX functions, which return the minimum and maximum nonmissing values (respectively) from a list of variables. You can read all of the numerical variables in a data set into an array and call the MIN and MAX functions as follows:

data MinMaxRows;
   set sashelp.Iris;
   array x {*} _numeric_;    /* x[1] is 1st var,...,x[4] is 4th var */
   min = min(of x[*]);       /* min value for this observation */
   max = max(of x[*]);       /* max value for this observation */
run;
proc print data=MinMaxRows(obs=7);
   var _numeric_;
run;
t_minmax2

You can see that the MIN variable contain the minimum value of each row and the MAX variable contains the maximum value. Notice that you can use the _NUMERIC_ keyword to automatically assign the contents of the array x. This DATA step will work for any input data that contains numeric variables because the variable names are not hard-coded!

Also note that you can use the OF operator (sometimes called the OF keyword) to specify that the MIN and MAX functions should operate across all elements in the array.

The minimum and maximum values of columns in a matrix

The SAS/IML language contains a number of operators (called subscript reduction operators) that you can use to perform simples statistical operations down matrix columns or across matrix rows. This makes it easy to compute the maximum value of each row or column, and similarly for the minimum value.

The important operators are the max subscript operator (<>) and the min subscript operator (><). You use these operators like subscripts. To find extrema for columns, use these operators in place of row subscripts, as follows:

proc iml;
use Sashelp.Iris;
read all var _NUM_ into X[c=varNames];
close Sashelp.Iris;
 
minC = X[><, ];    /* row vector contains min of columns */
maxC = X[<>, ];    /* row vector contains max of columns */
print (minC//maxC)[r={"Min" "Max"} c=varNames];
t_minmax3

For years I struggled to remember which combination of greater-than and less-than symbols was the min operator and which was the max operator. Eventually I developed the following mental image. Think of the minimum as being the bottom of a valley. If you are viewing a valley from the far side of a lake, the image looks like a greater-than sign placed next to a less-than sign. Similarly, think of a maximum as being the peak of a mountain. If you are viewing the mountain from across a lake, the image looks like a less-than sign placed next to a great-than sign. These mnemonic aids are shown in the following image:

The minimum and maximum values of rows in a matrix

In a similar way, you can compute the minimum and maximum values of each row of a matrix, as follows:

minR = X[ ,><];    /* column vector contains min of rows */
maxR = X[ ,<>];    /* column vector contains max of rows */

The MinR and MaxR vectors each contain 150 elements. The value MinR[i]is the minimum value of the ith row and the value MaxR[i]is the maximum value of the ith row.

One of the nice aspects of the SAS/IML matrix language is its symmetry: operations on rows and operations on columns are often closely related to each other. This is in contrast to Base SAS, where the DATA step is often the best choice for computing statistics for observations, whereas procedures are often easier to use for computing statistics for variables.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

11 Comments

  1. Pingback: Elementwise minimum and maximum operators - The DO Loop

  2. Pingback: Compute nearest neighbors in SAS - The DO Loop

  3. Pingback: Random segments and broken sticks - The DO Loop

  4. Abhinav Arora on

    What if there are two columns having same values and both are either maximum or minimum? Which columns does it take then?

  5. hi i am looking for a data step (no Proc step) to find the minimum and maximum value of the age variable in only two observation Min and Max.

    name age height weight Sex
    vikas 12 140 55 M
    sachin 15 155 48 M
    Chotu 18 145 57 M
    Karuna 26 142 46 F
    Sweety 23 144 52 F
    shobhit 32 158 28 M

  6. Hi,
    Please advise.
    I am doing certain data manipulations in SAS Arrays but I am getting missing results for z_xyz_m[i+1] arrays. This is happening inside a loop for i=1to36; Can we use Min/Max/Log functions like below?

    z_xyz_m[i+1] = min(max(xyz_m[i+1],0.3),1.3);
    z_appr_m[i+1] = log(min(max(appr_m[i+1],0.3),1.5));

    Any feedback will be helpful.

  7. For the sake of comprehensiveness, here is a more complete list of supported data step syntaxes, which don't all yield the same result.
    data t;
    A=-1;
    B=. ;
    C1= A min B;
    C2= A max B;
    D1= A >< B;
    D2= A <> B;
    E1=smallest(1,A,B);
    E2=largest (1,A,B);
    F1=min(1,A,B);
    F2=max(1,A,B);
    run;

    • Rick Wicklin

      The last two statements should be MIN(A,B) and MAX(A,B). The discrepancies you observe are because some operators and functions exclude missing values whereas others include missing values and treat them as the "most negative" floating point value.

Leave A Reply

Back to Top