It's the start of a new year. Have you made a resolution to be a better data analyst? A better SAS statistical programmer? To learn more about multivariate statistics? What better way to start the New Year than to read (or re-read!) the top 12 articles for statistical programmers from

The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all

I was recently asked, "Does SAS support computing inverse hyperbolic trigonometric functions?" I was pretty sure that I had used the inverse hyperbolic trig functions in SAS, so I was surprised when I read the next sentence: "I ask because I saw a Usage Note that says these functions are

What's in a name? As Shakespeare's Juliet said, "That which we call a rose / By any other name would smell as sweet." A similar statement holds true for the names of colors in SAS: "Rose" by any other name would look as red! SAS enables you to specify a

It is easy to simulate data that is uniformly distributed in the unit cube for any dimension. However, it is less obvious how to generate data in the unit simplex. The simplex is the set of points (x1,x2,...,xd) such that Σi xi = 1 and 0 ≤ xi ≤ 1

Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes with a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset

If you use a word three times, it's yours. -Unknown When I was a child, my mother used to encourage me to increase my vocabulary by saying, "If you use a word three times, it's yours for life." I believe that the same saying holds for programming techniques: Use a

I needed to construct a string to use in the title of a scatter plot. The scatter plot showed a line, and I wanted to include the equation of the line in the plot's title. This article shows how to construct a string that contains the equation in a readable

When I studied math in school, I learned that the expression a (mod q) is always an integer between 0 and q – 1 for integer values of a and q. It's a nice convention, but SAS and many other computer languages allow the result to be negative if a (or q) is

The SAS/IML language supports user-defined functions (also called modules). Many SAS/IML programmers know that you can use the RETURN statement to return a value from a user-defined function. For example, the following function returns the sum of each column of a matrix: proc iml; start ColSum(M); return( M[+, ] ); /*

Sometimes a small option can make a big difference. Last week I thought to myself, "I wish there were an option that prevents variable labels from appearing in a table or graph." Well, it turns out that there is! I was using PROC MEANS to display some summary statistics, and

A comment to last week's article on "How to get data values out of ODS graphics" indicated that the technique would be useful for changing the title on an ODS graph "without messing around with GTL." You can certainly use the technique for that purpose, but if you want to

I received the following question: In the DATA step I always use the ** operator to raise a value to a power, like this: x**2. But on your blog you use the ## operator to raise values to a power in SAS/IML programs. Does SAS/IML not support the **

Suppose that you have two data vectors, x and y, with the same number of elements. How can you rearrange the values of y so that they have the same relative order as the values of x? In other words, find a permutation, π, of the elements of y so

No matter what statistical programming language you use, be careful of testing for an exact value of a floating-point number. This is known in the world of numerical analysis as "10.0 times 0.1 is hardly ever 1.0" (Kernighan and Plauger, 1974, The Elements of Programming Style). There are many examples

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. In my blog posts, I usually define a module in a PROC IML session and then immediately use it. However, sometimes it is useful to store

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. Usually you need to explicitly load modules before you use them, but there are two cases where PROC IML loads a module automatically. Modules in IMLMLIB

In a previous blog, I showed how to use SAS/IML subscript reduction operators to compute the location of the maximum values for each row of a matrix. The subscript reduction operators are useful for computing simple statistics for each row (or column) of a numerical matrix. If x is a

The other day I encountered an article in the SAS Knowledge Base that shows how to write a macro that "returns the variable name that contains the maximum or minimum value across an observation." Some people might say that the macro is "clever." I say it is complicated. This is

One of the first skills that a beginning SAS/IML programmer learns is how to read data from a SAS data set into SAS/IML vectors. (Alternatively, you can read data into a matrix.) The beginner is sometimes confused about the syntax of the READ statement: do you specify the names of

In statistical programming, I often test a program by running it on a problem for which I know the correct answer. I often use a single expression to compute the maximum value of the absolute difference between the vectors: maxDiff = max( abs( z-correct ) ); /* largest absolute difference

To a statistician, the DIF function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function has many other uses, including computing finite differences. The DIF function computes the difference between the original vector and a shifted version

To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantities that involve adjacent values in any vector. The LAG function is essentially a "shift operator."

Because the SAS/IML language is a general purpose programming language, it doesn't have a BY statement like most other SAS procedures (such as PROC REG). However, there are several ways to loop over categorical variables and perform an analysis on the observations in each category. One way is to use

The birthday matching problem is a classic problem in probability theory. The part of it that people tend to remember is that in a room of 23 people, there is greater than 50% chance that two people in the room share a birthday. But the birthday matching problem is also

Locating missing values is important in statistical data analysis. I've previously written about how to count the number of missing values for each variable in a data set. In Base SAS, I showed how to use the MEANS or FREQ procedures to count missing values. In the SAS/IML language, I

The fundamental units in the SAS/IML language are matrices and vectors. Consequently, you might wonder about conditional expressions such as if v>0 then.... What does this expression mean when v contains more than a single element? Evaluating vector expressions When you test a vector for some condition, expressions like v>0

The SAS/IML language supports both row vectors and column vectors. This is useful for performing linear algebra, but it can cause headaches when you are writing a SAS/IML module. I want my modules to be able to handle both row vectors and column vectors. I don't want the user to

SAS provides several ways to compute sample quantiles of data. The UNIVARIATE procedure can compute quantiles (also called percentiles), but you can also compute them in the SAS/IML language. Prior to SAS/IML 9.22 (released in 2010) statistical programmers could call a SAS/IML module that computes sample quantiles. With the release