A fundamental operation in statistical data analysis is to fit a statistical regression model on one set of data and then evaluate the model on another set of data. The act of evaluating the model on the second set of data is called scoring. One of the first "tricks" that I

## Tag: **Tips and Techniques**

Vector languages such as SAS/IML, MATLAB, and R are powerful because they enable you to use high-level matrix operations (matrix multiplication, dot products, etc.) rather than loops that perform scalar operations. In general, vectorized programs are more efficient (and therefore run faster) than programs that contain loops. For an example

If you write an n x p matrix from PROC IML to a SAS data set, you'll get a data set with n rows and p columns. For some applications, it is more convenient to write the matrix in a "long format" with np observations and three columns. The first
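As a rough illustration of the "long format" idea, here is a minimal sketch in Python rather than SAS/IML. It assumes that the three columns are the row index, the column index, and the value; the excerpt does not spell that out. The helper name `to_long` is hypothetical.

```python
def to_long(matrix):
    """Return one (row, col, value) triple per matrix element.

    Assumption: the three long-format columns are the 1-based row index,
    the 1-based column index, and the element value.
    """
    return [
        (i + 1, j + 1, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]

# A 3 x 2 matrix becomes 3 * 2 = 6 long-format rows.
m = [[1.5, 2.5],
     [3.5, 4.5],
     [5.5, 6.5]]
long_rows = to_long(m)
for triple in long_rows:
    print(triple)
```

The elements are emitted in row-major order, which is a choice made for this sketch and not something the excerpt states.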

I was looking at someone else's SAS/IML program when I saw this line of code: y = sqrt(x<>0); The statement uses the element maximum operator (<>) in the SAS/IML language to make sure that negative values are never passed to the square root function. This little trick is a real
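The same idea can be sketched outside of SAS/IML. The following Python analogue (not the original program) clamps each element at zero before taking the square root. The function name `safe_sqrt` is hypothetical.

```python
import math

def safe_sqrt(values):
    """Replace each negative element by 0, then take the square root."""
    # max(v, 0.0) plays the role of the elementwise maximum with 0.
    return [math.sqrt(max(v, 0.0)) for v in values]

x = [4.0, -1.0, 9.0, -0.25]
y = safe_sqrt(x)
print(y)
```

Negative inputs map to 0 instead of raising an error, which might or might not be what you want for your data.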

A challenge for statistical programmers is getting data into the right form for analysis. For graphing or analyzing data, sometimes the "wide format" (each subject is represented by one row and many variables) is required, but other times the "long format" (observations for each subject span multiple rows) is more

While walking in the woods, a statistician named Goldilocks wanders into a cottage and discovers three bears. The bears, being hungry, threaten to eat the young lady, but Goldilocks begs them to give her a chance to win her freedom. The bears agree. While Mama Bear and Papa Bear block

Sometimes it is useful in the SAS/IML language to convert a character string into a vector of one-character values. For example, you might want to count the frequency distribution of characters, which is easy when each character is an element of a vector. The question of how to convert a
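To show why a vector of one-character values is convenient, here is a small Python sketch (an analogue, not SAS/IML code). It splits a string into characters and tabulates their frequencies. The example string is arbitrary.

```python
from collections import Counter

text = "MISSISSIPPI"

# Split the string into a list of one-character values.
chars = list(text)

# Count how often each character occurs.
freq = Counter(chars)

print(chars)
print(sorted(freq.items()))
```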

Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer

A SAS user told me that he computed a vector of values in the SAS/IML language and wanted to use those values on a statement in a SAS procedure. The particular application involved wanting to use the values on the ESTIMATE and CONTRAST statements in a SAS regression procedure, but

In my article "Simulation in SAS: The slow way or the BY way," I showed how to use BY-group processing rather than a macro loop in order to efficiently analyze simulated data with SAS. In the example, I analyzed the simulated data by using PROC MEANS, and I use the

I recently showed someone a trick to create a graph, and he was extremely pleased to learn it. The trick is well known to many SAS users, but I hope that this article will introduce it to even more SAS users. At issue is how to use the SGPLOT procedure

A SAS customer asks: How do I use SAS to generate multiple samples of size N from a multivariate normal distribution? Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the

Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I

If you are like me, you've experienced the following frustration. You are reading the SAS/STAT documentation, trying to understand some procedure or option, when you find an example that is very similar to what you need. "Great," you think, "this example will help me understand how the SAS procedure works!"

Last week the SAS Training Post blog posted a short article on an easy way to find variables in common to two data sets. The article used PROC CONTENTS (with the SHORT option) to print out the names of variables in SAS data sets so that you can visually determine

The SAS/IML language secretly creates temporary variables. Most of the time programmers aren't even aware that the language does this. However, there is one situation where if you don't think carefully about temporary variables, your program will silently produce an error. And as every programmer knows, silent wrong numbers are

A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied

In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a

The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all
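For readers who do not use SAS/IML, the following Python sketch imitates the idea: find the positions at which a condition holds, then transform only those elements. The helper name `loc` is hypothetical, and it returns 0-based positions because that is the Python convention.

```python
import math

def loc(flags):
    """Return the 0-based positions at which a flag is true."""
    return [i for i, flag in enumerate(flags) if flag]

x = [3.0, 0.0, -2.0, 10.0]

# Find the positive elements, which are the only valid inputs to a log.
idx = loc([v > 0 for v in x])

# Apply the logarithmic transform to the valid elements only.
log_x = [math.log10(x[i]) for i in idx]

print(idx)
print(log_x)
```

If no element satisfies the condition, `idx` is an empty list, so it is worth checking for that case before indexing.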

The determinant of a matrix arises in many statistical computations, such as in estimating parameters that fit a distribution to multivariate data. For example, if you are using a log-likelihood function to fit a multivariate normal distribution, the formula for the log-likelihood involves the expression log(det(Σ)), where Σ is the

Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes to a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset

It is common to want to extract the lower or upper triangular elements of a matrix. For example, if you have a correlation matrix, the lower triangular elements are the nontrivial correlations between variables in your data. As I've written before, you can use the VECH function to extract the
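As a language-neutral illustration, this Python sketch pulls the elements strictly below the diagonal out of a small correlation matrix. It is not the VECH function; the helper name `strict_lower` and the row-by-row ordering are assumptions made for the example.

```python
def strict_lower(matrix):
    """Return the elements strictly below the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i)]

# A 3 x 3 correlation matrix: ones on the diagonal, symmetric elsewhere.
corr = [[1.0, 0.3, 0.5],
        [0.3, 1.0, 0.2],
        [0.5, 0.2, 1.0]]

lower = strict_lower(corr)
print(lower)
```

A p x p matrix yields p(p-1)/2 such elements, so the 3 x 3 example produces three correlations.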

Many SAS procedures can produce ODS statistical graphics as naturally as they produce tables. Did you know that it is possible to obtain the numbers underlying an ODS statistical graph? This post shows how. Suppose that a SAS procedure creates a graph that displays a curve and that you want

Last week I wrote an article in which I pointed out that many SAS programmers write a simulation in SAS by writing a macro loop. This approach is extremely inefficient, so I presented a more efficient technique. Not only is the macro loop approach slow, but there are other undesirable

I often use the SAS/IML language for simulating data with certain known properties. In fact, I'm writing a book called Simulating Data with SAS. When I simulate repeated measurements (sometimes called replicated data), I often want to generate an ID variable that identifies which measurement is associated with which subject

No matter what statistical programming language you use, be careful of testing for an exact value of a floating-point number. This is known in the world of numerical analysis as "10.0 times 0.1 is hardly ever 1.0" (Kernighan and Plauger, 1974, The Elements of Programming Style). There are many examples
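Here is one small example of the problem, written in Python. Adding 0.1 ten times does not give exactly 1.0 in double-precision arithmetic, so the sketch compares within a tolerance instead. The tolerance value is an arbitrary choice for illustration.

```python
import math

# Accumulate 0.1 ten times with ordinary floating-point addition.
total = 0.0
for _ in range(10):
    total += 0.1

# An exact comparison fails because 0.1 has no exact binary representation.
exact_match = (total == 1.0)

# A comparison within a relative tolerance succeeds.
close_match = math.isclose(total, 1.0, rel_tol=1e-9)

print(total, exact_match, close_match)
```

The same advice applies in any language: test whether the difference is small, not whether the values are identical.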

I recently read a blog post in which a SAS user had to rename a bunch of variables named A1, A2,..., A10, such as are contained in the following data set: /* generate data with variables A1-A10 */ data A; array A[10] A1-A10 (1); do i = 1 to 10;

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. In my blog posts, I usually define a module in a PROC IML session and then immediately use it. However, sometimes it is useful to store

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. Usually you need to explicitly load modules before you use them, but there are two cases where PROC IML loads a module automatically. Modules in IMLMLIB

SAS software provides many run-time functions that you can call from your SAS/IML or DATA step programs. The SAS/IML language has several hundred built-in statistical functions, and Base SAS software contains hundreds more. However, it is common for statistical programmers to extend the run-time library to include special user-defined functions.