## Tag: Tips and Techniques

0
The missing value trick for scoring a regression model

A fundamental operation in statistical data analysis is to fit a statistical regression model on one set of data and then evaluate the model on another set of data. The act of evaluating the model on the second set of data is called scoring. One of first "tricks" that I

Learn SAS
0
How to vectorize time series computations

Vector languages such as SAS/IML, MATLAB, and R are powerful because they enable you to use high-level matrix operations (matrix multiplication, dot products, etc) rather than loops that perform scalar operations. In general, vectorized programs are more efficient (and therefore run faster) than programs that contain loops. For an example

Learn SAS
0
Write a matrix in the "long form"

If you write an n x p matrix from PROC IML to a SAS data set, you'll get a data set with n rows and p columns. For some applications, it is more convenient to write the matrix in a "long format" with np observations and three columns. The first

Learn SAS
0
Square root transformations: How to handle negative data values?

I was looking at someone else's SAS/IML program when I saw this line of code: y = sqrt(x<>0); The statement uses the element maximum operator (<>) in the SAS/IML language to make sure that negative value are never passed to the square root function. This little trick is a real

Programming Tips
0
Output percentiles of multiple variables in a tabular format

A challenge for statistical programmers is getting data into the right form for analysis. For graphing or analyzing data, sometimes the "wide format" (each subject is represented by one row and many variables) is required, but other times the "long format" (observations for each subject span multiple rows) is more

0
How to tell whether a sequence of heads and tails is random

While walking in the woods, a statistician named Goldilocks wanders into a cottage and discovers three bears. The bears, being hungry, threaten to eat the young lady, but Goldilocks begs them to give her a chance to win her freedom. The bears agree. While Mama Bear and Papa Bear block

Learn SAS
0
Convert a string into a vector of characters

Sometimes it is useful in the SAS/IML language to convert a character string into a vector of one-character values. For example, you might want to count the frequency distribution of characters, which is easy when each character is an element of a vector. The question of how to convert a

Programming Tips
0
Six reasons you should stop using the RANUNI function to generate random numbers

Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer

Learn SAS
0
Passing values from PROC IML into SAS procedures

A SAS user told me that he computed a vector of values in the SAS/IML language and wanted to use those values on a statement in a SAS procedure. The particular application involved wanting to use the values on the ESTIMATE and CONTRAST statements in a SAS regression procedure, but

0
Turn off ODS when running simulations in SAS

In my article "Simulation in SAS: The slow way or the BY way," I showed how to use BY-group processing rather than a macro loop in order to efficiently analyze simulated data with SAS. In the example, I analyzed the simulated data by using PROC MEANS, and I use the

0
How to overlay custom curves with PROC SGPLOT

I recently showed someone a trick to create a graph, and he was extremely pleased to learn it. The trick is well known to many SAS users, but I hope that this article will introduce it to even more SAS users. At issue is how to use the SGPLOT procedure

0
How to generate multiple samples from the multivariate normal distribution in SAS

A SAS customer asks: How do I use SAS to generate multiple samples of size N from a multivariate normal distribution? Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the

0
The case of spilled coffee and the regression intercept

Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I

Learn SAS
0
How to access any program or data set that appears in the SAS/STAT documentation

If you are like me, you've experienced the following frustration. You are reading the SAS/STAT documentation, trying to understand some procedure or option, when you find an example that is very similar to what you need. "Great," you think, "this example will help me understand how the SAS procedure works!"

0
Find variables common to multiple data sets

Last week the SAS Training Post blog posted a short article on an easy way to find variables in common to two data sets. The article used PROC CONTENTS (with the SHORT option) to print out the names of variables in SAS data sets so that you can visually determine

Learn SAS
0
Oh, those pesky temporary variables!

The SAS/IML language secretly creates temporary variables. Most of the time programmers aren't even aware that the language does this. However, there is one situation where if you don't think carefully about temporary variables, your program will silently produce an error. And as every programmer knows, silent wrong numbers are

0
Generate binary outcomes with varying probability

A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied

0
Remove or keep: Which is faster?

In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a

Learn SAS
0
Beware the naked LOC

The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all

0
Compute the log-determinant of a matrix

The determinant of a matrix arises in many statistical computations, such as in estimating parameters that fit a distribution to multivariate data. For example, if you are using a log-likelihood function to fit a multivariate normal distribution, the formula for the log-likelihood involves the expression log(det(Σ)), where Σ is the

0
Access rows or columns of a matrix by names

Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes to a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset

0
Extract the lower triangular elements of a matrix

It is common to want to extract the lower or upper triangular elements of a matrix. For example, if you have a correlation matrix, the lower triangular elements are the nontrivial correlations between variables in your data. As I've written before, you can use the VECH function to extract the

0
How to get data values out of ODS graphics

Many SAS procedures can produce ODS statistical graphics as naturally as they produce tables. Did you know that it is possible to obtain the numbers underlying an ODS statistical graph? This post shows how. Suppose that a SAS procedure creates a graph that displays a curve and that you want

0
Using macro loops for simulation

Last week I wrote an article in which I pointed out that many SAS programmers write a simulation in SAS by writing a macro loop. This approach is extremely inefficient, so I presented a more efficient technique. Not only is the macro loop approach slow, but there are other undesirable

0
Create an ID vector for repeated measurements

I often use the SAS/IML language for simulating data with certain known properties. In fact, I'm writing a book called Simulating Data with SAS. When I simulate repeated measurements (sometimes called replicated data), I often want to generate an ID variable that identifies which measurement is associated with which subject

0
Programming tip: Avoid testing floating-point values for equality

No matter what statistical programming language you use, be careful of testing for an exact value of a floating-point number. This is known in the world of numerical analysis as "10.0 times 0.1 is hardly ever 1.0" (Kernighan and Plauger, 1974, The Elements of Programming Style). There are many examples

0
Rename many variables that have numerical suffixes and a common prefix

I recently read a blog post in which a SAS user had to rename a bunch of variables named A1, A2,..., A10, such as are contained in the following data set: /* generate data with variables A1-A10 */ data A; array A A1-A10 (1); do i = 1 to 10;

0
An easy way to define a library of user-defined functions

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. In my blog posts, I usually define a module in a PROC IML session and then immediately use it. However, sometimes it is useful to store