Friends have to look out for each other. Sometimes this can be slightly embarrassing. At lunch you might need to tell a friend that he has some tomato sauce on his chin. Or that she has a little spinach stuck between her teeth. Or you might need to tell your
Tag: Tips and Techniques
The SAS DATA step supports multidimensional arrays. However, matrices in SAS/IML are like mathematical matrices: they are always two dimensional. In simulation studies you might need to generate and store thousands of matrices for a later statistical analysis of their properties. How can you accomplish that unless you can create
The other day I was creating some histograms inside a loop in PROC IML. It was difficult for me to determine which histogram was associated with which value of the looping variable. "No problem," I said. "I'll just use a TITLE statement inside the loop so that each histogram has
The other day I was doing some computations that caused me to wonder, "What is the smallest power of 2 that is greater than a given number?" The mathematics is straightforward. Given a number n, find the least value of k such that 2^k ≥ n or, equivalently, k ≥
Many people know that the SAS/IML language enables you to read data from and write results to multiple SAS data sets. When you open a new data set, it is a good programming practice to close the previous data set. But did you know that you can have two data
I received the following email from a SAS/IML programmer: I am getting an error in a PROC IML module that I wrote. The SAS Log says NOTE: Paused in module NAME When I submit other commands, PROC IML doesn't seem to understand them. How can I continue the program? The
The SAS/IML language is used for many kinds of computations, but three important numerical tasks are integration, optimization, and root finding. Recently a SAS customer asked for help with a problem that involved all three tasks. The customer had an objective function that was defined in terms of an integral.
Bootstrap methods and permutation tests are popular and powerful nonparametric methods for testing hypotheses and approximating the sampling distribution of a statistic. I have described a SAS/IML implementation of a bootstrap permutation test for matched pairs of data (an alternative to a matched-pair t test) in my paper "Modern Data
Last week, as part of an article on how spammers generate comments for blogs, I showed how to generate random messages by using the CATX function in the DATA step. In that example, the strings were scalar quantities, but you can also concatenate vectors of strings in the SAS/IML language.
Dear Rick, I am trying to create a numerical matrix with 100,000 rows and columns in PROC IML. I get the following error: (execution) Unable to allocate sufficient memory. Can IML allocate a matrix of this size? What is wrong? Several times a month I see a variation of this
My previous post described how to use the "missing response trick" to score a regression model. As I said in that article, there are other ways to score a regression model. This article describes using the SCORE procedure, a SCORE statement, the relatively new PLM procedure, and the CODE statement.
A fundamental operation in statistical data analysis is to fit a statistical regression model on one set of data and then evaluate the model on another set of data. The act of evaluating the model on the second set of data is called scoring. One of the first "tricks" that I
Vector languages such as SAS/IML, MATLAB, and R are powerful because they enable you to use high-level matrix operations (matrix multiplication, dot products, etc.) rather than loops that perform scalar operations. In general, vectorized programs are more efficient (and therefore run faster) than programs that contain loops. For an example
If you write an n x p matrix from PROC IML to a SAS data set, you'll get a data set with n rows and p columns. For some applications, it is more convenient to write the matrix in a "long format" with np observations and three columns. The first
I was looking at someone else's SAS/IML program when I saw this line of code: y = sqrt(x<>0); The statement uses the element maximum operator (<>) in the SAS/IML language to make sure that negative values are never passed to the square root function. This little trick is a real
A challenge for statistical programmers is getting data into the right form for analysis. For graphing or analyzing data, sometimes the "wide format" (each subject is represented by one row and many variables) is required, but other times the "long format" (observations for each subject span multiple rows) is more
While walking in the woods, a statistician named Goldilocks wanders into a cottage and discovers three bears. The bears, being hungry, threaten to eat the young lady, but Goldilocks begs them to give her a chance to win her freedom. The bears agree. While Mama Bear and Papa Bear block
Sometimes it is useful in the SAS/IML language to convert a character string into a vector of one-character values. For example, you might want to count the frequency distribution of characters, which is easy when each character is an element of a vector. The question of how to convert a
Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer
A SAS user told me that he computed a vector of values in the SAS/IML language and wanted to use those values on a statement in a SAS procedure. The particular application involved wanting to use the values on the ESTIMATE and CONTRAST statements in a SAS regression procedure, but
In my article "Simulation in SAS: The slow way or the BY way," I showed how to use BY-group processing rather than a macro loop in order to efficiently analyze simulated data with SAS. In the example, I analyzed the simulated data by using PROC MEANS, and I use the
I recently showed someone a trick to create a graph, and he was extremely pleased to learn it. The trick is well known to many SAS users, but I hope that this article will introduce it to even more SAS users. At issue is how to use the SGPLOT procedure
A SAS customer asks: How do I use SAS to generate multiple samples of size N from a multivariate normal distribution? Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the
Argh! I've just spilled coffee on output that shows the least squares coefficients for a regression model that I was investigating. Now the parameter estimate for the intercept is completely obscured, although I can still see the parameter estimates for the coefficients of the continuous explanatory variable. What can I
If you are like me, you've experienced the following frustration. You are reading the SAS/STAT documentation, trying to understand some procedure or option, when you find an example that is very similar to what you need. "Great," you think, "this example will help me understand how the SAS procedure works!"
Last week the SAS Training Post blog posted a short article on an easy way to find variables in common to two data sets. The article used PROC CONTENTS (with the SHORT option) to print out the names of variables in SAS data sets so that you can visually determine
The SAS/IML language secretly creates temporary variables. Most of the time programmers aren't even aware that the language does this. However, there is one situation where if you don't think carefully about temporary variables, your program will silently produce an error. And as every programmer knows, silent wrong numbers are
A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied
In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a
The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all