I recently wrote about how to overlay multiple curves on a single graph by reshaping wide data (with many variables) into long data (with a grouping variable). The implementation used PROC TRANSPOSE, which is a procedure in Base SAS. When you program in the SAS/IML language, you might encounter data
Author
Data. To a statistician, data are the observed values. To a SAS programmer, analyzing data requires knowledge of the values and how the data are arranged in a data set. Sometimes the data are in a "wide form" in which there are many variables. However, to perform a certain analysis
SAS procedures usually handle missing values automatically. Univariate procedures such as PROC MEANS automatically delete missing values when computing basic descriptive statistics. Many multivariate procedures such as PROC REG delete an entire observation if any variable in the analysis has a missing value. This is called listwise deletion or using
When you have a long-running SAS/IML program, it is sometimes useful to be able to monitor the progress of the program. For example, suppose you need to computing statistics for 1,000 different data sets and each computation takes between 5 and 30 seconds. You might want to output a message
Friends have to look out for each other. Sometimes this can be slightly embarrassing. At lunch you might need to tell a friend that he has some tomato sauce on his chin. Or that she has a little spinach stuck between her teeth. Or you might need to tell your
The xkcd comic often makes me think and laugh. The comic features physics, math, and statistics among its topics. Many years ago, the comic showed a "binary heart": a grid of binary (0/1) numbers with the certain numbers colored red so that they formed a heart. Some years later, I
The SAS DATA step supports multidimensional arrays. However, matrices in SAS/IML are like mathematical matrices: they are always two dimensional. In simulation studies you might need to generate and store thousands of matrices for a later statistical analysis of their properties. How can you accomplish that unless you can create
In SAS, the order of variables in a data set is usually unimportant. However, occasionally SAS programmers need to reorder the variables in order to make a special graph or to simplify a computation. Reordering variables in the DATA step is slightly tricky. There are Knowledge Base articles about how
A SAS/IML programmer asked a question on a discussion forum, which I paraphrase below: I've written a SAS/IML function that takes several arguments. Some of the arguments have default values. When the module is called, I want to compute some quantity, but I only want to compute it for the
In my book Simulating Data with SAS, I discuss a relationship between the skewness and kurtosis of probability distributions that might not be familiar to some statistical programmers. Namely, the skewness and kurtosis of a probability distribution are not independent. If κ is the full kurtosis of a distribution and
In the SAS DATA step, all variables are scalar quantities. Consequently, an IF-THEN/ELSE statement that evaluates a logical expression is unambiguous. For example, the following DATA step statements print "c=5 is TRUE" to the log if the variable c is equal to 5: if c=5 then put "c=5 is TRUE";
At the beginning of my book Statistical Programming with SAS/IML Software I give the following programming tip (p. 25): Do not confuse an empty matrix with a matrix that contains missing values or with a zero matrix. An empty matrix has no rows and no columns. A matrix that contains
Data simulation is a fundamental technique in statistical programming and research. My book Simulating Data with SAS is an accessible how-to book that describes the most useful algorithms and the best programming techniques for efficient data simulation in SAS. Here are five lessons you can learn by reading it: Learn strategies
A common task in SAS/IML programming is finding elements of a SAS/IML matrix that satisfy a logical expression. For example, you might need to know which matrix elements are missing, are negative, or are divisible by 2. In the DATA step, you can use the WHERE clause to subset data.
The other day I was creating some histograms inside a loop in PROC IML. It was difficult for me to determine which histogram was associated with which value of the looping variable. "No problem," I said. "I'll just use a TITLE statement inside the loop so that each histogram has
I began 2015 by compiling a list of popular articles from my blog in 2014. Although this "People's Choice" list contains many interesting articles, some of my favorites did not make the list. Today I present the "Editor's Choice" list of articles that deserve a second look. I've highlighted one
A SAS programmer recently posted a question to the SAS/IML Support Community about how to compute the kth smallest value in a vector of numbers. In the DATA step, you can use the SMALLEST function to find the smallest value in an array, but there is no equivalent built-in function
I published 118 blog posts in 2014. This article presents my most popular posts from 2014 and late 2013. 2014 will always be a special year for me because it was the year that the SAS University Edition was launched. The University Edition means that SAS/IML is available to all
I recently posted an article about self-similar structures that arise in Pascal's triangle. Did you know that the Kronecker product (or direct product) can be used to create matrices that have self-similar structure? The basic idea is to start with a 0/1 matrix and compute a sequence of direct products
Like most programming languages, the SAS/IML language has many functions. However, the SAS/IML language also has quite a few operators. Operators can act on a matrix or on rows or columns of a matrix. They are less intuitive, but can be quite powerful because they enable you perform computations without
O Christmas tree, O Christmas tree, One year a fractal made thee! O Christmas tree, O Christmas tree, A heat map can display thee! From Pascal's matrix we define! Reflect across, divide by nine. O Christmas tree, O Christmas tree, Self-similar and so divine! Eventually I will run out of
There are many ways to multiply scalars, vectors, and matrices, but the Kronecker product (also called the direct product) is multiplication on steroids. The Kronecker product looks scary, but it is actually simple. The Kronecker product is merely a way to pack multiples of a matrix B into a block
A colleague asked me a question regarding my recent post about the Pascal triangle matrix. While responding to his question, I discovered a program that I had written in 1999 that computed with a Pascal triangle matrix. Wow, I've been computing with Pascal's triangle for 15 years! I don't know
Pascal's triangle is the name given to the triangular array of binomial coefficients. The nth row is the set of coefficients in the expansion of the binomial expression (1 + x)n. Complicated stuff, right? Well, yes and no. Pascal's triangle is known to many school children who have never heard of polynomials
A common question on SAS discussion forums is how to compute the minimum and maximum values across several variables. It is easy to compute statistics across rows by using the DATA step. This article shows how to compute the minimum and maximum values for each observation (across variables) and, for
I've written about how to generate a sample from a multivariate normal (MVN) distribution in SAS by using the RANDNORMAL function in SAS/IML software. Last week a SAS/IML programmer showed me a program that simulated MVN data and computed the resulting covariance matrix for each simulated sample. The purpose of
SAS software contains a lot of features, and each release adds more.To make sure that you do not miss new features that appear in the SAS/IML language, the word cloud on the right sidebar of my blog contains numbers that relate to SAS or SAS/IML releases. For example, you can
My colleagues at the SAS & R blog recently posted an example of how to program a permutation test in SAS and R. Their SAS implementation used Base SAS and was "relatively cumbersome" (their words) when compared with the R code. In today's post I implement the permutation test in
I sometimes wonder whether some functions and options in SAS software ever get used. Last week I was reviewing new features that were added to SAS/IML 13.1. One of the new functions is the CV function, which computes the sample coefficient of variation for data. Maybe it is just me,
Have you ever noticed that some SAS/IML programmers use the CALL statement to call a subroutine, whereas others use the RUN statement? Have you ever wondered why the SAS/IML language has two statements that do the same thing? It turns out that the CALL statement and the RUN statement do