Modern statistical software provides many options for computing robust statistics. For example, SAS can compute robust univariate statistics by using PROC UNIVARIATE, robust linear regression by using PROC ROBUSTREG, and robust multivariate statistics such as robust principal component analysis. Much of the research on robust regression was conducted in the

## Tag: **vectorization**

Many statistical tests use a CUSUM statistic as part of the test. It can be confusing when a researcher refers to "the CUSUM test" without providing details about exactly which CUSUM test is being used. This article describes a CUSUM test for the randomness of a binary sequence. You start

Suppose you need to assign 100 patients equally among 3 treatment groups in a clinical study. Obviously, an equal allocation is impossible because 3 does not evenly divide 100, but you can get close by assigning 34 patients to one group and 33 to the others. Mathematically,
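
The arithmetic in this excerpt can be sketched in a few SAS/IML statements. This is only an illustration, not the code from the full article; the names `n`, `k`, `base`, `extra`, and `groupSize` are hypothetical. The idea is to give every group the floor of n/k subjects and then give one extra subject to each of the first mod(n, k) groups.

```sas
proc iml;
n = 100;                          /* number of patients           */
k = 3;                            /* number of treatment groups   */
base  = floor(n / k);             /* every group gets at least 33 */
extra = mod(n, k);                /* 1 patient is left over       */
groupSize = j(1, k, base);        /* row vector {33 33 33}        */
if extra > 0 then
   groupSize[1:extra] = groupSize[1:extra] + 1;   /* {34 33 33}   */
print groupSize;
quit;
```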

As a general rule, when SAS programmers want to manipulate data row by row, they reach for the SAS DATA step. When the computation requires column statistics, the SQL procedure is also useful. When both row and column operations are required, the SAS/IML language is a powerful addition to a

One way to assess the precision of a statistic (a point estimate) is to compute the standard error, which is the standard deviation of the statistic's sampling distribution. A relatively large standard error indicates that the point estimate should be viewed with skepticism, either because the sample size is small

Many intervals in statistics have the form p ± δ, where p is a point estimate and δ is the radius (or half-width) of the interval. (For example, many two-sided confidence intervals have this form, where δ is proportional to the standard error.) Many years ago I wrote an article

I recently needed to solve a fun programming problem. I challenge other SAS programmers to solve it, too! The problem is easy to state: Given a long sequence of digits, can you write a program to count how many times a particular subsequence occurs? For example, if I give you
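
One simple way to approach this kind of counting problem is sketched below as a DATA step. It is not necessarily the solution in the full article. It assumes that the digits are stored in a character string and that overlapping occurrences should be counted; the variable names and the sample values are made up for illustration.

```sas
data _null_;
   digits = "1212121";            /* hypothetical sequence of digits   */
   target = "121";                /* hypothetical subsequence to count */
   nMatch = 0;
   pos = find(digits, target);    /* position of first match, or 0     */
   do while (pos > 0);
      nMatch = nMatch + 1;
      /* resume the search one character after the previous match, */
      /* which allows overlapping occurrences to be counted        */
      pos = find(digits, target, pos + 1);
   end;
   put nMatch=;                   /* writes nMatch=3 to the log */
run;
```

If overlapping matches should not be counted, the DATA step COUNT function returns the number of non-overlapping occurrences directly.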

Two of my favorite string-manipulation functions in the SAS DATA step are the COUNTW function and the SCAN function. The COUNTW function counts the number of words in a long string of text. Here "word" means a substring that is delimited by special characters, such as a space character, a
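
A common pattern that combines these two functions is sketched below: COUNTW returns the number of words, and SCAN extracts each word inside a loop. The sample string and the variable names are illustrative only, and a space is assumed to be the only delimiter.

```sas
data _null_;
   s = "Lorem ipsum dolor sit amet";   /* hypothetical text      */
   n = countw(s, " ");                 /* number of words: 5     */
   do i = 1 to n;
      word = scan(s, i, " ");          /* extract the i_th word  */
      put i= word=;
   end;
run;
```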

I was a freshman in college the first time I saw the Cantor middle-thirds set and the related Cantor "Devil's staircase" function. (Shown at left.) These constructions expanded my mind and led me to study fractals, real analysis, topology, and other mathematical areas. The Cantor function and the Cantor middle-thirds

Last week I showed how to use PROC EXPAND to compute moving averages and other rolling statistics in SAS. Unfortunately, PROC EXPAND is part of SAS/ETS software and not every SAS site has a license for SAS/ETS. For simple moving averages, you can write a DATA step program, as discussed

Recently I blogged about how to compute a weighted mean and showed that you can use a weighted mean to compute the center of mass for a system of N point masses in the plane. That led me to think about a related problem: computing the center of mass (called

Every year near Halloween I write a trick-and-treat article in which I demonstrate a simple programming trick that is a real treat to use. This year's trick features two of my favorite functions, the CUSUM function and the LAG function. By using these functions, you can compute the rows of
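
For readers who have not used these two SAS/IML functions, the following minimal sketch shows what each one returns for a small column vector. It does not reproduce the trick from the full article; the vector `x` is just an example.

```sas
proc iml;
x = {1, 2, 3, 4};
c = cusum(x);        /* cumulative sums: {1, 3, 6, 10}                     */
L = lag(x);          /* shift down by one; first element is missing:       */
                     /* {., 1, 2, 3}                                       */
print x c L;
quit;
```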

In a previous post I described how to simulate random samples from an urn that contains colored balls. The previous article described the case where the balls can be either of two colors. In that case, all the distributions are univariate. In this article I examine the case where the

A customer asked: How do we go about summing a finite series in SAS? For example, I want to compute for various integers n ≥ 3. I want to output two columns, one for the natural numbers and one for the summation of the series. Summations arise often in statistical

I often blog about the usefulness of vectorization in the SAS/IML language. A one-sentence summary of vectorization is "execute a small number of statements that each analyze a lot of data." In general, for matrix languages (SAS/IML, MATLAB, R, ...) vectorization is more efficient than the alternative, which is to
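
As a small illustration of that one-sentence summary, the following SAS/IML sketch computes a sum of squares in two ways: with a loop over scalar elements and with a single vectorized statement. The example is hypothetical and is not taken from the full article.

```sas
proc iml;
x = T(1:1000);                 /* column vector 1, 2, ..., 1000 */

/* Loop: many statements, each of which handles one element */
s1 = 0;
do i = 1 to nrow(x);
   s1 = s1 + x[i]##2;
end;

/* Vectorized: one statement that handles all elements */
s2 = sum(x##2);

print s1 s2;                   /* the two values are equal */
quit;
```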

Last week I received a message from SAS Technical Support saying that a customer's IML program was running slowly. Could I look at it to see whether it could be improved? What I discovered is a good reminder about the importance of vectorizing user-defined modules. The program in this blog

Like most programming languages, the SAS/IML language has many functions. However, the SAS/IML language also has quite a few operators. Operators can act on a matrix or on rows or columns of a matrix. They are less intuitive, but can be quite powerful because they enable you to perform computations without

My colleagues at the SAS & R blog recently posted an example of how to program a permutation test in SAS and R. Their SAS implementation used Base SAS and was "relatively cumbersome" (their words) when compared with the R code. In today's post I implement the permutation test in

My previous blog post describes how to implement Conway's Game of Life by using the dynamically linked graphics in SAS/IML Studio. But the Game of Life is not the only kind of cellular automata. This article describes a system of cellular automata that is known as Wolfram's Rule 30. In

A SAS customer showed me a SAS/IML program that he had obtained from a book. The program was taking a long time to run on his data, which was somewhat large. He was wondering if I could identify any inefficiencies in the program. The first thing I did was to

Nonlinear optimization routines enable you to find the values of variables that optimize an objective function of those variables. When you use a numerical optimization routine, you need to provide an initial guess, often called a "starting point" for the algorithm. Optimization routines iteratively improve the initial guess in an

Bootstrap methods and permutation tests are popular and powerful nonparametric methods for testing hypotheses and approximating the sampling distribution of a statistic. I have described a SAS/IML implementation of a bootstrap permutation test for matched pairs of data (an alternative to a matched-pair t test) in my paper "Modern Data

Just one last short article about properties of the Hilbert matrix. I've already blogged about how to construct a Hilbert matrix in the SAS/IML language and how to compute a formula for the determinant. One reason that the Hilbert matrix is a famous (some would say infamous!) example in numerical

The Hilbert matrix is the most famous ill-conditioned matrix in numerical linear algebra. It is often used in matrix computations to illustrate problems that arise when you compute with ill-conditioned matrices. The Hilbert matrix is symmetric and positive definite, properties that are often associated with "nice" and "tame" matrices. The

Vector languages such as SAS/IML, MATLAB, and R are powerful because they enable you to use high-level matrix operations (matrix multiplication, dot products, etc.) rather than loops that perform scalar operations. In general, vectorized programs are more efficient (and therefore run faster) than programs that contain loops. For an example

In using a vector-matrix language such as SAS/IML, MATLAB, or R, one of the challenges for programmers is learning how to vectorize computations. Often it is not intuitive how to program a computation so that you avoid looping over the rows and columns of a matrix. However, there are a

SAS/IML programmers know that the VECDIAG function can be used to extract the diagonal elements of a matrix. For example, the following statements extract the diagonal of a 3 x 3 matrix: proc iml; m = {1 2 3, 4 5 6, 7 8 9}; v = vecdiag(m); /* v = {1,5,9}

Last week I discussed a program that had three nested loops that used scalar operations in the innermost loop. I mentioned that this program was not vectorized, and would therefore be slow in a matrix language such as SAS/IML, MATLAB, or R. I then went through a series of steps

Last week someone posted an interesting question to the SAS/IML Support Community. The problem involved four nested DO loops and took hours to run. By transforming several nested DO loops into an equivalent matrix operation, I was able to reduce the run time to about one second. The process of

The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all
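
The logarithmic-transform use case might look like the following sketch. It assumes that the goal is to transform only the positive values and to assign missing values elsewhere; the data and the variable names are made up for illustration.

```sas
proc iml;
x = {10, 0, 100, -5, 1};       /* hypothetical data                    */
idx = loc(x > 0);              /* indices of positive elements: 1 3 5  */
y = j(nrow(x), 1, .);          /* initialize the result to missing     */
if ncol(idx) > 0 then          /* guard against an empty index vector  */
   y[idx] = log10(x[idx]);     /* transform only the positive values   */
print x y;
quit;
```

Checking `ncol(idx) > 0` before using the indices avoids an error when no element satisfies the condition.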