## Tag: Efficiency

0
The partition problem

The partition problem has many variations, but recently I encountered it as an interactive puzzle on a computer. (Try a similar game yourself!) The player is presented with an old-fashioned pan-balance scale and a set of objects of different weights. The challenge is to divide (or partition) the objects into

0
Short-circuit evaluation and logical ligatures in SAS

Many programmers are familiar with "short-circuit" evaluation in an IF-THEN statement. Short circuit means that a program does not evaluate the remainder of a logical expression if the value of the expression is already logically determined. The SAS DATA step supports short-circuiting for simple logical expressions in IF-THEN statements and

0
Write to a SAS data set from inside a SAS/IML loop

In SAS/IML programs, a common task is to write values in a matrix to a SAS data set. For some programs, the values you want to write are in a matrix and you use the CREATE FROM/APPEND FROM syntax to create the data set, as follows: proc iml; X =

Programming Tips
0
The intersection of multiple sets

This article compares several ways to find the elements that are common to multiple sets. I test which method is the fastest in the SAS/IML language. However, all algorithms are intrinsically fast, which raises an important question: when is it worth the time and effort to optimize an algorithm? The

Programming Tips
0
Radial basis functions and Gaussian kernels in SAS

A radial basis function is a scalar function that depends on the distance to some point, called the center point, c. One popular radial basis function is the Gaussian kernel φ(x; c) = exp(-||x – c||2 / (2 σ2)), which uses the squared distance from a vector x to the

Programming Tips
0
6 tips for timing the performance of algorithms

When you implement a statistical algorithm in a vector-matrix language such as SAS/IML, R, or MATLAB, you should measure the performance of your implementation, which means that you should time how long a program takes to analyze data of varying sizes and characteristics. There are some general tips that can

Learn SAS
0
Compare the performance of algorithms in SAS

As my colleague Margaret Crevar recently wrote, it is useful to know how long SAS programs take to run. Margaret and others have written about how to use the SAS FULLSTIMER option to monitor the performance of the SAS system. In fact, SAS distributes a macro that enables you to

0
Finding observations that match a target value

Imagine that you have one million rows of numerical data and you want to determine if a particular "target" value occurs. How might you find where the value occurs? For univariate data, this is an easy problem. In the SAS DATA step you can use a WHERE clause or a

0
An easy way to approximate a cumulative distribution function

Evaluating a cumulative distribution function (CDF) can be an expensive operation. Each time you evaluate the CDF for a continuous probability distribution, the software has to perform a numerical integration. (Recall that the CDF at a point x is the integral under the probability density function (PDF) where x is

0
Friends don't let friends concatenate results inside a loop

Friends have to look out for each other. Sometimes this can be slightly embarrassing. At lunch you might need to tell a friend that he has some tomato sauce on his chin. Or that she has a little spinach stuck between her teeth. Or you might need to tell your

Learn SAS
0
Finding matrix elements that satisfy a logical expression

A common task in SAS/IML programming is finding elements of a SAS/IML matrix that satisfy a logical expression. For example, you might need to know which matrix elements are missing, are negative, or are divisible by 2. In the DATA step, you can use the WHERE clause to subset data.

0
Pairwise comparisons of a data vector

A SAS customer showed me a SAS/IML program that he had obtained from a book. The program was taking a long time to run on his data, which was somewhat large. He was wondering if I could identify any inefficiencies in the program. The first thing I did was to

0
Simulate many samples from a logistic regression model

My last blog post showed how to simulate data for a logistic regression model with two continuous variables. To keep the discussion simple, I simulated a single sample with N observations. However, to obtain the sampling distribution of statistics, you need to generate many samples from the same logistic model.

0
Simulating data for a logistic regression model

In my book Simulating Data with SAS, I show how to use the SAS DATA step to simulate data from a logistic regression model. Recently there have been discussions on the SAS/IML Support Community about simulating logistic data by using the SAS/IML language. This article describes how to efficiently simulate

0
Using associativity can lead to big performance improvements in matrix multiplication

In a previous post, I stated that you should avoid matrix multiplication that involves a huge diagonal matrix because that operation can be carried out more efficiently. Here's another tip that sometimes improves the efficiency of matrix multiplication: use parentheses to prevent the creation of large matrices. Matrix multiplication is

0
Never multiply with a large diagonal matrix

I love working with SAS Technical Support because I get to see real problems that SAS customers face as they use SAS/IML software. The other day I advised a customer how to improve the efficiency of a computation that involved multiplying large matrices. In this article I describe an important

Learn SAS
0
Assign the diagonal elements of a matrix

SAS/IML programmers know that the VECDIAG matrix can be used to extract the diagonal elements of a matrix. For example, the following statements extract the diagonal of a 3 x 3 matrix: proc iml; m = {1 2 3, 4 5 6, 7 8 9}; v = vecdiag(m); /* v = {1,5,9}

0
How is being agile different from doing agile?

There’s a reason SAS is a leader in so many different areas, and the adoption of Agile Scrum is no exception. “Scrum” gets its namesake from rugby, and more specifically, the name of a play where all team members lock arms and wrestle for the ball so that the game

0
Count the number of unique rows in a matrix

How do you count the number of unique rows in a matrix? The simplest algorithm is to sort the data and then iterate down the rows, comparing each row with the previous row. However, this algorithm has two shortcomings: it physically sorts the data (which means that the original locations

0
Using simulation to estimate the power of a statistical test

The power of a statistical test measures the test's ability to detect a specific alternate hypothesis. For example, educational researchers might want to compare the mean scores of boys and girls on a standardized test. They plan to use the well-known two-sample t test. The null hypothesis is that the

0
How to vectorize computations in a matrix language

Last week someone posted an interesting question to the SAS/IML Support Community. The problem involved four nested DO loops and took hours to run. By transforming several nested DO loops into an equivalent matrix operation, I was able to reduce the run time to about one second. The process of

Learn SAS
0
Oh, those pesky temporary variables!

The SAS/IML language secretly creates temporary variables. Most of the time programmers aren't even aware that the language does this. However, there is one situation where if you don't think carefully about temporary variables, your program will silently produce an error. And as every programmer knows, silent wrong numbers are

0
Generate binary outcomes with varying probability

A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied

0
Remove or keep: Which is faster?

In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a

0
Efficient acceptance-rejection simulation: Part II

Last week I wrote about using acceptance-rejection algorithms in vector languages to simulate data. The main point I made is that in a vector language it is efficient to generate many more variates than are needed, with the knowledge that a certain proportion will be rejected. In last week's article,

0
Efficient acceptance-rejection simulation

A few days ago on the SAS/IML Support Community, there was an interesting discussion about how to simulate data from a truncated Poisson distribution. The SAS/IML user wanted to generate values from a Poisson distribution, but discard any zeros that are generated. This kind of simulation is known as an

0
Constructing block matrices with applications to mixed models

The other day I was constructing covariance matrices for simulating data for a mixed model with repeated measurements. I was using the SAS/IML BLOCK function to build up the "R-side" covariance matrix from smaller blocks. The matrix I was constructing was block-diagonal and looked like this: The matrix represents a

0
Using macro loops for simulation

Last week I wrote an article in which I pointed out that many SAS programmers write a simulation in SAS by writing a macro loop. This approach is extremely inefficient, so I presented a more efficient technique. Not only is the macro loop approach slow, but there are other undesirable

0
Simulation in SAS: The slow way or the BY way

Over the past few years, and especially since I posted my article on eight tips to make your simulation run faster, I have received many emails (often with attached SAS programs) from SAS users who ask for advice about how to speed up their simulation code. For this reason, I

0
Indexing a SAS data set to improve processing categories in SAS/IML

I have blogged about three different SAS/IML techniques that iterate over categories and process the observations in each category. The three techniques are as follows: Use a WHERE clause on the READ statement to read only the observations in the ith category. This is described in the article "BY-group processing