The birthday matching problem is a classic problem in probability theory. The part of it that people tend to remember is that in a room of 23 people, there is greater than 50% chance that two people in the room share a birthday. But the birthday matching problem is also

## Tag: **Efficiency**

Sometimes in matrix computations you need to obtain the values of certain submatrices such as the diagonal elements or the super- or subdiagonal elements. About a year ago, I showed one way to do that: convert subscripts to indices and vice-versa. However, a tip from @RLangTip on Twitter got me

I recently blogged about Mahalanobis distance and what it means geometrically. I also previously showed how Mahalanobis distance can be used to compute outliers in multivariate data. But how do you compute Mahalanobis distance in SAS? Computing Mahalanobis distance with built-in SAS procedures and functions There are several ways to

Way back when I learned to program, I remember a computer instructor explaining that an IF-THEN statement can be a relatively slow operation. He said "If a multiplication takes one unit of time, an IF statement requires about 70 units." I don't know where his numbers came from, or even

Polynomials are used often in data analysis. Low-order polynomials are used in regression to model the relationship between variables. Polynomials are used in numerical analysis for numerical integration and Taylor series approximations. It is therefore important to be able to evaluate polynomials in an efficient manner. My favorite evaluation technique

When I write SAS/IML programs, I usually do my development in the SAS/IML Studio environment. Why? There are many reasons, but the one that I will discuss today is the fact that the application is multithreaded and supports multiple programming workspaces. The advantages of multiple programming workspaces I am always

I've previously described ways to solve systems of linear equations, A*b = c. While discussing the relative merits of the solving a system for a particular right hand side versus solving for the inverse matrix, I made the assertion that it is faster to solve a particular system than it

This article describes the SAS/IML CHOOSE function: how it works, how it doesn't work, and how to use it to make your SAS/IML programs more compact. In particular, the CHOOSE function has a potential "gotcha!" that you need to understand if you want your program to perform as expected. What

The SAS/IML language provides two functions for solving a nonsingular nxn linear system A*x = c: The INV function numerically computes the inverse matrix, A-1. You can use this to solve for x: Ainv = inv(A); x = Ainv*c;. The SOLVE function numerically computes the particular solution, x, for a

Recently Charlie Huang showed how to use the SAS/IML language to compute an exponentially weighted moving average of some financial data. In the commentary to his analysis, he said: I found that if a matrix or a vector is declared with specified size before the computation step, the program’s efficiency

SAS is the underwriter of a MeriTalk study released today that focuses on “The Federal Efficiency Opportunity”. The study uncovered meaningful insights into how federal managers and professionals are trying to meet their goals while facing enormous budget cuts. The study was done after President Obama’s Deficit Commission suggested in

A fundamental operation in data analysis is finding data that satisfy some criterion. How many people are older than 85? What are the phone numbers of the voters who are registered Democrats? These questions are examples of locating data with certain properties or characteristics. The SAS DATA step has a

The log transformation is one of the most useful transformations in data analysis. It is used as a transformation to normality and as a variance stabilizing transformation. A log transformation is often used as part of exploratory data analysis in order to visualize (and later model) data that ranges over

Unless you’ve been living under a rock, you’ve heard about the budget problems running rampant across all levels of government. Federal, State and Local Governments are all facing historic budget shortfalls due to the economic crisis and decreased tax receipts. This has led to a much closer examination of services

In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling

The other day, someone asked me how to compute a matrix of pairwise differences for a vector of values. The person asking the question was using SQL to do the computation for 2,000 data points, and it was taking many hours to compute the pairwise differences. He asked if SAS/IML

When you pass a matrix as an parameter (argument) to a SAS/IML module, the SAS/IML language does not create a copy of the matrix. That approach, known as "calling by value," is inefficient. It is well-known that languages that implement call-by-value semantics suffer performance penalties. In the SAS/IML language, matrices

Sampling with replacement is a useful technique for simulations and for resampling from data. Over at the SAS/IML Discussion Forum, there was a recent question about how to use SAS/IML software to sample with replacement from a set of events. I have previously blogged about efficient sampling, but this topic

I was recently asked how to create a tridiagonal matrix in SAS/IML software. For example, how can you easily specify the following symmetric tridiagonal matrix without typing all of the zeros? proc iml; m = {1 6 0 0 0, 6 2 7 0 0, 0 7 3 8 0,

In a previous post, I discussed how to use the LOC function to eliminate loops over observations. Dale McLerran chimed in to remind me that another way to improve efficiency is to use subscript reduction operators. I ended my previous post by issuing a challenge: can you write an efficient

I recently read a paper that described a SAS macro to carry out a permutation test. The permutations were generated by PROC IML. (In fact, an internet search for the terms "SAS/IML" and "permutation test" gives dozens of papers in recent years.) The PROC IML code was not as efficient

The SAS/IML language is a vector language, so statements that operate on a few long vectors run much faster than equivalent statements that involve many scalar quantities. For example, in a previous post, I asserted that the LOC function is much faster than writing a loop, for finding observations that

Recently, SAS Global Forum announced the call for papers for the 2011 conference to be held at Caesars Palace in Las Vegas. Since the conference is in Las Vegas, I’ve been thinking a lot about games of chance: blackjack, craps, roulette, and the like. You can analyze these games by

A frequently performed task in data analysis is identifying all the observations in a data set that satisfy certain conditions. For example, you might want to identify all of the female patients in your study or to identify all patients whose systolic blood pressure is greater than 140 mm Hg.

"How do I apply a format to a vector of values in IML? In the DATA step, I can just call the PUTN function.” This question came from a SAS customer that I met recently at a conference. My reply? Use the PUTN function, but send it a vector of