Blogs

Tag: Statistical Programming

Advanced Analytics

Rick Wicklin | October 4, 2012

Dice probabilities and the game of "craps"

Gambling games that use dice, such as the game of "craps," are often used to demonstrate the laws of probability. For two dice, the possible rolls and probability of each roll are usually represented by a matrix. Consequently, the SAS/IML language makes it easy to compute the probabilities of various

Advanced Analytics

Rick Wicklin | September 26, 2012

A surprising result: The expected number of uniform variates whose sum exceeds one

I was recently flipping through Ross' Simulation (2006, 4th Edition) and saw the following exercise: Let N be the minimum number of draws from a uniform distribution [until the sum of the variates] exceeds 1. What is the expected value of N? Write a simulation to estimate the expected value. For

Advanced Analytics

Rick Wicklin | September 24, 2012

Grouping observations based on quantiles

Sometimes it is useful to group observations based on the values of some variable. Common schemes for grouping include binning and using quantiles. In the binning approach, a variable is divided into k equal intervals, called bins, and each observation is assigned to a bin. In this scheme, the size

Advanced Analytics

Rick Wicklin | September 17, 2012

Filling the lower and upper triangular portions of a matrix

If you use a word three times, it's yours. -Unknown

When I was a child, my mother used to encourage me to increase my vocabulary by saying, "If you use a word three times, it's yours for life." I believe that the same saying holds for programming techniques: Use a

Advanced Analytics

Rick Wicklin | September 6, 2012

Testing for equality of sets

Ah! The joys of sets! It is easy to test whether two vectors are equal in SAS/IML software. It is only slightly more challenging to test whether two sets are equal. Recall that A and B are equal as sets if they contain the same elements. Order does not matter.
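
The definition above can be sketched in SAS/IML as follows. This is a minimal illustration, not necessarily the approach taken in the full article: the module name SetEqual and the example vectors are hypothetical. It relies on the built-in UNIQUE function, which returns the sorted distinct values of its argument as a row vector, so that order and duplicates no longer matter.

```sas
proc iml;
/* Hypothetical helper: returns 1 if A and B contain the same elements, 0 otherwise */
start SetEqual(A, B);
   uA = unique(A);                          /* sorted distinct values of A */
   uB = unique(B);                          /* sorted distinct values of B */
   if ncol(uA) ^= ncol(uB) then return( 0 ); /* different number of distinct elements */
   return( all(uA = uB) );                  /* elementwise comparison of the sorted sets */
finish;

A = {1 2 3 2};     /* made-up example data */
B = {3 1 2};
C = {1 2 4};
eqAB = SetEqual(A, B);   /* same elements, different order and multiplicity */
eqAC = SetEqual(A, C);   /* C contains 4 instead of 3 */
print eqAB eqAC;
quit;
```

Checking the number of distinct elements first avoids comparing vectors of different lengths.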

Advanced Analytics

Rick Wicklin | August 20, 2012

How to return multiple values from a SAS/IML function

The SAS/IML language supports user-defined functions (also called modules). Many SAS/IML programmers know that you can use the RETURN function to return a value from a user-defined function. For example, the following function returns the sum of each column of a matrix: proc iml; start ColSum(M); return( M[+, ] ); /*

Advanced Analytics

Rick Wicklin | August 16, 2012

Extract the lower triangular elements of a matrix

It is common to want to extract the lower or upper triangular elements of a matrix. For example, if you have a correlation matrix, the lower triangular elements are the nontrivial correlations between variables in your data. As I've written before, you can use the VECH function to extract the

Advanced Analytics

Rick Wicklin | July 11, 2012

Visualize the bivariate normal cumulative distribution

When you are working with probability distributions (normal, Poisson, exponential, and so forth), there are four essential functions that a statistical programmer needs. As I've written before, for common univariate distributions, SAS provides the following functions: the PDF function, which returns the probability density at a given point the CDF

Advanced Analytics

Rick Wicklin | July 9, 2012

Reordering data to match a target order

Suppose that you have two data vectors, x and y, with the same number of elements. How can you rearrange the values of y so that they have the same relative order as the values of x? In other words, find a permutation, π, of the elements of y so

Advanced Analytics

Rick Wicklin | July 5, 2012

Compute the multivariate normal density in SAS

I've been working on a new book about Simulating Data with SAS. In researching the chapter on simulation of multivariate data, I've noticed that the probability density function (PDF) of multivariate distributions is often specified in a matrix form. Consequently, the multivariate density can usually be computed by using the

Advanced Analytics

Rick Wicklin | June 27, 2012

Create an ID vector for repeated measurements

I often use the SAS/IML language for simulating data with certain known properties. In fact, I'm writing a book called Simulating Data with SAS. When I simulate repeated measurements (sometimes called replicated data), I often want to generate an ID variable that identifies which measurement is associated with which subject

Advanced Analytics

Rick Wicklin | June 22, 2012

Compute a running total for a "window" of time

A reader wrote for help with a computational problem. He has a vector of length N and the vector contains integer values in the range [1, 120], which represent months for which events occurred over a 10-year period. The question is: what is the 24-month period for which the most

Advanced Analytics

Rick Wicklin | May 29, 2012

Did you know that PROC IML automatically loads certain modules?

In the SAS/IML language, a user-defined function or subroutine is called a module. Modules are used to extend the capability of the SAS/IML language. Usually you need to explicitly load modules before you use them, but there are two cases where PROC IML loads a module automatically. Modules in IMLMLIB

Advanced Analytics

Rick Wicklin | May 23, 2012

Compute statistics for each row by using subscript operators

In a previous blog, I showed how to use SAS/IML subscript reduction operators to compute the location of the maximum values for each row of a matrix. The subscript reduction operators are useful for computing simple statistics for each row (or column) of a numerical matrix. If x is a

Advanced Analytics

Rick Wicklin | May 9, 2012

The power method: compute only the largest eigenvalue of a matrix

When I was at SAS Global Forum last week, a SAS user asked my advice regarding a SAS/IML program that he wrote. One step of the program was taking too long to run and he wondered if I could suggest a way to speed it up. The long-running step was

Advanced Analytics

Rick Wicklin | May 7, 2012

Checking your answers: Are computed values close to the true values?

In statistical programming, I often test a program by running it on a problem for which I know the correct answer. I often use a single expression to compute the maximum value of the absolute difference between the vectors: maxDiff = max( abs( z-correct ) ); /* largest absolute difference

Advanced Analytics

Rick Wicklin | May 4, 2012

Expand data by using frequencies

A reader asked: I want to create a vector as follows. Suppose there are two given vectors x=[A B C] and f=[1 2 3]. Here f indicates the frequency vector. I hope to generate a vector c=[A B B C C C]. I am trying to use the REPEAT function

Advanced Analytics

Rick Wicklin | May 2, 2012

The DIF function: Compute lagged differences and finite differences

To a statistician, the DIF function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function has many other uses, including computing finite differences. The DIF function computes the difference between the original vector and a shifted version

Advanced Analytics

Rick Wicklin | April 30, 2012

The LAG function: Useful for more than time series analysis

To a statistician, the LAG function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function provides a convenient way to compute quantities that involve adjacent values in any vector. The LAG function is essentially a "shift operator."
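
As a small illustration of the shift-operator idea, the following sketch lags a column vector by one position and then combines each element with its predecessor. The data values and variable names are made up for the example, and the program assumes SAS/IML 9.22 or later, where the LAG function is available.

```sas
proc iml;
x = {1, 3, 6, 10};      /* made-up column vector */
prev = lag(x, 1);       /* shift down by one position: {., 1, 3, 6} */
change = x - prev;      /* difference between adjacent values; the first element
                           is missing because x[1] has no predecessor */
print x prev change;
quit;
```

The first element of the shifted vector is a missing value, so any quantity computed from it is also missing in the first position.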

Advanced Analytics

Rick Wicklin | April 20, 2012

Popular! Articles that strike a chord with SAS users

I blog about a lot of topics, but the following five categories represent some of my favorite subjects. Judging by the number of readers and comments, these articles have struck a chord with SAS users. If you haven't read them, check them out. (If you HAVE read them, some are

Rick Wicklin | April 18, 2012

Extending SAS: How to define new functions in PROC FCMP and SAS/IML software

SAS software provides many run-time functions that you can call from your SAS/IML or DATA step programs. The SAS/IML language has several hundred built-in statistical functions, and Base SAS software contains hundreds more. However, it is common for statistical programmers to extend the run-time library to include special user-defined functions.

Rick Wicklin | April 16, 2012

BY-group processing in SAS/IML

Because the SAS/IML language is a general purpose programming language, it doesn't have a BY statement like most other SAS procedures (such as PROC REG). However, there are several ways to loop over categorical variables and perform an analysis on the observations in each category. One way is to use

Rick Wicklin | April 12, 2012

The Poissonness plot: A goodness-of-fit diagnostic

Last week I discussed how to fit a Poisson distribution to data. The technique, which involves using the GENMOD procedure, produces a table of some goodness-of-fit statistics, but I find it useful to also produce a graph that indicates the goodness of fit. For continuous distributions, the quantile-quantile (Q-Q) plot

Rick Wicklin | April 9, 2012

Vectorized computations and the birthday matching problem

The birthday matching problem is a classic problem in probability theory. The part of it that people tend to remember is that in a room of 23 people, there is a greater than 50% chance that two people in the room share a birthday. But the birthday matching problem is also

Rick Wicklin | April 4, 2012

Fitting a Poisson distribution to data in SAS

Over at the SAS Discussion Forums, someone asked how to use SAS to fit a Poisson distribution to data. The questioner asked how to fit the distribution but also how to overlay the fitted density on the data and to create a quantile-quantile (Q-Q) plot. The questioner mentioned that the

Rick Wicklin | April 2, 2012

Count missing values in observations

Locating missing values is important in statistical data analysis. I've previously written about how to count the number of missing values for each variable in a data set. In Base SAS, I showed how to use the MEANS or FREQ procedures to count missing values. In the SAS/IML language, I

Rick Wicklin | March 26, 2012

ANY versus ALL: Testing the elements of a vector

The fundamental units in the SAS/IML language are matrices and vectors. Consequently, you might wonder about conditional expressions such as if v>0 then.... What does this expression mean when v contains more than a single element?

Evaluating vector expressions

When you test a vector for some condition, expressions like v>0

Rick Wicklin | March 21, 2012

Creating symmetric matrices: Two useful functions with strange names

Covariance, correlation, and distance matrices are a few examples of symmetric matrices that are frequently encountered in statistics. When you create a symmetric matrix, you only need to specify the lower triangular portion of the matrix. The VECH and SQRVECH functions, which were introduced in SAS/IML 9.3, are two functions

Rick Wicklin | March 19, 2012

Row vectors versus column vectors

The SAS/IML language supports both row vectors and column vectors. This is useful for performing linear algebra, but it can cause headaches when you are writing a SAS/IML module. I want my modules to be able to handle both row vectors and column vectors. I don't want the user to

Rick Wicklin | March 16, 2012

Linear interpolation in SAS/IML

A recent discussion on the SAS-L discussion forum concerned how to implement linear interpolation in SAS. Some people suggested using PROC EXPAND in SAS/ETS software, whereas others proposed a DATA step solution. For me, the SAS/IML language provides a natural programming environment to implement an interpolation scheme. It also provides
