Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer
Tag: Simulation
As I wrote in my previous post, a SAS customer noticed that he was getting some duplicate values when he used the RAND function to generate a large number of random uniform values on the interval [0,1]. He wanted to know if this result indicates a bug in the RAND
Tossing dice is a simple and familiar process, yet it can illustrate deep and counterintuitive aspects of random numbers. For example, if you toss four identical six-sided dice, what is the probability that the faces are all distinct, as shown to the left? Many people would guess that the probability
Last week I showed how to use simulation to estimate the power of a statistical test. I used the two-sample t test to illustrate the technique. In my example, the difference between the means of two groups was 1.2, and the simulation estimated a probability of 0.72 that the t
The power of a statistical test measures the test's ability to detect a specific alternate hypothesis. For example, educational researchers might want to compare the mean scores of boys and girls on a standardized test. They plan to use the well-known two-sample t test. The null hypothesis is that the
In my article "Simulation in SAS: The slow way or the BY way," I showed how to use BY-group processing rather than a macro loop in order to efficiently analyze simulated data with SAS. In the example, I analyzed the simulated data by using PROC MEANS, and I use the
In my constant effort to keep pace with Chris Hemedinger, I am pleased to announce the availability of my new book, Simulating Data with SAS. Chris started a tradition for SAS Press authors to post a photo of themselves with their new book. Thanks to everyone who helped with the
A SAS customer asks: How do I use SAS to generate multiple samples of size N from a multivariate normal distribution? Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the
I have previously written about how to use the "table" distribution to generate random values from a discrete probability distribution. For example, if there are 50 black marbles, 20 red marbles, and 30 white marbles in a box, the following SAS/IML program simulates random draws (with replacement) of 1,000 marbles:
I wanted to write a blog post about the "Table distribution" in SAS. The Table distribution, which is supported by the RAND and the RANDGEN function, enables you to specify the probability of selecting each of k items. Therefore you can use the Table distribution to sample, with replacement, from
A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied
Last week I wrote about using acceptance-rejection algorithms in vector languages to simulate data. The main point I made is that in a vector language it is efficient to generate many more variates than are needed, with the knowledge that a certain proportion will be rejected. In last week's article,
A few days ago on the SAS/IML Support Community, there was an interesting discussion about how to simulate data from a truncated Poisson distribution. The SAS/IML user wanted to generate values from a Poisson distribution, but discard any zeros that are generated. This kind of simulation is known as an
I recently encountered a SUGI30 paper by Chuck Kincaid entitled "Guidelines for Selecting the Covariance Structure in Mixed Model Analysis." I think Kincaid does a good job of describing some common covariance structures that are used in mixed models. One of the many uses for SAS/IML is as a language
There are a lot of useful probability distributions that are not featured in standard statistical textbooks. Some of them have distinctive names. In the past year I have had contact with SAS customers who use the Tweedie distribution, the slash distribution, and the PERT distribution. Often these distributions are used
It is easy to simulate data that is uniformly distributed in the unit cube for any dimension. However, it is less obvious how to generate data in the unit simplex. The simplex is the set of points (x1,x2,...,xd) such that Σi xi = 1 and 0 ≤ xi ≤ 1
I was recently flipping through Ross' Simulation (2006, 4th Edition) and saw the following exercise: Let N be the minimum number of draws from a uniform distribution [until the sum of the variates]exceeds 1. What is the expected value of N? Write a simulation to estimate the expected value. For
This article is an excerpt from my forthcoming book Simulating Data with SAS. Not every matrix with 1 on the diagonal and off-diagonal elements in the range [–1, 1] is a valid correlation matrix. A correlation matrix has a special property known as positive semidefiniteness. All correlation matrices are positive
Over the past few years, and especially since I posted my article on eight tips to make your simulation run faster, I have received many emails (often with attached SAS programs) from SAS users who ask for advice about how to speed up their simulation code. For this reason, I
I received the following query regarding the RAND function in Base SAS: In SAS, is specifying 0 as a random number seed the same as not specifying a seed at all? The question concerns initializing the SAS random number stream by using the internal system clock. You can do this
"Help! My simulation is taking too long to run! How can I make it go faster?" I frequently talk with statistical programmers who claim that their "simulations are too slow" (by which they mean, "they take too long"). They suspect that their program is inefficient, but they aren't sure why.
I've been a fan of statistical simulation and other kinds of computer experimentation for many years. For me, simulation is a good way to understand how the world of statistics works, and to formulate and test conjectures. Last week, while investigating the efficiency of the power method for finding dominant
The April 2012 issue of ORMS Today contains a piece on "How analytics enhance the guest experience at Walt Disney World," by Pete Buczkowski and Hai Chu. While many of us are used to forecasting just one or two things (such as unit sales or revenue), Pete and Hai illustrate
In a previous post I showed how to implement Stewart's (1980) algorithm for generating random orthogonal matrices in SAS/IML software. By using the algorithm, it is easy to generate a random matrix that contains a specified set of eigenvalues. If D = diag(λ1, ..., λp) is a diagonal matrix and
Because I am writing a new book about simulating data in SAS, I have been doing a lot of reading and research about how to simulate various quantities. Random integers? Check! Random univariate samples? Check! Random multivariate samples? Check! Recently I've been researching how to generate random matrices. I've blogged
After my post on detecting outliers in multivariate data in SAS by using the MCD method, Peter Flom commented "when there are a bunch of dimensions, every data point is an outlier" and remarked on the curse of dimensionality. What he meant is that most points in a high-dimensional cloud
Covariance, correlation, and distance matrices are a few examples of symmetric matrices that are frequently encountered in statistics. When you create a symmetric matrix, you only need to specify the lower triangular portion of the matrix. The VECH and SQRVECH functions, which were introduced in SAS/IML 9.3, are two functions
A variance-covariance matrix expresses linear relationships between variables. Given the covariances between variables, did you know that you can write down an invertible linear transformation that "uncorrelates" the variables? Conversely, you can transform a set of uncorrelated variables into variables with given covariances. The transformation that works this magic is
The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it, and see if you can discover what is unnecessary (and misleading!) about this program: data points; drop i; do i=1 to 10; x=rannor(34343); y=rannor(12345); z=rannor(54321); output; end; run; The program creates the
In my article on Buffon's needle experiment, I showed a graph that converges fairly nicely and regularly to the value π, which is the value that the simulation is trying to estimate. This graph is, indeed, a typical graph, as you can verify by running the simulation yourself. However, notice