A SAS customer asks: How do I use SAS to generate multiple samples of size N from a multivariate normal distribution? Suppose that you want to simulate k samples (each with N observations) from a multivariate normal distribution with a given mean vector and covariance matrix. Because all of the
Tag: Simulation
I have previously written about how to use the "table" distribution to generate random values from a discrete probability distribution. For example, if there are 50 black marbles, 20 red marbles, and 30 white marbles in a box, the following SAS/IML program simulates random draws (with replacement) of 1,000 marbles:
I wanted to write a blog post about the "Table distribution" in SAS. The Table distribution, which is supported by the RAND and the RANDGEN function, enables you to specify the probability of selecting each of k items. Therefore you can use the Table distribution to sample, with replacement, from
A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied
Last week I wrote about using acceptance-rejection algorithms in vector languages to simulate data. The main point I made is that in a vector language it is efficient to generate many more variates than are needed, with the knowledge that a certain proportion will be rejected. In last week's article,
A few days ago on the SAS/IML Support Community, there was an interesting discussion about how to simulate data from a truncated Poisson distribution. The SAS/IML user wanted to generate values from a Poisson distribution, but discard any zeros that are generated. This kind of simulation is known as an
I recently encountered a SUGI30 paper by Chuck Kincaid entitled "Guidelines for Selecting the Covariance Structure in Mixed Model Analysis." I think Kincaid does a good job of describing some common covariance structures that are used in mixed models. One of the many uses for SAS/IML is as a language
There are a lot of useful probability distributions that are not featured in standard statistical textbooks. Some of them have distinctive names. In the past year I have had contact with SAS customers who use the Tweedie distribution, the slash distribution, and the PERT distribution. Often these distributions are used
It is easy to simulate data that is uniformly distributed in the unit cube for any dimension. However, it is less obvious how to generate data in the unit simplex. The simplex is the set of points (x1,x2,...,xd) such that Σi xi = 1 and 0 ≤ xi ≤ 1
I was recently flipping through Ross' Simulation (2006, 4th Edition) and saw the following exercise: Let N be the minimum number of draws from a uniform distribution [until the sum of the variates]exceeds 1. What is the expected value of N? Write a simulation to estimate the expected value. For
This article is an excerpt from my forthcoming book Simulating Data with SAS. Not every matrix with 1 on the diagonal and off-diagonal elements in the range [–1, 1] is a valid correlation matrix. A correlation matrix has a special property known as positive semidefiniteness. All correlation matrices are positive
Over the past few years, and especially since I posted my article on eight tips to make your simulation run faster, I have received many emails (often with attached SAS programs) from SAS users who ask for advice about how to speed up their simulation code. For this reason, I
I received the following query regarding the RAND function in Base SAS: In SAS, is specifying 0 as a random number seed the same as not specifying a seed at all? The question concerns initializing the SAS random number stream by using the internal system clock. You can do this
"Help! My simulation is taking too long to run! How can I make it go faster?" I frequently talk with statistical programmers who claim that their "simulations are too slow" (by which they mean, "they take too long"). They suspect that their program is inefficient, but they aren't sure why.
I've been a fan of statistical simulation and other kinds of computer experimentation for many years. For me, simulation is a good way to understand how the world of statistics works, and to formulate and test conjectures. Last week, while investigating the efficiency of the power method for finding dominant
The April 2012 issue of ORMS Today contains a piece on "How analytics enhance the guest experience at Walt Disney World," by Pete Buczkowski and Hai Chu. While many of us are used to forecasting just one or two things (such as unit sales or revenue), Pete and Hai illustrate
In a previous post I showed how to implement Stewart's (1980) algorithm for generating random orthogonal matrices in SAS/IML software. By using the algorithm, it is easy to generate a random matrix that contains a specified set of eigenvalues. If D = diag(λ1, ..., λp) is a diagonal matrix and
Because I am writing a new book about simulating data in SAS, I have been doing a lot of reading and research about how to simulate various quantities. Random integers? Check! Random univariate samples? Check! Random multivariate samples? Check! Recently I've been researching how to generate random matrices. I've blogged
After my post on detecting outliers in multivariate data in SAS by using the MCD method, Peter Flom commented "when there are a bunch of dimensions, every data point is an outlier" and remarked on the curse of dimensionality. What he meant is that most points in a high-dimensional cloud
Covariance, correlation, and distance matrices are a few examples of symmetric matrices that are frequently encountered in statistics. When you create a symmetric matrix, you only need to specify the lower triangular portion of the matrix. The VECH and SQRVECH functions, which were introduced in SAS/IML 9.3, are two functions
A variance-covariance matrix expresses linear relationships between variables. Given the covariances between variables, did you know that you can write down an invertible linear transformation that "uncorrelates" the variables? Conversely, you can transform a set of uncorrelated variables into variables with given covariances. The transformation that works this magic is
The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it, and see if you can discover what is unnecessary (and misleading!) about this program: data points; drop i; do i=1 to 10; x=rannor(34343); y=rannor(12345); z=rannor(54321); output; end; run; The program creates the
In my article on Buffon's needle experiment, I showed a graph that converges fairly nicely and regularly to the value π, which is the value that the simulation is trying to estimate. This graph is, indeed, a typical graph, as you can verify by running the simulation yourself. However, notice
Buffon's needle experiment for estimating π is a classical example of using an experiment (or a simulation) to estimate a probability. This example is presented in many books on statistical simulation and is famous enough that Brian Ripley in his book Stochastic Simulation states that the problem is "well known
One aspect of blogging that I enjoy is getting feedback from readers. Usually I get statistical or programming questions, but every so often I receive a comment from someone who stumbled across a blog post by way of an internet search. This morning I received the following delightful comment on
I was contacted by SAS Technical Support regarding a customer who was trying to use SAS/IML to compute quantiles of the folded normal distribution. I had heard of the distribution, but it is not built into SAS and I had never worked with it. Nevertheless, I set out to understand
When I learn a new statistical technique, one of first things I do is to understand the limitations of the technique. This blog post shares some thoughts on modeling finite mixture models with the FMM procedure. What is a reasonable task for FMM? When are you asking too much? I
Normal, Poisson, exponential—these and other "named" distributions are used daily by statisticians for modeling and analysis. There are four operations that are used often when you work with statistical distributions. In SAS software, the operations are available by using the following four functions, which are essential for every statistical programmer
Sometimes a population of individuals is modeled as a combination of subpopulations. For example, if you want to model the heights of individuals, you might first model the heights of males and females separately. The height of the population can then be modeled as a combination of the male and
I previously showed how to generate random numbers in SAS by using the RAND function in the DATA step or by using the RANDGEN subroutine in SAS/IML software. These functions generate a stream of random numbers. (In statistics, the random numbers are usually a sample from a distribution such as