A variance-covariance matrix expresses linear relationships between variables. Given the covariances between variables, did you know that you can write down an invertible linear transformation that "uncorrelates" the variables? Conversely, you can transform a set of uncorrelated variables into variables with given covariances. The transformation that works this magic is
Tag: Simulation
The other day I encountered the following SAS DATA step for generating three normally distributed variables. Study it, and see if you can discover what is unnecessary (and misleading!) about this program: data points; drop i; do i=1 to 10; x=rannor(34343); y=rannor(12345); z=rannor(54321); output; end; run; The program creates the
In my article on Buffon's needle experiment, I showed a graph that converges fairly nicely and regularly to the value π, which is the value that the simulation is trying to estimate. This graph is, indeed, a typical graph, as you can verify by running the simulation yourself. However, notice
Buffon's needle experiment for estimating π is a classical example of using an experiment (or a simulation) to estimate a probability. This example is presented in many books on statistical simulation and is famous enough that Brian Ripley in his book Stochastic Simulation states that the problem is "well known
One aspect of blogging that I enjoy is getting feedback from readers. Usually I get statistical or programming questions, but every so often I receive a comment from someone who stumbled across a blog post by way of an internet search. This morning I received the following delightful comment on
I was contacted by SAS Technical Support regarding a customer who was trying to use SAS/IML to compute quantiles of the folded normal distribution. I had heard of the distribution, but it is not built into SAS and I had never worked with it. Nevertheless, I set out to understand
When I learn a new statistical technique, one of first things I do is to understand the limitations of the technique. This blog post shares some thoughts on modeling finite mixture models with the FMM procedure. What is a reasonable task for FMM? When are you asking too much? I
Normal, Poisson, exponential—these and other "named" distributions are used daily by statisticians for modeling and analysis. There are four operations that are used often when you work with statistical distributions. In SAS software, the operations are available by using the following four functions, which are essential for every statistical programmer
Sometimes a population of individuals is modeled as a combination of subpopulations. For example, if you want to model the heights of individuals, you might first model the heights of males and females separately. The height of the population can then be modeled as a combination of the male and
I previously showed how to generate random numbers in SAS by using the RAND function in the DATA step or by using the RANDGEN subroutine in SAS/IML software. These functions generate a stream of random numbers. (In statistics, the random numbers are usually a sample from a distribution such as
You can generate a set of random numbers in SAS that are uniformly distributed by using the RAND function in the DATA step or by using the RANDGEN subroutine in SAS/IML software. (These same functions also generate random numbers from other common distributions such as binomial and normal.) The syntax
I recently blogged about how many times, on average, you must roll a die until you see all six faces. This question is a special case of the coupon collector's problem. My son noted that the expected value (the mean number of rolls) is not necessarily the best statistic to
As I was reviewing notes for my course "Data Simulation for Evaluating Statistical Methods in SAS," I realized that I haven't blogged about simulating categorical data in SAS. This article corrects that oversight. An Easy Way and a Harder Way SAS software makes it easy to sample from discrete "named"
Last week I presented the GSR algorithm, a statistical model of a riffle shuffle. In the model, a deck of n cards is split into two parts according to the binomial distribution. Each piece has roughly n/2 cards. Then cards are dropped from the two stacks according to the number
I recently returned from a five-day conference in Las Vegas. On the way there, I finally had time to read a classic statistical paper: Bayer and Diaconis (1992) describes how many shuffles are needed to randomize a deck of cards. Their famous result that it takes seven shuffles to randomize
In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
In a previous post, I described how to compute means and standard errors for data that I want to rank. The example data (which are available for download) are mean daily delays for 20 US airlines in 2007. The previous post carried out steps 1 and 2 of the method
In my spare time, I enjoy browsing the StackOverflow discussion forum to see what questions people are asking about SAS, SAS/IML, and statistics. Last week, a statistics student asked for help with the following homework problem: I need to generate a one-dimensional random walk in which the step length and
In a previous blog post, I described the rules for a tic-tac-toe scratch-off lottery game and showed that it is a bad idea to generate the game tickets by using a scheme that uses equal probabilities. Instead, cells that yield large cash awards must be assigned a small probability of
Because of this week's story about a geostatistician, Mohan Srivastava, who figured out how predict winning tickets in a scratch-off lottery, I've been thinking about scratch-off games. He discovered how to predict winners when he began to "wonder how they make these [games]." Each ticket has a set of "lucky
Last week I generated two kinds of random point patterns: one from the uniform distribution on a two-dimensional rectangle, the other by jittering a regular grid by a small amount. My show choir director liked the second method (jittering) better because of the way it looks on stage: there are
One of my New Year's resolutions is to learn a new area of statistics. I'm off to a good start, because I recently investigated an issue which started me thinking about spatial statistics—a branch of statistics that I have never formally studied. During the investigation, I asked myself: Given an
"What is the chance that two people in a room of 20 share initials?" This was the question posed to me by a colleague who had been taking notes at a meeting with 20 people. He recorded each person's initials next to their comments and, upon editing the notes, was
SAS/IML software is often used for sampling and simulation studies. For simulating data from univariate distributions, the RANDSEED and RANDGEN subroutines suffice to sample from a wide range of distributions. (I use the terms "sampling from a distribution" and "simulating data from a distribution" interchangeably.) For multivariate simulations, the IMLMLIB
Computing probabilities can be tricky. And if you are a statistician and you get them wrong, you feel pretty foolish. That's why I like to run a quick simulation just to make sure that the numbers that I think are correct are, in fact, correct. My last post of 2010
In many families, siblings draw names so that each family member and spouse gives and receives exactly one present. This year there was a little bit of controversy when a family member noticed that once again she was assigned to give presents to me. This post includes my response to
Sampling with replacement is a useful technique for simulations and for resampling from data. Over at the SAS/IML Discussion Forum, there was a recent question about how to use SAS/IML software to sample with replacement from a set of events. I have previously blogged about efficient sampling, but this topic
I recently read a paper that described a SAS macro to carry out a permutation test. The permutations were generated by PROC IML. (In fact, an internet search for the terms "SAS/IML" and "permutation test" gives dozens of papers in recent years.) The PROC IML code was not as efficient
Recently, SAS Global Forum announced the call for papers for the 2011 conference to be held at Caesars Palace in Las Vegas. Since the conference is in Las Vegas, I’ve been thinking a lot about games of chance: blackjack, craps, roulette, and the like. You can analyze these games by