Ugh! Your favorite regression procedure just printed a warning to the SAS log. Something is wrong, and your attempt to fit a model to the data has not succeeded. A typical message is "WARNING: The validity of the model fit is questionable," perhaps followed by some additional diagnostic messages about

## Tag: **Simulation**

One of my presentations at SAS Global Forum 2015 was titled "Ten Tips for Simulating Data with SAS". The paper was published in the conference proceedings several months ago, but I recently recorded a short video that gives an overview of the 10 tips: If your browser does not support

Last month I wrote about how to simulate a drunkard's walk in SAS for a drunkard who can move only left or right in one direction. A reader asked whether the problem could be generalized to two dimensions. Yes! This article shows how to simulate a 2-D drunkard's walk, also

The triangular distribution has applications in risk analysis and reliability analysis. It is also a useful theoretical tool because of its simplicity. Its density function is piecewise linear. The standardized distribution is defined on [0,1] and has one parameter, 0 ≤ c ≤ 1, which determines the peak of the

You've probably heard of a random walk, but have you heard about the drunkard's walk? I've previously written about how to simulate a one-dimensional random walk in SAS. In the random walk, you imagine a person who takes a series of steps where the step size and direction is a

SAS procedures can produce a lot of output, but you don't always want to see it all. In simulation and bootstrap studies, you might analyze 10,000 samples or resamples. Usually you are not interested in seeing the results of each analysis displayed on your computer screen. Instead, you want to

The primary objective of many discrete-event simulation projects is system investigation. Output data from the simulation model are used to better understand the operation of the system (whether that system is real or theoretical), as well as to conduct various "what-if"-type analyses. However, I recently worked on another model

The Monty Hall Problem is one of the most famous problems in elementary probability. It is famous because the correct solution is counter-intuitive and because it caused an uproar when it appeared in the "Ask Marilyn" column in Parade magazine in 1990. Discussing the problem has been known to create

♦We learned this week that SAS is ranked #4 on Fortune's 100 Best Companies to Work For in 2015. This makes six straight years ranking in the top four (including twice at #1). ♦The March/April 2015 issue of Analytics Magazine includes a SAS company profile by my colleague Kathy Lange. As

In my book Simulating Data with SAS, I discuss a relationship between the skewness and kurtosis of probability distributions that might not be familiar to some statistical programmers. Namely, the skewness and kurtosis of a probability distribution are not independent. If κ is the full kurtosis of a distribution and

I began 2015 by compiling a list of popular articles from my blog in 2014. Although this "People's Choice" list contains many interesting articles, some of my favorites did not make the list. Today I present the "Editor's Choice" list of articles that deserve a second look. I've highlighted one

Last year, my SAS Simulation Studio R&D team began a discrete-event simulation modeling project of a neonatal intensive care unit (NICU) with two doctors from Duke University’s Division of Neonatal-Perinatal Medicine. After several initial meetings discussing such things as necrotizing enterocolitis (NEC), retinopathy of prematurity (ROP), patent ductus arteriosis (PDA), and

I've written about how to generate a sample from a multivariate normal (MVN) distribution in SAS by using the RANDNORMAL function in SAS/IML software. Last week a SAS/IML programmer showed me a program that simulated MVN data and computed the resulting covariance matrix for each simulated sample. The purpose of

My last blog post showed how to simulate data for a logistic regression model with two continuous variables. To keep the discussion simple, I simulated a single sample with N observations. However, to obtain the sampling distribution of statistics, you need to generate many samples from the same logistic model.

In my book Simulating Data with SAS, I show how to use the SAS DATA step to simulate data from a logistic regression model. Recently there have been discussions on the SAS/IML Support Community about simulating logistic data by using the SAS/IML language. This article describes how to efficiently simulate

A colleague asked me an interesting question: I have a journal article that includes sample quantiles for a variable. Given a new data value, I want to approximate its quantile. I also want to simulate data from the distribution of the published data. Is that possible? This situation is common.

I've pointed out in the past that in the SAS/IML language matrices are passed to modules "by reference." This means that large matrices are not copied in and out of modules but are updated "in place." As a result, the SAS/IML language can be very efficient when it computes with

In my book Simulating Data with SAS, I specify how to generate lognormal data with a shape and scale parameter. The method is simple: you use the RAND function to generate X ~ N(μ, σ), then compute Y = exp(X). The random variable Y is lognormally distributed with parameters μ

While at a conference recently, I was asked whether it was possible to use SAS to simulate data from an inverse gamma distribution. The SAS customer had looked at the documentation for the RAND function and did not see "inverse gamma" listed among the possible choices. The answer is "yes."

While at SAS Global Forum 2014 I attended a talk by Jorge G. Morel on the analysis of data with overdispersion. (His slides are available, along with a video of his presentation.) The Wikipedia defines overdispersion as "greater variability than expected from a simple model." For count data, the "simple

*The DO Loop*that merit a second look

I began 2014 by compiling a list of 13 popular articles from my blog in 2013. Although this "People's Choice" list contains many articles that I am proud of, it did not include all of my favorites, so I decided to compile an "Editor's Choice" list. The blog posts on

My previous post described the multinomial distribution and showed how to generate random data from the multinomial distribution in SAS by using the RANDMULTINOMIAL function in SAS/IML software. The RANDMULTINOMIAL function is simple to use and implements an efficient algorithm called the sequential conditional marginal method (see Gentle (2003), p.

This article describes how to generate random samples from the multinomial distribution in SAS. The content is taken from Chapter 8 of my book Simulating Data with SAS. The multinomial distribution is a discrete multivariate distribution. Suppose there are k different types of items in a box, such as a

This article describes how to implement the truncated normal distribution in SAS. Although the implementation in this article uses the SAS/IML language, you can also implement the ideas and formulas by using the DATA step and PROC FCMP. For reference, I recommend the Wikipedia article on the truncated normal distribution.

There are many techniques for generating random variates from a specified probability distribution such as the normal, exponential, or gamma distribution. However, one technique stands out because of its generality and simplicity: the inverse CDF sampling technique. If you know the cumulative distribution function (CDF) of a probability distribution, then

Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer

As I wrote in my previous post, a SAS customer noticed that he was getting some duplicate values when he used the RAND function to generate a large number of random uniform values on the interval [0,1]. He wanted to know if this result indicates a bug in the RAND

Tossing dice is a simple and familiar process, yet it can illustrate deep and counterintuitive aspects of random numbers. For example, if you toss four identical six-sided dice, what is the probability that the faces are all distinct, as shown to the left? Many people would guess that the probability

Last week I showed how to use simulation to estimate the power of a statistical test. I used the two-sample t test to illustrate the technique. In my example, the difference between the means of two groups was 1.2, and the simulation estimated a probability of 0.72 that the t

The power of a statistical test measures the test's ability to detect a specific alternate hypothesis. For example, educational researchers might want to compare the mean scores of boys and girls on a standardized test. They plan to use the well-known two-sample t test. The null hypothesis is that the