In a previous article, I showed how to simulate data for a linear regression model with an arbitrary number of continuous explanatory variables. To keep the discussion simple, I simulated a single sample with N observations and p variables. However, to use Monte Carlo methods to approximate the sampling distribution

## Tag: **Simulation**

This article shows how to simulate a data set in SAS that satisfies a least squares regression model for continuous variables. When you simulate to create "synthetic" (or "fake") data, you (the programmer) control the true parameter values, the form of the model, the sample size, and magnitude of the

How can you generate data that contains outliers in a simulation study? The contaminated normal distribution is a simple but useful distribution you can use to simulate outliers. The distribution is easy to explain and understand, and it is also easy to implement in SAS. What is a contaminated normal

The classic textbook by Johnson and Wichern (Applied Multivariate Statistical Analysis, Third Edition, 1992, p. 164) says: All measures of goodness-of-fit suffer the same serious drawback. When the sample size is small, only the most aberrant behaviors will be identified as lack of fit. On the other hand,

Somewhere in my past I encountered a panel of histograms for small random samples of normal data. I can't remember the source, but it might have been from John Tukey or William Cleveland. The point of the panel was to emphasize that (because of sampling variation) a small random sample

The 2016 INFORMS Annual Meeting will be held at the Music City Center and Omni Nashville Hotel in downtown Nashville, TN on November 13-16, with pre-conference events starting on Saturday, November 12. SAS will be a major participant in this conference. Over two dozen people from SAS will attend, with

When simulating data or testing algorithms, it is useful to be able to generate patterns of missing data. This article shows how to generate random and systematic patterns of missing values. In other words, this article shows how to replace nonmissing data with missing data. Generate a random pattern of

Although statisticians often assume normally distributed errors, there are important processes for which the error distribution has a heavy tail. A well-known heavy-tailed distribution is the t distribution, but the t distribution is unsuitable for some applications because it does not have finite moments (mean, variance, ...) for small parameter values.

The article uses the SAS DATA step and Base SAS procedures to estimate the coverage probability of the confidence interval for the mean of normally distributed data. This discussion is based on Section 5.2 (pp. 74–77) of Simulating Data with SAS. What is a confidence interval? Recall that a confidence
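
As a rough illustration of what estimating a coverage probability by simulation involves, here is a minimal Python sketch. It is not the SAS DATA step code from the article. The function name, the sample size of 50, and the number of simulated samples are all assumptions made for the example, and the t critical value for 49 degrees of freedom is hardcoded rather than computed.

```python
import math
import random

def coverage_probability(num_samples, n, mu, sigma, t_crit, seed=None):
    """Estimate how often a t-based confidence interval for the mean
    contains the true mean mu, using simulated normal samples."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with the n-1 divisor
        ss = sum((x - mean) ** 2 for x in sample)
        std = math.sqrt(ss / (n - 1))
        half_width = t_crit * std / math.sqrt(n)
        if mean - half_width <= mu <= mean + half_width:
            covered += 1
    return covered / num_samples

# Assumption: 0.975 quantile of the t distribution with 49 degrees of
# freedom, hardcoded (approximately 2.0096) for a nominal 95% interval.
T_CRIT_DF49 = 2.0096

coverage = coverage_probability(5_000, n=50, mu=0.0, sigma=1.0,
                                t_crit=T_CRIT_DF49, seed=2016)
print(f"Estimated coverage: {coverage:.3f}")
```

For normally distributed data the estimate should land close to the nominal 0.95, with the difference attributable to Monte Carlo error.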

Analytics Experience 2016 will be held on Sept. 12-14, 2016 at the Bellagio in Las Vegas, NV. There will be a great number of excellent talks and demonstrations at the conference, covering many aspects of SAS analytics and many practical applications. Several of these sessions deal directly with the use

This year's SAS Global Forum conference will take place April 18-21 at The Venetian in Las Vegas. For SAS/OR, SAS staff will present two Super Demos and three papers:

I saw an interesting mathematical result in Wired magazine. The original article was about mathematical research into prime numbers, but the article included the following tantalizing fact: If Alice tosses a [fair] coin until she sees a head followed by a tail, and Bob tosses a coin until he sees two

SAS will have a major presence at the 2016 INFORMS Conference on Business Analytics and Operations Research, which will be held at the Hyatt Regency Grand Cypress hotel in Orlando, FL on April 10-12. Many SAS staff will participate in this conference. SAS/OR, the SAS Global Academic Program, and JMP

Last week I showed how to generate random points uniformly inside a 2-D circular region. That article showed that the distance of a point to the circle's center cannot be distributed uniformly. Instead, you should use the square root of a uniform variate to generate 2-D distances to the origin.
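
The square-root rule described above can be sketched in a few lines. This is a Python illustration rather than SAS code, and the function name and parameters are hypothetical. The check at the end relies on the fact that, for uniformly distributed points, the fraction that falls within half the radius should be near 1/4, because area scales with the square of the radius.

```python
import math
import random

def random_points_in_disk(n, radius=1.0, seed=None):
    """Generate n points uniformly distributed inside a disk centered
    at the origin, using r = radius * sqrt(U) and a uniform angle."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())   # square root of a uniform variate
        theta = 2.0 * math.pi * rng.random()   # uniform angle in [0, 2*pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

points = random_points_in_disk(10_000, radius=1.0, seed=12345)

# Fraction of points within half the radius; expected to be about 0.25
inner = sum(1 for x, y in points if math.hypot(x, y) <= 0.5)
frac_inner = inner / len(points)
print(f"Fraction within r <= 0.5: {frac_inner:.3f}")
```

If the distance were drawn as `radius * U` instead, about half of the points would fall within half the radius, which shows up as visible clustering near the center.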

It is easy to generate random points that are uniformly distributed inside a rectangle. You simply generate independent random uniform values for each coordinate. However, nonrectangular regions are more complicated. An instructive example is to simulate points uniformly inside the ball with a given radius. The two-dimensional case is to

There are several ways to simulate multinomial data in SAS. In the SAS/IML matrix language, you can use the RANDMULTINOMIAL function to generate samples from the multinomial distribution. If you don't have a SAS/IML license, I have previously written about how to use the SAS DATA step or PROC SURVEYSELECT

Today is March 14th, which is annually celebrated as Pi Day. Today's date, written as 3/14/16, represents the best five-digit approximation of pi. On Pi Day, many people blog about how to approximate pi. This article uses a Monte Carlo simulation to estimate pi, in spite of the fact that
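
The excerpt does not say which Monte Carlo scheme the article uses, so the following Python sketch assumes one common approach: generate uniform points in the unit square and count the fraction that fall inside the quarter of the unit circle, whose area is pi/4. The function name and the sample size are hypothetical.

```python
import math
import random

def estimate_pi(n, seed=None):
    """Estimate pi from the fraction of uniform points in the unit
    square that land inside the quarter unit circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    # The hit fraction estimates pi/4, so scale by 4
    return 4.0 * hits / n

pi_hat = estimate_pi(200_000, seed=314)
abs_error = abs(pi_hat - math.pi)
print(f"Estimate: {pi_hat:.4f}, absolute error: {abs_error:.4f}")
```

The standard error of this estimate shrinks only in proportion to one over the square root of the number of points, so each additional correct digit costs roughly 100 times as many points.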

Many simulation and resampling tasks use one of four sampling methods. When you draw a random sample from a population, you can sample with or without replacement. At the same time, all individuals in the population might have equal probability of being selected, or some individuals might be more likely

How do you sample with replacement in SAS when the probability of choosing each observation varies? I was asked this question recently. The programmer thought he could use PROC SURVEYSELECT to generate the samples, but he wasn't sure which sampling technique he should use to sample with unequal probability. This

The scored reindeer are already pawing the ground nervously with their hooves today. It will start soon! They are so looking forward to the trip. Everywhere on Earth is so beautifully decorated, and everything glows and twinkles! And there might even be a little snow. Today we are simulating once again. Our previous

The INFORMS 2015 Annual Meeting will be held in Philadelphia November 1-4. More than two dozen SAS staff will participate, and SAS will have three adjacent booths representing SAS/OR (and all of Advanced Analytics), JMP, and the SAS Global Academic Program. SAS is well-represented among the presentations at this meeting,

The FREQ procedure in SAS supports computing exact p-values for many statistical tests. For small and mid-sized problems, the procedure runs very quickly. However, even though PROC FREQ uses efficient methods to avoid unnecessary computations, the computational time required by exact tests might be prohibitively expensive for certain tables. If

How do you simulate a contingency table that has a specified row and column sum? Last week I showed how to simulate a random 2 x 2 contingency table when the marginal frequencies are specified. This article generalizes to random r x c frequency tables (also called cross-tabulations) that have the same marginal row

When modeling and simulating data, it is important to be able to articulate the real-life statistical process that generates the data. Suppose a friend says to you, "I want to simulate two random correlated variables, X and Y." Usually this means that he wants data generated from a multivariate distribution,

In a previous post, I discussed using discrete-event simulation to validate an optimization model and its underlying assumptions. A similar approach can be used to validate queueing models as well. And when it is found that the assumptions required for a queueing model are not a good fit for the

You can use SAS to generate random integers between 1 and 10 or in the range 1–100. This article shows how to generate random integers as easily as Excel does. I was recently talking with some SAS customers and I was asked "Why can't SAS create an easy way to generate random

In a previous post I described how to simulate random samples from an urn that contains colored balls. The previous article described the case where the balls can be either of two colors. In that case, all the distributions are univariate. In this article I examine the case where the

If not for probability theory, urns would appear only in funeral homes and anthologies of British poetry. But in probability and statistics, urns are ever present and contain colored balls. The removal and inspection of colored balls from an urn is a classic way to demonstrate probability, sampling, variation, and

Occasionally a SAS statistical programmer will ask me, "How can I construct a large correlation matrix?" Often they are simulating data with SAS or developing a matrix algorithm that involves a correlation matrix. Typically they want a correlation matrix that is too large to input by hand, such as a

SAS/OR 14.1, which became available on July 14, delivers a number of new and enhanced features in optimization and simulation. These changes are designed to make SAS/OR even easier to use and to enable you to model and solve larger, more complex problems more efficiently. If you're using SAS/OR now,