## Tag: Simulation

0
Simulate proportions for groups

A statistical programmer asked how to simulate event-trials data for groups. The subjects in each group have a different probability of experiencing the event. This article describes one way to simulate this scenario. The simulation is similar to simulating from a mixture distribution. This article also shows three different ways

0
Simulate multivariate correlated data by using PROC COPULA in SAS

In general, it is hard to simulate multivariate data that has a specified correlation structure. Copulas make that task easier for continuous distributions. A previous article presented the geometry behind a copula and explained copulas in an intuitive way. Although I strongly believe that statistical practitioners should be familiar with

0
An introduction to simulating correlated data by using copulas

Do you know what a copula is? It is a popular way to simulate multivariate correlated data. The literature for copulas is mathematically formidable, but this article provides an intuitive introduction to copulas by describing the geometry of the transformations that are involved in the simulation process. Although there are

Programming Tips
0
The probability integral transform

This article uses simulation to demonstrate the fact that any continuous distribution can be transformed into the uniform distribution on (0,1). The function that performs this transformation is a familiar one: it is the cumulative distribution function (CDF). A continuous CDF is defined as an integral, so the transformation is

0
The geometry of the Iman-Conover transformation

A previous article showed how to simulate multivariate correlated data by using the Iman-Conover transformation (Iman and Conover, 1982). The transformation preserves the marginal distributions of the original data but permutes the values (columnwise) to induce a new correlation among the variables. When I first read about the Iman-Conover transformation,

0
Simulate correlated variables by using the Iman-Conover transformation

Simulating univariate data is relatively easy. Simulating multivariate data is much harder. The main difficulty is to generate variables that have given univariate distributions but also are correlated with each other according to a specified correlation matrix. However, Iman and Conover (1982, "A distribution-free approach to inducing rank correlation among

0
Odani's truism: A probabilistic way to compare fractions

Quick! Which fraction is bigger, 40/83 or 27/56? It's not always easy to mentally compare two fractions to determine which is larger. For this example, you can easily see that both fractions are a little less than 1/2, but to compare the numbers you need to compare the products 40*56

0
Double integrals by using Monte Carlo methods

As mentioned in my article about Monte Carlo estimate of (one-dimensional) integrals, one of the advantages of Monte Carlo integration is that you can perform multivariate integrals on complicated regions. This article demonstrates how to use SAS to obtain a Monte Carlo estimate of a double integral over rectangular and

0
Sample size for the Monte Carlo estimate of an integral

A previous article shows how to use Monte Carlo simulation to estimate a one-dimensional integral on a finite interval. A larger random sample will (on average) result in an estimate that is closer to the true value of the integral than a smaller sample. This article shows how you can

0
Estimate an integral by using Monte Carlo simulation

Numerical integration is important in many areas of applied mathematics and statistics. For one-dimensional integrals on the interval (a, b), SAS software provides two important tools for numerical integration: For common univariate probability distributions, you can use the CDF function to integrate the density, thus obtaining the probability that a

0
Generate random points on a sphere

In a previous article, I showed how to generate random points uniformly inside a d-dimensional sphere. In that article, I stated the following fact: If Y is drawn from the uncorrelated multivariate normal distribution, then S = Y / ||Y|| has the uniform distribution on the unit sphere. I was

Programming Tips
0
Gaussian random walks and Levy flights

Imagine an animal that is searching for food in a vast environment where food is scarce. If no prey is nearby, the animal's senses (such as smell and sight) are useless. In that case, a reasonable search strategy is a random walk. The animal can choose a random direction, walk/swim/fly

0
Find a vector that has a specified correlation with another vector

Do you know that you can create a vector that has a specific correlation with another vector? That is, given a vector, x, and a correlation coefficient, ρ, you can find a vector, y, such that corr(x, y) = ρ. The vectors x and y can have an arbitrary number

0
Tips to simulate binary and categorical variables

When there are two equivalent ways to do something, I advocate choosing the one that is simpler and more efficient. Sometimes, I encounter a SAS program that simulates random numbers in a way that is neither simple nor efficient. This article demonstrates two improvements that you can make to your

Analytics
0
The sample skewness is a biased statistic

The skewness of a distribution indicates whether a distribution is symmetric or not. The Wikipedia article about skewness discusses two common definitions for the sample skewness, including the definition used by SAS. In the middle of the article, you will discover the following sentence: In general, the [estimators] are both

0
Generate random points in a polygon

The triangulation theorem for polygons says that every simple polygon can be triangulated. In fact, if the polygon has V vertices, you can decompose it into V-2 non-overlapping triangles. In this article, a "polygon" always means a simple polygon. Also, a "random point" means one that is drawn at random

0
Generate random points in a triangle

How can you efficiently generate N random uniform points in a triangular region of the plane? There is a very cool algorithm (which I call the reflection method) that makes the process easy. I no longer remember where I saw this algorithm, but it is different from the "weighted average"

0
The Poisson-binomial distribution

The Poisson-binomial distribution is a generalization of the binomial distribution. For the binomial distribution, you carry out N independent and identical Bernoulli trials. Each trial has a probability, p, of success. The total number of successes, which can be between 0 and N, is a binomial random variable. The distribution

Programming Tips
0
Rewinding random number streams: An application

In the paper "Tips and Techniques for Using the Random-Number Generators in SAS" (Sarle and Wicklin, 2018), I discussed an example that uses the new STREAMREWIND subroutine in Base SAS 9.4M5. As its name implies, the STREAMREWIND subroutine rewinds a random number stream, essentially resetting the stream to the beginning.

0
Use simulation to estimate the power of a statistical test

A previous article about standardizing data in groups shows how to simulate data from two groups. One sample (with n1=20 observations) is simulated from an N(15, 5) distribution whereas a second (with n2=30 observations) is simulated from an N(16, 5) distribution. The sample means of the two groups are close

0
Predict a random integer: The tradeoff between bias and variance

Books about statistics and machine learning often discuss the tradeoff between bias and variance for an estimator. These discussions are often motivated by a sophisticated predictive model such as a regression or a decision tree. But the basic idea can be seen in much simpler situations. This article presents a

0
ROC curves for a binormal sample

In a previous article, I discussed the binormal model for a binary classification problem. This model assumes a set of scores that are normally distributed for each population, and the mean of the scores for the Negative population is less than the mean of scores for the Positive population. I

Analytics
0
Extreme values: What is an extreme value for normally distributed data?

Is 4 an extreme value for the standard normal distribution? In high school, students learn the famous 68-95-99.7 rule, which is a way to remember that 99.7 percent of random observation from a normal distribution are within three standard deviations from the mean. For the standard normal distribution, the probability

0
Critical values of the Kolmogorov-Smirnov test

Recently I wrote about how to compute the Kolmogorov D statistic, which is used to determine whether a sample has a particular distribution. One of the beautiful facts about modern computational statistics is that if you can compute a statistic, you can use simulation to estimate the sampling distribution of

0
How to simulate data from a generalized linear model

Here's a simulation tip: When you simulate a fixed-effect generalized linear regression model, don't add a random normal error to the linear predictor. Only the response variable should be random. This tip applies to models that apply a link function to a linear predictor, including logistic regression, Poisson regression, and

0
The normal mixture distribution in SAS

Did you know that SAS provides built-in support for working with probability distributions that are finite mixtures of normal distributions? This article shows examples of using the "NormalMix" distribution in SAS and describes a trick that enables you to easily work with distributions that have many components. As with all

0
How to simulate multivariate outliers

In simulation studies, sometimes you need to simulate outliers. For example, in a simulation study of regression techniques, you might want to generate outliers in the explanatory variables to see how the technique handles high-leverage points. This article shows how to generate outliers in multivariate normal data that are a

0
Simulate data for a regression model with categorical and continuous variables

This article shows how to use SAS to simulate data that fits a linear regression model that has categorical regressors (also called explanatory or CLASS variables). Simulating data is a useful skill for both researchers and statistical programmers. You can use simulation for answering research questions, but you can also

0
Fast simulation of multivariate normal data with an AR(1) correlation structure

It is sometimes necessary for researchers to simulate data with thousands of variables. It is easy to simulate thousands of uncorrelated variables, but more difficult to simulate thousands of correlated variables. For that, you can generate a correlation matrix that has special properties, such as a Toeplitz matrix or a

Programming Tips
0
The probability that two random chords of a circle intersect

In a previous article, I showed how to find the intersection (if it exists) between two line segments in the plane. There are some fun problems in probability theory that involve intersections of line segments. One is "What is the probability that two randomly chosen chords of a circle intersect?"

1 2 3 6