Blogs

Tag: Simulation

Analytics

Monte Carlo distribution of skewness statistic (B=10000, N=100)

Rick Wicklin | October 28, 2020

The sample skewness is a biased statistic

The skewness of a distribution indicates whether a distribution is symmetric or not. The Wikipedia article about skewness discusses two common definitions for the sample skewness, including the definition used by SAS. In the middle of the article, you will discover the following sentence: In general, the [estimators] are both

Data Visualization | Programming Tips

Decomposition of a convex polygon into triangles

Rick Wicklin | October 21, 2020

Generate random points in a polygon

The triangulation theorem for polygons says that every simple polygon can be triangulated. In fact, if the polygon has V vertices, you can decompose it into V-2 non-overlapping triangles. In this article, a "polygon" always means a simple polygon. Also, a "random point" means one that is drawn at random
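
As a rough illustration of the V-2 count, here is a minimal Python sketch of a "fan" decomposition. It assumes the polygon is convex and that its vertices are listed in order; a general simple polygon needs a more careful method such as ear clipping. The function names and the sample pentagon are made up for this example.

```python
def fan_triangulate(vertices):
    """Split a convex polygon into V-2 triangles that all share vertices[0].

    Assumes the vertices are listed in order around a convex polygon.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    v0 = vertices[0]
    return [(v0, vertices[i], vertices[i + 1])
            for i in range(1, len(vertices) - 1)]


def triangle_area(a, b, c):
    """Area of a triangle from the cross product of two edge vectors."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return abs(cross) / 2.0


# Hypothetical convex pentagon, used only for illustration.
pentagon = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
triangles = fan_triangulate(pentagon)
print(len(triangles))                                # V - 2 triangles
print(sum(triangle_area(*t) for t in triangles))     # equals the polygon area
```

Because the triangles do not overlap, their areas add up to the area of the polygon, which gives a quick sanity check on any decomposition.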

Analytics | Programming Tips

Random uniform points in a triangle

Rick Wicklin | October 19, 2020

Generate random points in a triangle

How can you efficiently generate N random uniform points in a triangular region of the plane? There is a very cool algorithm (which I call the reflection method) that makes the process easy. I no longer remember where I saw this algorithm, but it is different from the "weighted average"
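
The teaser does not spell out the algorithm, so the following Python sketch is an assumption about what a "reflection method" might look like, not necessarily the version in the article. It draws a uniform point in the parallelogram spanned by two edges of the triangle and, if the point lands in the far half, reflects it back into the triangle. The function name and arguments are hypothetical.

```python
import random


def random_points_in_triangle(p1, p2, p3, n, rng=None):
    """Return n uniform random points in the triangle with vertices p1, p2, p3."""
    rng = rng or random.Random()
    # Edge vectors from p1; they span a parallelogram that contains the triangle.
    a = (p2[0] - p1[0], p2[1] - p1[1])
    b = (p3[0] - p1[0], p3[1] - p1[1])
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        if u + v > 1.0:
            # The point fell in the half of the parallelogram outside the
            # triangle; reflect it through the center back into the triangle.
            u, v = 1.0 - u, 1.0 - v
        points.append((p1[0] + u * a[0] + v * b[0],
                       p1[1] + u * a[1] + v * b[1]))
    return points


pts = random_points_in_triangle((0, 0), (1, 0), (0, 1), 5, random.Random(1))
print(pts)
```

No draws are rejected, so every pair of uniform variates yields one point in the triangle.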

Analytics | Programming Tips

Rick Wicklin | September 28, 2020

The Poisson-binomial distribution

The Poisson-binomial distribution is a generalization of the binomial distribution. For the binomial distribution, you carry out N independent and identical Bernoulli trials. Each trial has a probability, p, of success. The total number of successes, which can be between 0 and N, is a binomial random variable. The distribution

Programming Tips

Rick Wicklin | August 26, 2020

Rewinding random number streams: An application

In the paper "Tips and Techniques for Using the Random-Number Generators in SAS" (Sarle and Wicklin, 2018), I discussed an example that uses the new STREAMREWIND subroutine in Base SAS 9.4M5. As its name implies, the STREAMREWIND subroutine rewinds a random number stream, essentially resetting the stream to the beginning.

Learn SAS | Programming Tips

Rick Wicklin | August 12, 2020

Use simulation to estimate the power of a statistical test

A previous article about standardizing data in groups shows how to simulate data from two groups. One sample (with n1=20 observations) is simulated from an N(15, 5) distribution whereas a second (with n2=30 observations) is simulated from an N(16, 5) distribution. The sample means of the two groups are close
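
To make the idea concrete, here is a minimal Python sketch of a power simulation for this two-group setup. Several details are assumptions rather than facts from the article: that N(15, 5) means a mean of 15 and a standard deviation of 5, that the test is a pooled-variance two-sample t test at the 5% level, and that 2.0106 is an adequate hard-coded critical value for 48 degrees of freedom. The function names are hypothetical.

```python
import math
import random


def pooled_t_statistic(x, y):
    """Two-sample t statistic that uses a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def estimate_power(n_sim=5000, seed=12345, t_crit=2.0106):
    """Fraction of simulated samples in which the test rejects equal means."""
    # t_crit: assumed two-sided 5% critical value for 20 + 30 - 2 = 48 df.
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(15, 5) for _ in range(20)]   # group 1: n1 = 20
        y = [rng.gauss(16, 5) for _ in range(30)]   # group 2: n2 = 30
        if abs(pooled_t_statistic(x, y)) > t_crit:
            rejections += 1
    return rejections / n_sim


print(estimate_power())
```

The estimate is the proportion of simulated samples for which the test detects the difference in means, so it carries Monte Carlo error that shrinks as `n_sim` grows.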

Analytics | Programming Tips

Rick Wicklin | March 16, 2020

Predict a random integer: The tradeoff between bias and variance

Books about statistics and machine learning often discuss the tradeoff between bias and variance for an estimator. These discussions are often motivated by a sophisticated predictive model such as a regression or a decision tree. But the basic idea can be seen in much simpler situations. This article presents a

Advanced Analytics | Data Visualization | Programming Tips

Rick Wicklin | March 9, 2020

ROC curves for a binormal sample

In a previous article, I discussed the binormal model for a binary classification problem. This model assumes a set of scores that are normally distributed for each population, and the mean of the scores for the Negative population is less than the mean of scores for the Positive population. I

Analytics

Rick Wicklin | July 22, 2019

Extreme values: What is an extreme value for normally distributed data?

Is 4 an extreme value for the standard normal distribution? In high school, students learn the famous 68-95-99.7 rule, which is a way to remember that 99.7 percent of random observations from a normal distribution are within three standard deviations from the mean. For the standard normal distribution, the probability
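
A short Python sketch can put numbers on the rule. It uses the identity P(|Z| > k) = erfc(k/√2) for a standard normal variable Z; the function name is made up for this example, and the two-sided tail is one reasonable reading of "extreme", not necessarily the one the article uses.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal random variable Z."""
    return math.erfc(k / math.sqrt(2.0))


for k in (1, 2, 3, 4):
    inside = 1.0 - two_sided_tail(k)
    print(f"k={k}: P(|Z| <= k) = {inside:.6f}, P(|Z| > k) = {two_sided_tail(k):.2e}")
```

For k = 3, the probability inside is about 0.9973, which matches the rule. For k = 4, the tail probability drops to roughly 6 in 100,000.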

Analytics | Programming Tips

Rick Wicklin | May 20, 2019

Critical values of the Kolmogorov-Smirnov test

Recently I wrote about how to compute the Kolmogorov D statistic, which is used to determine whether a sample has a particular distribution. One of the beautiful facts about modern computational statistics is that if you can compute a statistic, you can use simulation to estimate the sampling distribution of

Analytics | Programming Tips

Rick Wicklin | May 6, 2019

How to simulate data from a generalized linear model

Here's a simulation tip: When you simulate a fixed-effect generalized linear regression model, don't add a random normal error to the linear predictor. Only the response variable should be random. This tip applies to models that apply a link function to a linear predictor, including logistic regression, Poisson regression, and
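
The tip can be sketched in a few lines of Python for the logistic case. The coefficients, the function name, and the choice of a single explanatory variable are assumptions made only for illustration.

```python
import math
import random


def simulate_logistic(xs, b0, b1, rng):
    """Simulate binary responses from a logistic regression model."""
    ys = []
    for x in xs:
        eta = b0 + b1 * x                     # linear predictor: no random error here
        mu = 1.0 / (1.0 + math.exp(-eta))     # inverse logit link gives P(Y = 1)
        ys.append(1 if rng.random() < mu else 0)   # randomness enters only in Y
    return ys


rng = random.Random(54321)
xs = [rng.uniform(-2, 2) for _ in range(10)]
print(simulate_logistic(xs, b0=-0.5, b1=1.2, rng=rng))
```

The linear predictor and the mean are deterministic functions of the explanatory variable; the only random draw is the Bernoulli response.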

Learn SAS | Programming Tips

Rick Wicklin | April 29, 2019

The normal mixture distribution in SAS

Did you know that SAS provides built-in support for working with probability distributions that are finite mixtures of normal distributions? This article shows examples of using the "NormalMix" distribution in SAS and describes a trick that enables you to easily work with distributions that have many components. As with all

Analytics | Data Visualization

Rick Wicklin | March 27, 2019

How to simulate multivariate outliers

In simulation studies, sometimes you need to simulate outliers. For example, in a simulation study of regression techniques, you might want to generate outliers in the explanatory variables to see how the technique handles high-leverage points. This article shows how to generate outliers in multivariate normal data that are a

Learn SAS | Programming Tips

Parameter estimates for synthetic (simulated) data that follows a regression model.

Rick Wicklin | January 28, 2019

Simulate data for a regression model with categorical and continuous variables

This article shows how to use SAS to simulate data that fits a linear regression model that has categorical regressors (also called explanatory or CLASS variables). Simulating data is a useful skill for both researchers and statistical programmers. You can use simulation for answering research questions, but you can also

Analytics | Programming Tips

Rick Wicklin | October 3, 2018

Fast simulation of multivariate normal data with an AR(1) correlation structure

It is sometimes necessary for researchers to simulate data with thousands of variables. It is easy to simulate thousands of uncorrelated variables, but more difficult to simulate thousands of correlated variables. For that, you can generate a correlation matrix that has special properties, such as a Toeplitz matrix or a

Programming Tips

Rick Wicklin | July 11, 2018

The probability that two random chords of a circle intersect

In a previous article, I showed how to find the intersection (if it exists) between two line segments in the plane. There are some fun problems in probability theory that involve intersections of line segments. One is "What is the probability that two randomly chosen chords of a circle intersect?"
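
The answer depends on what "randomly chosen" means. The Python sketch below assumes one common model: each chord is defined by two endpoints drawn independently and uniformly on the circle. Under that assumption the exact probability is 1/3, and the simulation should land close to it. The function names are hypothetical.

```python
import math
import random


def chords_intersect(a1, a2, b1, b2):
    """True if chord (a1, a2) crosses chord (b1, b2); arguments are angles."""
    lo, hi = min(a1, a2), max(a1, a2)
    # The chords cross exactly when one endpoint of the second chord lies on
    # the arc between the endpoints of the first chord and the other does not.
    return (lo < b1 < hi) != (lo < b2 < hi)


def estimate_intersection_probability(n_sim=100000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        a1, a2, b1, b2 = (rng.uniform(0.0, 2.0 * math.pi) for _ in range(4))
        hits += chords_intersect(a1, a2, b1, b2)
    return hits / n_sim


print(estimate_intersection_probability())
```

Other ways of choosing a random chord, such as picking a random midpoint, can give a different probability.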

Programming Tips

Rick Wicklin | June 15, 2018

Video: New random number generators in SAS

My 2018 SAS Global Forum paper was about "how to use the random-number generators (RNGs) in SAS." You can read the paper for details, but I recently recorded a short video that summarizes the main ideas in the paper. In particular, the video gives an overview of the new RNGs

Programming Tips

Rick Wicklin | June 6, 2018

Sample and obtain the results in random order

The SURVEYSELECT procedure in SAS 9.4M5 supports the OUTRANDOM option, which causes the selected items in a simple random sample to be randomly permuted after they are selected. This article describes several statistical tasks that benefit from this option, including simulating card games, randomly permuting observations in a DATA step,

Analytics | Programming Tips

How to generate random numbers in SAS

Rick Wicklin | May 9, 2018

Independent streams of random numbers in SAS

In a previous blog post, I discussed ways to produce statistically independent samples from a random number generator (RNG). The best way is to generate all samples from one stream. However, if your program uses two or more SAS DATA steps to simulate the data, you cannot use the same

Programming Tips

Rick Wicklin | May 7, 2018

Independence and overlap in streams of random numbers

Simulation studies require both randomness and reproducibility, two qualities that are sometimes at odds with each other. A Monte Carlo simulation might need to generate millions of random samples, where each sample contains dozens of continuous variables and many thousands of observations. In simulation studies, the researcher wants each sample

Programming Tips

Rick Wicklin | April 16, 2018

Random permutations without duplicates

A colleague and I recently discussed how to generate random permutations without encountering duplicates. Given a set of n items, there are n! permutations. My colleague wants to generate k unique permutations at random from among the total of n!. Said differently, he wants to sample without replacement from the
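
One simple way to approach the problem is to shuffle repeatedly and discard any permutation that has already appeared. The Python sketch below takes that rejection approach; it is an assumption for illustration and may differ from the method the article develops. The function name is hypothetical, and the items must be hashable.

```python
import math
import random


def unique_random_permutations(items, k, rng=None):
    """Return k distinct random permutations of items, in the order generated."""
    rng = rng or random.Random()
    items = list(items)
    if k > math.factorial(len(items)):
        raise ValueError("k exceeds the number of distinct permutations, n!")
    seen = set()
    result = []
    while len(result) < k:
        perm = items[:]
        rng.shuffle(perm)
        key = tuple(perm)
        if key not in seen:          # reject duplicates of earlier permutations
            seen.add(key)
            result.append(perm)
    return result


for perm in unique_random_permutations("ABCD", 5, random.Random(42)):
    print(perm)
```

Rejection is cheap when k is small relative to n!, because duplicates are rare. It slows down as k approaches n!, where most new shuffles repeat an earlier one.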

Analytics | Data Visualization

Rick Wicklin | March 26, 2018

A zipper plot for visualizing coverage probability in simulation studies

Simulation studies are used for many purposes, one of which is to examine how distributional assumptions affect the coverage probability of a confidence interval. This article describes the "zipper plot," which enables you to compare the coverage probability of a confidence interval when the data do or do not follow

Analytics | Data Visualization

Rick Wicklin | February 7, 2018

The distribution of shared birthdays in the Birthday Problem

If N random people are in a room, the classical birthday problem provides the probability that at least two people share a birthday. The birthday problem does not consider how many birthdays are in common. However, a generalization (sometimes called the Multiple-Birthday Problem) examines the distribution of the number of

Programming Tips

Rick Wicklin | February 5, 2018

Simulate the birthday-matching problem

This article simulates the birthday-matching problem in SAS. The birthday-matching problem (also called the birthday problem or birthday paradox) answers the following question: "if there are N people in a room, what is the probability that at least two people share a birthday?" The birthday problem is famous because the
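
The article works in SAS; as a language-neutral illustration, here is a minimal Python sketch that compares the exact probability with a simulated estimate. It assumes 365 equally likely birthdays and ignores leap years. The function names are made up for this example.

```python
import random


def birthday_match_exact(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_no_match = 1.0
    for i in range(n):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


def birthday_match_sim(n, n_sim, rng, days=365):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(n_sim):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:      # fewer distinct days than people
            hits += 1
    return hits / n_sim


print(birthday_match_exact(23))
print(birthday_match_sim(23, 20000, random.Random(2018)))
```

With 23 people in the room, the exact probability is just over one half, and the simulated estimate should agree with it to within Monte Carlo error.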

Learn SAS | Programming Tips

How to generate random numbers in SAS

Rick Wicklin | January 29, 2018

How to use the new random-number generators in SAS

What is a random number generator? What are the random-number generators in SAS, and how can you use them to generate random numbers from probability distributions? In SAS 9.4M5, you can use the STREAMINIT function to select from eight random-number generators (RNGs), including five new RNGs. After choosing an RNG,

Analytics | Programming Tips

Rick Wicklin | January 15, 2018

Data unavailable? Use the "eyeball distribution" to simulate

Last week I got the following message: Dear Rick: How can I create a normal distribution within a specified range (min and max)? I need to simulate a normal distribution that fits within a specified range. I realize that a normal distribution is by definition infinite... Are there any alternatives,

Analytics | Programming Tips

Beta-binomial distribution and expected values in SAS

Rick Wicklin | November 20, 2017

Simulate data from the beta-binomial distribution in SAS

This article shows how to simulate beta-binomial data in SAS and how to compute the density function (PDF). The beta-binomial distribution is a discrete compound distribution. The "binomial" part of the name means that the discrete random variable X follows a binomial distribution with parameters N (number of trials) and

Advanced Analytics | Analytics

Christian Goßler | November 8, 2017

Lenin and the Red Rapper in the Internet of Ticks (IoT5)

"… Internet, Internet, all I ever hear around here is Internet. Tell me, that's not quite right, is it!" The service manager blushes after his rap. Lenin wavers between amusement and Bolshevik fury: Is the Red Rapper calling his successes in the Internet of Things into question? The rapper continues: "Because this data that you are burning through,

Programming Tips

Rick Wicklin | October 11, 2017

Simulate correlations by using the Wishart distribution

The article "Fisher's transformation of the correlation coefficient" featured a Monte Carlo simulation that generated sample correlations from bivariate normal data. The simulation used three steps: Simulate B samples of size N from a bivariate normal distribution with correlation ρ. Use PROC CORR to compute the sample correlation matrix for

Learn SAS | Programming Tips

Results of a data-driven simulation in which parameters are stored in a file and processed by a SAS program

Rick Wicklin | September 27, 2017

Data-driven simulation

In a large simulation study, it can be convenient to have a "control file" that contains the parameters for the study. My recent article about how to simulate multivariate normal clusters demonstrates a simple example of this technique. The simulation in that article uses an input data set that contains
