One of the benefits of social media is the opportunity to learn new things. Recently, I saw a post on Twitter that intrigued me. The tweet said that the expected volume of a random tetrahedron in the unit cube (in 3-D) is E[Volume] = 0.0138427757.... This number seems surprisingly small!
Tag: Simulation
A common question on SAS discussion forums is how to use SAS to generate random ID values. The use case is to generate a set of random strings to assign to patients in a clinical study. If you assign each patient a unique ID and delete the patients' names, you
When I was writing Simulating Data with SAS (Wicklin, 2013), I read a lot of introductory textbooks about Monte Carlo simulation. One of my favorites is Sheldon Ross's book Simulation. (I read the 4th Edition (2006); the 5th Edition was published in 2013.) I love that the book brings together
I've previously shown how to use Monte Carlo simulation to estimate probabilities and areas. I illustrated the Monte Carlo method by estimating π ≈ 3.14159... by generating points uniformly at random in a unit square and computing the proportion of those points that were inside the unit circle. The previous
SAS' Bahar Biller reveals how simulations enable KPI generation, risk quantification, risk management and more.
Recently, I wrote about Bartlett's test for sphericity. The purpose of this hypothesis test is to determine whether the variables in the data are uncorrelated. It works by testing whether the sample correlation matrix is close to the identity matrix. Often statistics textbooks or articles include a statement such as
While studying business intelligence as an undergraduate student at business school HEC Montreal, Camille Duchesne encountered Cortex, an analytics simulation that pits participants against each other to develop the most accurate models for a particular task. In this case, the simulation supports a fictional charity by predicting which subjects from
Here's a fun problem to think about: Suppose that you have two different valid ways to test a statistical hypothesis. For a given sample, will both tests reject or fail to reject the hypothesis? Or might one test reject it whereas the other does not? The answer is that two
Several probability distributions model the outcomes of various trials when the probabilities of certain events are given. For some distributions, the definitions make sense even when a probability is 0. For other distributions, the definitions do not make sense unless all probabilities are strictly positive. This article examines how zero
On this blog, I write about a diverse set of topics that are relevant to statistical programming and data visualization. In a previous article, I presented some of the most popular blog posts from 2021. The most popular articles often deal with elementary or familiar topics that are useful to
You can use the Cholesky decomposition of a covariance matrix to simulate data from a correlated multivariate normal distribution. This method is encapsulated in the RANDNORMAL function in SAS/IML software, but you can also perform the computations manually by calling the ROOT function to get the Cholesky root and then
While discussing how to compute convex hulls in SAS with a colleague, we wondered how the size of the convex hull compares to the size of the sample. For most distributions of points, I claimed that the size of the convex hull is much less than the size of the
Recall that the binomial distribution is the distribution of the number of successes in a set of independent Bernoulli trials, each having the same probability of success. Most introductory statistics textbooks discuss the approximation of the binomial distribution by the normal distribution. The graph to the right shows that the
There are times when it is useful to simulate data. One of the reasons I use simulated data sets is to demonstrate statistical techniques such as multiple or logistic regression. By using SAS random functions and some DATA step logic, you can create variables that follow certain distributions or are
SAS' Bahar Biller, an operations researcher, details how to develop a supply chain digital twin.
The field of probability and statistics is full of asymptotic results. The Law of Large Numbers and the Central Limit Theorem are two famous examples. An asymptotic result can be both a blessing and a curse. For example, consider a result that says that the distribution of some statistic converges
A statistical programmer asked how to simulate event-trials data for groups. The subjects in each group have a different probability of experiencing the event. This article describes one way to simulate this scenario. The simulation is similar to simulating from a mixture distribution. This article also shows three different ways
In general, it is hard to simulate multivariate data that has a specified correlation structure. Copulas make that task easier for continuous distributions. A previous article presented the geometry behind a copula and explained copulas in an intuitive way. Although I strongly believe that statistical practitioners should be familiar with
Do you know what a copula is? It is a popular way to simulate multivariate correlated data. The literature for copulas is mathematically formidable, but this article provides an intuitive introduction to copulas by describing the geometry of the transformations that are involved in the simulation process. Although there are
This article uses simulation to demonstrate the fact that any continuous distribution can be transformed into the uniform distribution on (0,1). The function that performs this transformation is a familiar one: it is the cumulative distribution function (CDF). A continuous CDF is defined as an integral, so the transformation is
A previous article showed how to simulate multivariate correlated data by using the Iman-Conover transformation (Iman and Conover, 1982). The transformation preserves the marginal distributions of the original data but permutes the values (columnwise) to induce a new correlation among the variables. When I first read about the Iman-Conover transformation,
Simulating univariate data is relatively easy. Simulating multivariate data is much harder. The main difficulty is to generate variables that have given univariate distributions but also are correlated with each other according to a specified correlation matrix. However, Iman and Conover (1982, "A distribution-free approach to inducing rank correlation among
Quick! Which fraction is bigger, 40/83 or 27/56? It's not always easy to mentally compare two fractions to determine which is larger. For this example, you can easily see that both fractions are a little less than 1/2, but to compare the numbers you need to compare the products 40*56
As mentioned in my article about Monte Carlo estimate of (one-dimensional) integrals, one of the advantages of Monte Carlo integration is that you can perform multivariate integrals on complicated regions. This article demonstrates how to use SAS to obtain a Monte Carlo estimate of a double integral over rectangular and
A previous article shows how to use Monte Carlo simulation to estimate a one-dimensional integral on a finite interval. A larger random sample will (on average) result in an estimate that is closer to the true value of the integral than a smaller sample. This article shows how you can
Numerical integration is important in many areas of applied mathematics and statistics. For one-dimensional integrals on the interval (a, b), SAS software provides two important tools for numerical integration: For common univariate probability distributions, you can use the CDF function to integrate the density, thus obtaining the probability that a
In a previous article, I showed how to generate random points uniformly inside a d-dimensional sphere. In that article, I stated the following fact: If Y is drawn from the uncorrelated multivariate normal distribution, then S = Y / ||Y|| has the uniform distribution on the unit sphere. I was
Imagine an animal that is searching for food in a vast environment where food is scarce. If no prey is nearby, the animal's senses (such as smell and sight) are useless. In that case, a reasonable search strategy is a random walk. The animal can choose a random direction, walk/swim/fly
Do you know that you can create a vector that has a specific correlation with another vector? That is, given a vector, x, and a correlation coefficient, ρ, you can find a vector, y, such that corr(x, y) = ρ. The vectors x and y can have an arbitrary number
When there are two equivalent ways to do something, I advocate choosing the one that is simpler and more efficient. Sometimes, I encounter a SAS program that simulates random numbers in a way that is neither simple nor efficient. This article demonstrates two improvements that you can make to your