There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has
Does anyone write paper checks anymore? According to researchers at the Federal Reserve Bank of Atlanta (Greene, et al., 2020), the use of paper checks has declined 63% among US consumers since the year 2000. The researchers surveyed more than 3,000 consumers in 2017-2018 and discovered that only 7% of
I have previously written about how to efficiently generate points uniformly at random on the surface of a sphere. The method uses a mathematical fact from multivariate statistics: If X is drawn from the uncorrelated multivariate normal distribution, then S = X / ||X|| has the uniform distribution on the
The acceptance-rejection method (sometimes called rejection sampling) is a method that enables you to generate a random sample from an arbitrary distribution by using only the probability density function (PDF). This is in contrast to the inverse CDF method, which uses the cumulative distribution function (CDF) to generate a random
A previous article shows an example of a Markov chain model and computes the probability that the system ends up in a terminal state (called an absorbing state). As explained previously, you can often compute exact probabilities for questions about Markov chains. Nevertheless, it can be useful to know how
The "Teacher’s Corner" of The American Statistician enables statisticians to discuss topics that are relevant to teaching and learning statistics. Sometimes, the articles have practical relevance, too. Andersson (2023) "The Wald Confidence Interval for a Binomial p as an Illuminating 'Bad' Example," is intended for professors and masters-level students in
A previous article describes the metalog distribution (Keelin, 2016). The metalog distribution is a flexible family of distributions that can model a wide range of shapes for data distributions. The metalog system can model bounded, semibounded, and unbounded continuous distributions. This article shows how to use the metalog distribution in
A SAS programmer asked for help to simulate data from a distribution that has certain properties. The distribution must be supported on the interval [a, b] and have a specified mean, μ, where a < μ < b. It turns out that there are infinitely many distributions that satisfy these
SAS programmers love to make special graphs for Valentine's Day. In fact, there is a long history of heart-shaped graphs and love-inspired programs written in SAS! Last year, I added to the collection by showing how a ball bounces on a heart-shaped billiards table. This year, I create a similar
A previous article shows that you can use the Intercept parameter to control the ratio of events to nonevents in a simulation of data from a logistic regression model. If you decrease the intercept parameter, the probability of the event decreases; if you increase the intercept parameter, the probability of
This article shows that you can use the intercept parameter to control the probability of the event in a simulation study that involves a binary logistic regression model. For simplicity, I will simulate data from a logistic regression model that involves only one explanatory variable, but the main idea applies
SAS' Bahar Biller expounds on the idea that stochastic simulations are large-data generation programs for highly complex and dynamic stochastic systems.
A probabilistic card trick is a trick that succeeds with high probability and does not require any skill from the person performing the trick. I have seen a certain trick mentioned several times on social media. I call it "ladders" or the "ladders game" because it reminds me of the
A SAS programmer was trying to simulate poker hands. He was having difficulty because the sampling scheme for simulating card games requires that you sample without replacement for each hand. In statistics, this is called "simple random sampling." If done properly, it is straightforward to simulate poker hands in SAS.
I recently blogged about how to compute the area of the convex hull of a set of planar points. This article discusses the expected value of the area of the convex hull for n random uniform points in the unit square. The article introduces an exact formula (due to Buchta,
One of the benefits of social media is the opportunity to learn new things. Recently, I saw a post on Twitter that intrigued me. The tweet said that the expected volume of a random tetrahedron in the unit cube (in 3-D) is E[Volume] = 0.0138427757.... This number seems surprisingly small!
A common question on SAS discussion forums is how to use SAS to generate random ID values. The use case is to generate a set of random strings to assign to patients in a clinical study. If you assign each patient a unique ID and delete the patients' names, you
When I was writing Simulating Data with SAS (Wicklin, 2013), I read a lot of introductory textbooks about Monte Carlo simulation. One of my favorites is Sheldon Ross's book Simulation. (I read the 4th Edition (2006); the 5th Edition was published in 2013.) I love that the book brings together
I've previously shown how to use Monte Carlo simulation to estimate probabilities and areas. I illustrated the Monte Carlo method by estimating π ≈ 3.14159... by generating points uniformly at random in a unit square and computing the proportion of those points that were inside the unit circle. The previous
SAS' Bahar Biller reveals how simulations enable KPI generation, risk quantification, risk management and more.
Recently, I wrote about Bartlett's test for sphericity. The purpose of this hypothesis test is to determine whether the variables in the data are uncorrelated. It works by testing whether the sample correlation matrix is close to the identity matrix. Often statistics textbooks or articles include a statement such as
While studying business intelligence as an undergraduate student at business school HEC Montreal, Camille Duchesne encountered Cortex, an analytics simulation that pits participants against each other to develop the most accurate models for a particular task. In this case, the simulation supports a fictional charity by predicting which subjects from
Here's a fun problem to think about: Suppose that you have two different valid ways to test a statistical hypothesis. For a given sample, will both tests reject or fail to reject the hypothesis? Or might one test reject it whereas the other does not? The answer is that two
Several probability distributions model the outcomes of various trials when the probabilities of certain events are given. For some distributions, the definitions make sense even when a probability is 0. For other distributions, the definitions do not make sense unless all probabilities are strictly positive. This article examines how zero
On this blog, I write about a diverse set of topics that are relevant to statistical programming and data visualization. In a previous article, I presented some of the most popular blog posts from 2021. The most popular articles often deal with elementary or familiar topics that are useful to
You can use the Cholesky decomposition of a covariance matrix to simulate data from a correlated multivariate normal distribution. This method is encapsulated in the RANDNORMAL function in SAS/IML software, but you can also perform the computations manually by calling the ROOT function to get the Cholesky root and then
While discussing how to compute convex hulls in SAS with a colleague, we wondered how the size of the convex hull compares to the size of the sample. For most distributions of points, I claimed that the size of the convex hull is much less than the size of the
Recall that the binomial distribution is the distribution of the number of successes in a set of independent Bernoulli trials, each having the same probability of success. Most introductory statistics textbooks discuss the approximation of the binomial distribution by the normal distribution. The graph to the right shows that the
There are times when it is useful to simulate data. One of the reasons I use simulated data sets is to demonstrate statistical techniques such as multiple or logistic regression. By using SAS random functions and some DATA step logic, you can create variables that follow certain distributions or are
SAS' Bahar Biller, an operations researcher, details how to develop a supply chain digital twin.