The DO Loop

Rick Wicklin | July 31, 2024

Fit, simulate, fit: How models can collapse after generations of recursive fitting

An article published in Nature has the intriguing title, "AI models collapse when trained on recursively generated data." (Shumailov, et al., 2024). The article is quite readable, but I also recommend a less technical overview of the result: "AI models fed AI-generated data quickly spew nonsense" (Gibney, 2024). The Gibney

English

Analytics | Data Visualization | Programming Tips

Rick Wicklin | June 10, 2024

The distribution of the R-square statistic

A SAS analyst ran a linear regression model and obtained an R-square statistic for the fit. However, he wanted a confidence interval, so he posted a question to a discussion forum asking how to obtain a confidence interval for the R-square parameter. Someone suggested a formula from a textbook (Cohen,


Analytics | Learn SAS

Rick Wicklin | May 20, 2024

On the correctness of a discrete simulation

After writing a program that simulates data, it is important to check that the statistical properties of the simulated (synthetic) data match the properties of the model. As a first step, you can generate a large random sample from the model distribution and compare the sample statistics to the expected


Analytics | Programming Tips

Rick Wicklin | May 13, 2024

The distribution of p-values under the null hypothesis

A SAS statistical programmer recently asked a theoretical question about statistics. "I've read that 'p-values are uniformly distributed under the null hypothesis,'" he began, "but what does that mean in practice? Is it important?" I think data simulation is a great way to discuss the conditions for which p-values are


Learn SAS | Programming Tips

Rick Wicklin | February 19, 2024

The linear distribution on an interval

In a recent Monte Carlo project, I needed to simulate numbers on an interval by using a continuous linear probability density function (PDF). An example is shown to the right. In this example, the linear density function is decreasing on the interval, but the function could also be constant or


Analytics | Learn SAS

Rick Wicklin | January 22, 2024

Angles vs slopes: The statistics of steepness

There are two popular ways to express the steepness of a line or ray. The most-often used mathematical definition is from high-school math where the slope is defined as "rise over run." A second way is to report the angle of inclination to the horizontal, as introduced in basic trigonometry.


Learn SAS | Programming Tips

Rick Wicklin | January 15, 2024

Simulate correlated continuous and discrete variables

Statistical software provides methods to simulate independent random variates from continuous and discrete distributions. For example, in the SAS DATA step, you can use the RAND function to simulate variates from continuous distributions (such as the normal or lognormal distributions) or from discrete distributions (such as the Bernoulli or Poisson).


Analytics | Learn SAS | Programming Tips

Rick Wicklin | September 6, 2023

Model data from published summary statistics

There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has


Learn SAS | Programming Tips

Rick Wicklin | August 30, 2023

Simulate the use of personal checks in the US

Does anyone write paper checks anymore? According to researchers at the Federal Reserve Bank of Atlanta (Greene, et al., 2020), the use of paper checks has declined 63% among US consumers since the year 2000. The researchers surveyed more than 3,000 consumers in 2017-2018 and discovered that only 7% of


Analytics | Data Visualization | Programming Tips

Rick Wicklin | August 28, 2023

Generate random uniform points in an ellipse

I have previously written about how to efficiently generate points uniformly at random inside a sphere (often called a ball by mathematicians). The method uses a mathematical fact from multivariate statistics: If X is drawn from the uncorrelated multivariate normal distribution in dimension d, then S = r*X / ||X|| has



Tag: Simulation