In a recent article, I graphed the PDF of a few Beta distributions that had a variety of skewness and kurtosis values. I thought that I had chosen the parameter values to represent a wide variety of Beta shapes. However, I was surprised to see that the distributions were all

## Tag: **Statistical Programming**

The moment-ratio diagram is a tool that is useful when choosing a distribution that models a sample of univariate data. As I show in my book (Simulating Data with SAS, Wicklin, 2013), you first plot the skewness and kurtosis of the sample on the moment-ratio diagram to see what common

A SAS programmer wanted to simulate samples from a family of Beta(a,b) distributions for a simulation study. (Recall that a Beta random variable is bounded with values in the range [0,1].) She wanted to choose the parameters such that the skewness and kurtosis of the distributions varied over range of

A recent article describes how to estimate coefficients in a simple linear regression model by using maximum likelihood estimation (MLE). One of the nice properties of an MLE formulation is that you can compare a large model with a nested submodel in a natural way. For example, if you can

A statistical analyst used the GENMOD procedure in SAS to fit a linear regression model. He noticed that the table of parameter estimates has an extra row (labeled "Scale") that is not a regression coefficient. The "scale parameter" is not part of the parameter estimates table produced by PROC REG

In a recent Monte Carlo project, I needed to simulate numbers on an interval by using a continuous linear probability density function (PDF). An example is shown to the right. In this example, the linear density function is decreasing on the interval, but the function could also be constant or

I read a journal article in which a researcher used a formula for the probability density function (PDF) of the sample correlation coefficient. The formula was rather complicated, and presented with no citation, so I was curious to learn more. I found the distribution for the correlation coefficient in the

This article looks at a geometric method for estimating the center of a multivariate point cloud. The method is known as convex-hull peeling. In two-dimensions, you can perform convex-hull peeling in SAS 9 by using the CVEXHULL function in SAS IML software. For higher dimensions, you can use the CONVEXHULL

There are two popular ways to express the steepness of a line or ray. The most-often used mathematical definition is from high-school math where the slope is defined as "rise over run." A second way is to report the angle of inclination to the horizontal, as introduced in basic trigonometry.

In a previous article, I presented some of the most popular blog posts from 2023. The popular articles tend to discuss elementary topics that have broad appeal. However, I also wrote many technical articles about advanced topics. The following articles didn't make the Top 10 list, but they deserve a

*The DO Loop*in 2023

In 2023, I wrote 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. In addition, several "general interest" articles were popular, including my article for Pi Day and an article about AI chatbots. If you missed any of these articles,

Statistical software often includes supports for a weight variable. Many SAS procedures make a distinction between integer frequencies and more general "importance weights." Frequencies are supported by using the FREQ statement in SAS procedures; general weights are supported by using the WEIGHT statement. An exception is PROC FREQ, which contains

SAS provides many built-in routines for data analysis. A previous article discusses polychoric correlation, which is a measure of association between two ordinal variables. In SAS, you can use PROC FREQ or PROC CORR to estimate the polychoric correlation, its standard error, and confidence intervals. Although SAS provides a built-in

A previous article discussed how to compute probabilities for the bivariate standard normal distribution. The standard bivariate normal distribution with correlation ρ is denoted BVN(0,ρ). For any point (x,y), you can use the PROBBNRM function in SAS to compute the probability that the random variables (X,Y) ~ BVN(0,ρ) is observed

This article shows how to use SAS to compute the probabilities for two correlated normal variables. Specifically, this article shows how to compute the probabilities for rectangular regions in the plane. A second article discusses the computation over infinite regions such as quadrants. If (X,Y) are random variables that are

The collinearity problem is to determine whether three points in the plane lie along a straight line. You can solve this problem by using middle-school algebra. An algebraic solution requires three steps. First, name the points: p, q, and r. Second, find the parametric equation for the line that passes

Converting a program from one language to another can be a challenge. Even if the languages share many features, there is often syntax that is valid in one language that is not valid in another. Recently, a SAS programmer was converting a program from R to SAS IML. He reached

In several previous articles, I've shown how to use SAS to fit models to data by using maximum likelihood estimation (MLE). However, I have not previously shown how to obtain standard errors for the estimates. This article combines two previous articles to show how to obtain MLE estimates and the

A previous article shows how to use Monte Carlo simulation to approximate the sampling distribution of the sample mean and sample median. When x ~ N(0,1) are normal data, the sample mean is also normal, and there are simple formulas for the expected value and the standard error of the

An elementary course in statistics often includes a discussion of the sampling distribution of a statistic. The canonical example is the sampling distribution of the sample mean. For samples of size n that are drawn from a normally distribution (X ~ N(μ, σ)), the sample mean is normally distributed as

A previous article discusses the birthday problem and its generalizations. The classic birthday problem asks, "In a room that contains N people, what is the probability that two or more people share a birthday?" The probability is much higher than you might think. For example, in a room that contains

The birthday-matching problem (also called the birthday paradox or simply the birthday problem), is a classic problem in probability. Simply stated, the birthday-matching problem asks, "If there are N people in a room, what is the chance that two of them have the same birthday?" The problem is sometimes called

The documentation for Python's SciPy package provides a table that concisely summarizes functions that are associated with continuous probability distributions. This article provides a similar table for SAS functions. For more information on the CDF, PDF, quantile, and random-variate functions, see "Four essential functions for statistical programmers." SAS functions for

Many SAS procedures support a BY statement that enables you to perform an analysis for each unique value of a BY-group variable. The SAS IML language does not support a BY statement, but you can program a loop that iterates over all BY groups. You can emulate BY-group processing by

There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has

There are dozens of common probability distributions for a continuous univariate random variable. Familiar examples include the normal, exponential, uniform, gamma, and beta distributions. Where did these distributions come from? Well, some mathematician needed a model for a stochastic process and wrote down the equation for the distribution, typically by

In a previous article, I discussed the Wilcoxon signed rank test, which is a nonparametric test for the location of the median. The Wikipedia article about the signed rank test mentions a variation of the test due to Pratt (1959). Whereas the standard Wilcoxon test excludes values that equal μ0

Wilcoxon's signed rank test is a popular nonparametric alternative to a paired t test. In a paired t test, you analyze measurements for subjects before and after some treatment or intervention. You analyze the difference in the measurements for each subject, and test whether the mean difference is significantly different

A previous article shows an example of a Markov chain model and computes the probability that the system ends up in a terminal state (called an absorbing state). As explained previously, you can often compute exact probabilities for questions about Markov chains. Nevertheless, it can be useful to know how

A previous article shows how to model the probabilities in a discrete-time Markov chain by using a Markov transition matrix. A Markov chain is a discrete-time stochastic process for which the current state of the system determines the probability of the next state. In this process, the probabilities for transitioning