A previous article discusses the birthday problem and its generalizations. The classic birthday problem asks, "In a room that contains N people, what is the probability that two or more people share a birthday?" The probability is much higher than you might think. For example, in a room that contains
Tag: Statistical Programming
The birthday-matching problem (also called the birthday paradox or simply the birthday problem) is a classic problem in probability. Simply stated, the birthday-matching problem asks, "If there are N people in a room, what is the chance that two of them have the same birthday?" The problem is sometimes called
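To make the question concrete, here is a minimal Python sketch of the classic calculation. It assumes 365 equally likely birthdays and independent people (no leap years, no twins); the function name `birthday_match_probability` is chosen for illustration and does not come from the articles excerpted here. The idea is to compute the probability that all N birthdays are distinct and subtract it from 1.

```python
def birthday_match_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes `days` equally likely birthdays and independent people.
    """
    if n > days:
        return 1.0  # pigeonhole principle: a match is certain
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(birthday_match_probability(n), 4))
```

Under these assumptions, a room of 23 people already has a match probability of about 0.507, which is why the result is often called a paradox.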
The documentation for Python's SciPy package provides a table that concisely summarizes functions that are associated with continuous probability distributions. This article provides a similar table for SAS functions. For more information on the CDF, PDF, quantile, and random-variate functions, see "Four essential functions for statistical programmers." SAS functions for
Many SAS procedures support a BY statement that enables you to perform an analysis for each unique value of a BY-group variable. The SAS IML language does not support a BY statement, but you can program a loop that iterates over all BY groups. You can emulate BY-group processing by
There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has
There are dozens of common probability distributions for a continuous univariate random variable. Familiar examples include the normal, exponential, uniform, gamma, and beta distributions. Where did these distributions come from? Well, some mathematician needed a model for a stochastic process and wrote down the equation for the distribution, typically by
In a previous article, I discussed the Wilcoxon signed rank test, which is a nonparametric test for the location of the median. The Wikipedia article about the signed rank test mentions a variation of the test due to Pratt (1959). Whereas the standard Wilcoxon test excludes values that equal μ0
Wilcoxon's signed rank test is a popular nonparametric alternative to a paired t test. In a paired t test, you analyze measurements for subjects before and after some treatment or intervention. You analyze the difference in the measurements for each subject, and test whether the mean difference is significantly different
A previous article shows an example of a Markov chain model and computes the probability that the system ends up in a terminal state (called an absorbing state). As explained previously, you can often compute exact probabilities for questions about Markov chains. Nevertheless, it can be useful to know how
A previous article shows how to model the probabilities in a discrete-time Markov chain by using a Markov transition matrix. A Markov chain is a discrete-time stochastic process for which the current state of the system determines the probability of the next state. In this process, the probabilities for transitioning
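As a rough illustration of how a transition matrix drives a discrete-time Markov chain, the following Python sketch propagates a probability distribution over states one step at a time. The two-state matrix `P` is a made-up example, not the model from the article, and the helper names `step` and `evolve` are hypothetical. Row i of `P` holds the probabilities of moving from state i to each state, so each row sums to 1.

```python
def step(dist, P):
    """Advance a state distribution by one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def evolve(dist, P, n_steps):
    """Apply the transition matrix n_steps times to the initial distribution."""
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain (illustrative values only).
P = [[0.9, 0.1],
     [0.5, 0.5]]

if __name__ == "__main__":
    start = [1.0, 0.0]  # the system begins in state 0 with certainty
    for n in range(4):
        print(n, evolve(start, P, n))
```

Because the next distribution depends only on the current one, repeated multiplication by `P` is all that is needed to follow the chain forward in time.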