There are many ways to model a set of raw data by using a continuous probability distribution. It can be challenging, however, to choose the distribution that best models the data. Are the data normal? Lognormal? Is there a theoretical reason to prefer one distribution over another? The SAS has

## Tag: **Statistical Programming**

There are dozens of common probability distributions for a continuous univariate random variable. Familiar examples include the normal, exponential, uniform, gamma, and beta distributions. Where did these distributions come from? Well, some mathematician needed a model for a stochastic process and wrote down the equation for the distribution, typically by

In a previous article, I discussed the Wilcoxon signed rank test, which is a nonparametric test for the location of the median. The Wikipedia article about the signed rank test mentions a variation of the test due to Pratt (1959). Whereas the standard Wilcoxon test excludes values that equal μ0

Wilcoxon's signed rank test is a popular nonparametric alternative to a paired t test. In a paired t test, you analyze measurements for subjects before and after some treatment or intervention. You analyze the difference in the measurements for each subject, and test whether the mean difference is significantly different

A previous article shows an example of a Markov chain model and computes the probability that the system ends up in a terminal state (called an absorbing state). As explained previously, you can often compute exact probabilities for questions about Markov chains. Nevertheless, it can be useful to know how

A previous article shows how to model the probabilities in a discrete-time Markov chain by using a Markov transition matrix. A Markov chain is a discrete-time stochastic process for which the current state of the system determines the probability of the next state. In this process, the probabilities for transitioning

Given a set of N points in k-dimensional space, can you find the location that minimizes the sum of the distances to the points? The location that minimizes the distances is called the geometric median of the points. For univariate data, the "points" are merely a set of numbers \(\{p_1,

Labeling objects in graphs can be difficult. SAS has a long history of providing support for labeling markers in scatter plots and for labeling regions on a map. This article discusses how the SGPLOT procedure decides where to put a label for a polygon. It discusses the advantages and disadvantages

SAS supports many ways to compute the rank of a numeric variable and to handle tied values. However, sometimes I need to rank the values in a character categorical variable. For example, the values {"Male", "Female", "Male"} have ranks {2, 1, 2} because, in alphabetical order, "Female" is the first-ranked

A previous article defines the silhouette statistic (Rousseeuw, 1987) and shows how to use it to identify observations in a cluster analysis that are potentially misclassified. The article provides many graphs, including the silhouette plot, which is a bar chart or histogram that displays the distribution of the silhouette statistic