Simulation studies are used for many purposes, one of which is to examine how distributional assumptions affect the coverage probability of a confidence interval. This article describes the "zipper plot," which enables you to compare the coverage probability of a confidence interval when the data do or do not follow
Author
I often claim that the "natural syntax" of the SAS/IML language makes it easy to implement an algorithm or statistical formula as it appears in a textbook or journal. The other day I had an opportunity to test the truth of that statement. A SAS programmer wanted to implement the
About once a month I see a question on the SAS Support Communities that involves what I like to call "computations with combinations." A typical question asks how to find k values (from a set of p values) that maximize or minimize some function, such as "I have 5 variables,
One of my favorite magazines, Significance, printed an intriguing image of a symmetric matrix that shows repetition in a song's lyrics. The image was created by Colin Morris, who has created many similar images. When I saw these images, I knew that I wanted to duplicate the analysis in SAS!
Welcome to my annual Pi Day post. Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. Pi is a mathematical constant that never changes. Pi is the same value today as it
Data analysts often fit a probability distribution to data. When you have access to the data, a common technique is to use maximum likelihood estimation (MLE) to compute the parameters of a distribution that are "most likely" to have produced the observed data. However, how can you fit a distribution
Many people know that a surface can contain a saddle point, but did you know that you can define the saddle point of a matrix? Saddle points in matrices are somewhat rare, which means that if you choose a random matrix you are unlikely to choose one that has a
This article shows how to use SAS to solve a system of nonlinear equations. When there are n unknowns and n equations, this problem is equivalent to finding a multivariate root of a vector-valued function F(x) = 0 because you can always write the system as f1(x1, x2, ..., xn)
My article about the difference between CLASS variables and BY variables in SAS focused on SAS analytical procedures. However, the BY statement is also useful in the SAS DATA step where it is used to merge data sets and to analyze data at the group level. When you use the
This article describes and implements a fast algorithm that estimates a median for very large samples. The traditional median estimate sorts a sample of size N and returns the middle value (when N is odd). The algorithm in this article uses Monte Carlo techniques to estimate the median much faster.