I recently showed how to compute within-group multivariate statistics by using the SAS/IML language. However, a principal of good software design is to encapsulate functionality and write self-contained functions that compute and return the results. What is the best way to return multiple statistics from a SAS/IML module? A convenient

## Tag: **Data Analysis**

A previous article discusses the pooled variance for two or groups of univariate data. The pooled variance is often used during a t test of two independent samples. For multivariate data, the analogous concept is the pooled covariance matrix, which is an average of the sample covariance matrices of the

The first time I saw a formula for the pooled variance, I was quite confused. It looked like Frankenstein's monster, assembled from bits and pieces of other quantities and brought to life by a madman. However, the pooled variance does not have to be a confusing monstrosity. The verb "to

A SAS customer asked how to specify interaction effects between a classification variable and a spline effect in a SAS regression procedure. There are at least two ways to do this. If the SAS procedure supports the EFFECT statement, you can build the interaction term in the MODEL statement. For

I recently read an article that describes ways to compute confidence intervals for the difference in a percentile between two groups. In Eaton, Moore, and MacKenzie (2019), the authors describe a problem in hydrology. The data are the sizes of pebbles (grains) in rivers at two different sites. The authors

The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. An application in machine learning is to measure how distributions in a parametric family differ from a data distribution. This article shows that if you minimize the Kullback–Leibler divergence over a set of parameters, you can find a

During this coronavirus pandemic, there are many COVID-related graphs and curves in the news and on social media. The public, politicians, and pundits scrutinize each day's graphs to determine which communities are winning the fight against coronavirus. Interspersed among these many graphs is the oft-repeated mantra, "Flatten the curve!" As

SAS programmers sometimes ask about ways to perform one-dimensional linear interpolation in SAS. This article shows three ways to perform linear interpolation in SAS: PROC IML (in SAS/IML software), PROC EXPAND (in SAS/ETS software), and PROC TRANSREG (in SAS/STAT software). Of these, PROC IML Is the simplest to use and

A previous article describes the funnel plot (Spiegelhalter, 2005), which can identify samples that have rates or proportions that are much different than expected. The funnel plot is a scatter plot that plots the sample proportion of some quantity against the size of the sample. The variance of the sample

Death is always a difficult topic to discuss, and death has been in the news a lot during this tragic coronavirus pandemic. Many news stories focus on states, counties, or cities that have the most cases or the most deaths. A related statistic is the case fatality rate, which is