When you write a program that simulates data from a statistical model, you should always check that the simulation code is correct. One way to do this is to generate a large simulated sample, estimate the parameters in the simulated data, and make sure that the estimates are close to
Author
The Kronecker product (also called the direct product) is a binary operation that combines two matrices to form a new matrix. The Kronecker product appears in textbooks about the design of experiments and multivariate statistics. The Kronecker product seems intimidating at first, but often one of the matrices in the
Last month a SAS programmer asked how to fit a multivariate Gaussian mixture model in SAS. For univariate data, you can use the FMM Procedure, which fits a large variety of finite mixture models. If your company is using SAS Viya, you can use the MBC or GMM procedures, which
I recently showed how to compute within-group multivariate statistics by using the SAS/IML language. However, a principal of good software design is to encapsulate functionality and write self-contained functions that compute and return the results. What is the best way to return multiple statistics from a SAS/IML module? A convenient
The multivariate normal distribution is used frequently in multivariate statistics and machine learning. In many applications, you need to evaluate the log-likelihood function in order to compare how well different models fit the data. The log-likelihood for a vector x is the natural logarithm of the multivariate normal (MVN) density
A previous article introduces the MAPREDUCE function in the iml action. (The iml action was introduced in Viya 3.5.) The MAPREDUCE function implements the map-reduce paradigm, which is a two-step process for distributing a computation to multiple threads. The example in the previous article adds a set of numbers by
The iml action in SAS Viya (introduced in Viya 3.5) provides a set of general programming tools that you can use to implement a custom parallel algorithm. This makes the iml action different than other Viya actions, which use distributed computations to solve specific problems in statistics, machine learning, and
Testing people for coronavirus is a public health measure that reduces the spread of coronavirus. Dr. Anthony Fauci, a US infectious disease expert, recently mentioned the concept of "pool testing." The verb "to pool" means "to combine from different sources." In a USA Today article, Dr. Deborah Birx, the coordinator
A previous article discusses the pooled variance for two or groups of univariate data. The pooled variance is often used during a t test of two independent samples. For multivariate data, the analogous concept is the pooled covariance matrix, which is an average of the sample covariance matrices of the
The first time I saw a formula for the pooled variance, I was quite confused. It looked like Frankenstein's monster, assembled from bits and pieces of other quantities and brought to life by a madman. However, the pooled variance does not have to be a confusing monstrosity. The verb "to
If you have ever run a Kolmogorov-Smirnov test for normality, you have encountered the Kolmogorov D statistic. The Kolmogorov D statistic is used to assess whether a random sample was drawn from a specified distribution. Although it is frequently used to test for normality, the statistic is "distribution free" in
Sometimes in matrix computations, it is important to display the nonzero elements of a matrix. This can be useful for visualizing the structure of a sparse matrix (one that has many zeros) and it is also useful when describing a matrix algorithm (such as Gaussian elimination) that introduces zeros at
A previous article shows how to use the iml action to read a CAS data table into an IML matrix. This article shows how to write a CAS table from data in an IML matrix. You can read an overview of the iml action, which was introduced in SAS Viya
A previous article compares a SAS/IML program that runs in PROC IML to the same program that runs in the iml action. (You can read an overview of the iml action.) The example in the previous article was very simple and did not read or write data. This article compares
A previous article provides an introduction and overview of the iml action, which is available in SAS Viya 3.5. The article compares the iml action to PROC IML and states that most PROC IML programs can be modified to run in the iml action. This article takes a closer look
This article introduces the iml action, which is available in SAS Viya 3.5. The iml action supports most of the same syntax and functionality as the SAS/IML matrix language, which is implemented in PROC IML. With minimal changes, most programs that run in PROC IML also run in the iml
A SAS customer asked how to specify interaction effects between a classification variable and a spline effect in a SAS regression procedure. There are at least two ways to do this. If the SAS procedure supports the EFFECT statement, you can build the interaction term in the MODEL statement. For
I recently read an article that describes ways to compute confidence intervals for the difference in a percentile between two groups. In Eaton, Moore, and MacKenzie (2019), the authors describe a problem in hydrology. The data are the sizes of pebbles (grains) in rivers at two different sites. The authors
In a previous article, I discussed the definition of the Kullback-Leibler (K-L) divergence between two discrete probability distributions. For completeness, this article shows how to compute the Kullback-Leibler divergence between two continuous distributions. When f and g are discrete distributions, the K-L divergence is the sum of f(x)*log(f(x)/g(x)) over all
The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. An application in machine learning is to measure how distributions in a parametric family differ from a data distribution. This article shows that if you minimize the Kullback–Leibler divergence over a set of parameters, you can find a
If you have been learning about machine learning or mathematical statistics, you might have heard about the Kullback–Leibler divergence. The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. It measures how much one distribution differs from a reference distribution. This article explains the Kullback–Leibler divergence and shows
This article shows how to perform two-dimensional bilinear interpolation in SAS by using a SAS/IML function. It is assumed that you have observed the values of a response variable on a regular grid of locations. A previous article showed how to interpolate inside one rectangular cell. When you have a
I've previously written about linear interpolation in one dimension. Bilinear interpolation is a method for two-dimensional interpolation on a rectangle. If the value of a function is known at the four corners of a rectangle, an interpolation scheme gives you a way to estimate the function at any point in
This article shows how to find local maxima and maxima on a regression curve, which means finding points where the slope of the curve is zero. An example appears at the right, which shows locations where the loess smoother in a scatter plot has local minima and maxima. Except for
I recently showed how to use linear interpolation in SAS. Linear interpolation is a common way to interpolate between a set of planar points, but the interpolating function (the interpolant) is not smooth. If you want a smoother interpolant, you can use cubic spline interpolation. This article describes how to
During this coronavirus pandemic, there are many COVID-related graphs and curves in the news and on social media. The public, politicians, and pundits scrutinize each day's graphs to determine which communities are winning the fight against coronavirus. Interspersed among these many graphs is the oft-repeated mantra, "Flatten the curve!" As
SAS programmers sometimes ask about ways to perform one-dimensional linear interpolation in SAS. This article shows three ways to perform linear interpolation in SAS: PROC IML (in SAS/IML software), PROC EXPAND (in SAS/ETS software), and PROC TRANSREG (in SAS/STAT software). Of these, PROC IML Is the simplest to use and
Recently I read an excellent blog post by Paul von Hippel entitled "How many imputations do you need?". It is based on a paper (von Hippel, 2018), which provides more details. Suppose you are faced with data that has many missing values. One way to address the missing values is
I've previously written about how to generate points that are uniformly distributed in the unit disk. A seemingly unrelated topic is the distribution of eigenvalues (in the complex plane) of various kinds of random matrices. However, I recently learned that these topics are somewhat related! A mathematical result called the
A previous article describes the funnel plot (Spiegelhalter, 2005), which can identify samples that have rates or proportions that are much different than expected. The funnel plot is a scatter plot that plots the sample proportion of some quantity against the size of the sample. The variance of the sample