A previous article provides an introduction and overview of the iml action, which is available in SAS Viya 3.5. The article compares the iml action to PROC IML and states that most PROC IML programs can be modified to run in the iml action. This article takes a closer look at

# Author

This article introduces the iml action, which is available in SAS Viya 3.5. The iml action supports most of the same syntax and functionality as the SAS/IML matrix language, which is implemented in PROC IML. With minimal changes, most programs that run in PROC IML also run in the iml

A SAS customer asked how to specify interaction effects between a classification variable and a spline effect in a SAS regression procedure. There are at least two ways to do this. If the SAS procedure supports the EFFECT statement, you can build the interaction term in the MODEL statement. For

I recently read an article that describes ways to compute confidence intervals for the difference in a percentile between two groups. In Eaton, Moore, and MacKenzie (2019), the authors describe a problem in hydrology. The data are the sizes of pebbles (grains) in rivers at two different sites. The authors

In a previous article, I discussed the definition of the Kullback-Leibler (K-L) divergence between two discrete probability distributions. For completeness, this article shows how to compute the Kullback-Leibler divergence between two continuous distributions. When f and g are discrete distributions, the K-L divergence is the sum of f(x)*log(f(x)/g(x)) over all
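The discrete formula mentioned in this excerpt can be sketched in a few lines. The following Python function is only an illustration, not the code from the article (which uses SAS). It assumes the natural logarithm, assumes that the two distributions are given as equal-length lists of probabilities, and assumes the common convention of summing only over values where f(x) > 0. The name `kl_divergence` is hypothetical.

```python
import math

def kl_divergence(f, g):
    """Discrete K-L divergence: sum of f(x)*log(f(x)/g(x)).

    Assumptions (not taken from the article): natural log, and the sum
    runs only over x where f(x) > 0. If g(x) = 0 at such an x, the
    divergence is reported as infinity.
    """
    if len(f) != len(g):
        raise ValueError("f and g must have the same number of categories")
    total = 0.0
    for fx, gx in zip(f, g):
        if fx > 0:
            if gx <= 0:
                return math.inf   # f puts mass where g has none
            total += fx * math.log(fx / gx)
    return total

# Example: a fair coin (f) compared with a biased reference coin (g)
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

Notice that the function is not symmetric: swapping the arguments generally gives a different value.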

The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. An application in machine learning is to measure how distributions in a parametric family differ from a data distribution. This article shows that if you minimize the Kullback–Leibler divergence over a set of parameters, you can find a

If you have been learning about machine learning or mathematical statistics, you might have heard about the Kullback–Leibler divergence. The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. It measures how much one distribution differs from a reference distribution. This article explains the Kullback–Leibler divergence and shows

This article shows how to perform two-dimensional bilinear interpolation in SAS by using a SAS/IML function. It is assumed that you have observed the values of a response variable on a regular grid of locations. A previous article showed how to interpolate inside one rectangular cell. When you have a

I've previously written about linear interpolation in one dimension. Bilinear interpolation is a method for two-dimensional interpolation on a rectangle. If the value of a function is known at the four corners of a rectangle, an interpolation scheme gives you a way to estimate the function at any point in
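To make the idea concrete, here is a minimal Python sketch of the standard bilinear formula on a single rectangle [x0, x1] x [y0, y1]. It is not the SAS/IML code from the article. The function name `bilinear` and the argument order are hypothetical, and the sketch assumes that x1 > x0 and y1 > y0.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation on the rectangle [x0, x1] x [y0, y1].

    f00, f10, f01, f11 are the known function values at the corners
    (x0, y0), (x1, y0), (x0, y1), and (x1, y1), respectively.
    Assumes x1 > x0 and y1 > y0 (illustrative sketch only).
    """
    s = (x - x0) / (x1 - x0)   # relative position in the x direction, in [0, 1]
    t = (y - y0) / (y1 - y0)   # relative position in the y direction, in [0, 1]
    return ((1 - s) * (1 - t) * f00 + s * (1 - t) * f10
            + (1 - s) * t * f01 + s * t * f11)

# Example: estimate the value at the center of the unit square
print(bilinear(0.5, 0.5, 0, 1, 0, 1, f00=0, f10=1, f01=2, f11=3))
```

At each corner, the formula reproduces the known value exactly. At the center of the rectangle, the estimate is the average of the four corner values.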

This article shows how to find local minima and maxima on a regression curve, which means finding points where the slope of the curve is zero. An example appears at the right, which shows locations where the loess smoother in a scatter plot has local minima and maxima. Except for

I recently showed how to use linear interpolation in SAS. Linear interpolation is a common way to interpolate between a set of planar points, but the interpolating function (the interpolant) is not smooth. If you want a smoother interpolant, you can use cubic spline interpolation. This article describes how to

During this coronavirus pandemic, there are many COVID-related graphs and curves in the news and on social media. The public, politicians, and pundits scrutinize each day's graphs to determine which communities are winning the fight against coronavirus. Interspersed among these many graphs is the oft-repeated mantra, "Flatten the curve!" As

SAS programmers sometimes ask about ways to perform one-dimensional linear interpolation in SAS. This article shows three ways to perform linear interpolation in SAS: PROC IML (in SAS/IML software), PROC EXPAND (in SAS/ETS software), and PROC TRANSREG (in SAS/STAT software). Of these, PROC IML is the simplest to use and

Recently I read an excellent blog post by Paul von Hippel entitled "How many imputations do you need?". It is based on a paper (von Hippel, 2018), which provides more details. Suppose you are faced with data that has many missing values. One way to address the missing values is

I've previously written about how to generate points that are uniformly distributed in the unit disk. A seemingly unrelated topic is the distribution of eigenvalues (in the complex plane) of various kinds of random matrices. However, I recently learned that these topics are somewhat related! A mathematical result called the

A previous article describes the funnel plot (Spiegelhalter, 2005), which can identify samples that have rates or proportions that are very different from what is expected. The funnel plot is a scatter plot that plots the sample proportion of some quantity against the size of the sample. The variance of the sample

Death is always a difficult topic to discuss, and death has been in the news a lot during this tragic coronavirus pandemic. Many news stories focus on states, counties, or cities that have the most cases or the most deaths. A related statistic is the case fatality rate, which is

I previously wrote about the advantages of adding horizontal and vertical reference lines to a graph. You can also add a diagonal reference line to a graph. The SGPLOT procedure in SAS supports two primary ways to add a diagonal reference line: The LINEPARM statement enables you to specify a

Data tell a story. A purpose of data visualization is to convey that story to the reader in a clear and impactful way. Sometimes you can let the data "speak for themselves" in an unadorned graphic, but sometimes it is helpful to add reference lines to a graph to emphasize

Every day we face risks. If we drive to work, we risk a fatal auto accident. If we eat red meat and fatty foods, we risk a heart attack. If we go out in public during a pandemic, we risk contracting a disease. A logical response to risk is to

I have written several articles about how to work with continuous probability distributions in SAS. I always emphasize that it is important to be able to compute the four essential functions for working with a statistical distribution. Namely, you need to know how to generate random values, how to compute

During an epidemic, such as the coronavirus pandemic of 2020, the media often shows graphs of the cumulative numbers of confirmed cases for different countries. Often these graphs use a logarithmic scale for the vertical axis. In these graphs, a straight line indicates that new cases are increasing at an

A cumulative curve shows the total amount of some quantity at multiple points in time. Examples include: Total sales of songs, movies, or books, beginning when the item is released. Total views of blog posts, beginning when the post is published. Total cases of a disease for different countries, beginning

During an outbreak of a disease, such as the coronavirus (COVID-19) pandemic, the media shows daily graphs that convey the spread of the disease. The following two graphs appear frequently: New cases for each day (or week). This information is usually shown as a histogram or needle plot. The graph

When you create a graph by using the SGPLOT procedure in SAS, usually the default tick locations are acceptable. Sometimes, however, you might want to specify a set of custom tick values for one or both axes. This article shows three examples: Specify evenly spaced values. Specify tick values that

A SAS/IML programmer asked about the best way to print multiple SAS/IML variables when each variable needs a different format. He wanted the output to resemble the "Parameter Estimates" table that is produced by PROC REG and other SAS/STAT procedures. This article shows four ways to print SAS/IML vectors in

Books about statistics and machine learning often discuss the tradeoff between bias and variance for an estimator. These discussions are often motivated by a sophisticated predictive model such as a regression or a decision tree. But the basic idea can be seen in much simpler situations. This article presents a

Recently, I saw a graphic on Twitter by @neilrkaye that showed the rapid convergence of a regular polygon to a circle as you increase the number of sides for the polygon. The author remarked that polygons that have 40 or more sides "all look like circles to me." That is,

In a previous article, I discussed the binormal model for a binary classification problem. This model assumes a set of scores that are normally distributed for each population, and the mean of the scores for the Negative population is less than the mean of scores for the Positive population. I

Suppose that a data set contains a set of parameter values. For each row of parameters, you need to perform some computation. A recent discussion on the SAS Support Communities mentions an important point: if there are duplicate rows in the data, a program might repeat the same computation several