Blogs

Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is the author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Learn SAS | Programming Tips

[Image: Architecture of an MPP session in SAS Viya. The client calls an action, which can use multiple nodes and threads.]

Rick Wicklin | June 10, 2020

An introduction to the iml action in SAS Viya

This article introduces the iml action, which is available in SAS Viya 3.5. The iml action supports most of the same syntax and functionality as the SAS/IML matrix language, which is implemented in PROC IML. With minimal changes, most programs that run in PROC IML also run in the iml

Read More

Analytics

Rick Wicklin | June 8, 2020

Interactions with spline effects in regression models

A SAS customer asked how to specify interaction effects between a classification variable and a spline effect in a SAS regression procedure. There are at least two ways to do this. If the SAS procedure supports the EFFECT statement, you can build the interaction term in the MODEL statement. For

Read More

Analytics | Learn SAS

Rick Wicklin | June 3, 2020

How to estimate the difference between percentiles

I recently read an article that describes ways to compute confidence intervals for the difference in a percentile between two groups. In Eaton, Moore, and MacKenzie (2019), the authors describe a problem in hydrology. The data are the sizes of pebbles (grains) in rivers at two different sites. The authors

Read More
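The excerpt does not say which interval method the article recommends. As one common, generic approach (not necessarily the method of Eaton, Moore, and MacKenzie), here is a minimal Python sketch of a percentile bootstrap confidence interval for the difference between the same percentile of two independent samples. The function names, the simulated "grain size" data, and the default of 2,000 resamples are illustrative assumptions.

```python
import random

def percentile(data, p):
    """Empirical p-th percentile (0 <= p <= 100), interpolating linearly
    between adjacent order statistics."""
    xs = sorted(data)
    h = (len(xs) - 1) * p / 100.0
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def bootstrap_percentile_diff(x, y, p=50, n_boot=2000, alpha=0.05, seed=1):
    """Point estimate and percentile-bootstrap CI for
    percentile(x, p) - percentile(y, p), treating x and y as independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # resample each group with replacement, keeping the original group sizes
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        diffs.append(percentile(bx, p) - percentile(by, p))
    estimate = percentile(x, p) - percentile(y, p)
    lower = percentile(diffs, 100 * alpha / 2)
    upper = percentile(diffs, 100 * (1 - alpha / 2))
    return estimate, (lower, upper)

# Hypothetical grain sizes at two sites, simulated for illustration only
gen = random.Random(123)
site_a = [gen.lognormvariate(3.0, 0.5) for _ in range(100)]
site_b = [gen.lognormvariate(3.2, 0.5) for _ in range(100)]
est, (lo, hi) = bootstrap_percentile_diff(site_a, site_b, p=84)
print(f"84th-percentile difference: {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The percentile bootstrap makes no distributional assumption about the grain sizes, which is why it is a natural baseline; other interval methods may have better coverage for extreme percentiles or small samples.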

Advanced Analytics | Machine Learning

Rick Wicklin | June 1, 2020

The Kullback–Leibler divergence between continuous probability distributions

In a previous article, I discussed the definition of the Kullback–Leibler (K-L) divergence between two discrete probability distributions. For completeness, this article shows how to compute the Kullback–Leibler divergence between two continuous distributions. When f and g are discrete distributions, the K-L divergence is the sum of f(x)*log(f(x)/g(x)) over all

Read More
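For continuous densities, the sum in the discrete definition becomes an integral, KL(f || g) = ∫ f(x) log(f(x)/g(x)) dx. The following Python sketch (not the article's SAS/IML code) approximates that integral with the composite Simpson rule on a finite interval and checks the result against the known closed form for two normal distributions. The integration limits and the number of panels are assumptions that you would tune for your own densities.

```python
import math
from statistics import NormalDist

def kl_continuous(f, g, a, b, n=2000):
    """Approximate KL(f || g), the integral of f(x) * log(f(x) / g(x)),
    over [a, b] with the composite Simpson rule (n panels, forced even).
    Assumes g(x) > 0 wherever f(x) > 0 on [a, b]."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        fx = f(x)
        # by convention, 0 * log(0 / g) contributes 0
        term = fx * math.log(fx / g(x)) if fx > 0 else 0.0
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * term
    return total * h / 3

def kl_normal(m1, s1, m2, s2):
    """Closed form of KL(N(m1, s1) || N(m2, s2)), used here as a check."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

f = NormalDist(0, 1).pdf
g = NormalDist(1, 2).pdf
approx = kl_continuous(f, g, -10, 10)
exact = kl_normal(0, 1, 1, 2)
print(approx, exact)
```

Truncating the integral to [-10, 10] is harmless here because the standard normal density is negligible outside that interval; for heavy-tailed densities you would need wider limits or an adaptive quadrature routine.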

Advanced Analytics | Machine Learning

Rick Wicklin | May 28, 2020

Minimizing the Kullback–Leibler divergence

The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. An application in machine learning is to measure how distributions in a parametric family differ from a data distribution. This article shows that if you minimize the Kullback–Leibler divergence over a set of parameters, you can find a

Read More
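To make the idea concrete, here is a small Python sketch under assumptions chosen for illustration: the data are counts, the parametric family is Poisson(λ), and a golden-section search minimizes KL(empirical || Poisson(λ)) over λ. The data values and helper names are hypothetical; this is not the article's SAS program.

```python
import math
from collections import Counter

def empirical_pmf(data):
    """Relative frequency of each observed value."""
    n = len(data)
    return {k: c / n for k, c in Counter(data).items()}

def poisson_pmf(k, lam):
    # computed on the log scale for numerical stability
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def kl_to_poisson(pmf, lam):
    """KL(empirical || Poisson(lam)); the sum runs over observed values only,
    because values with zero empirical probability contribute nothing."""
    return sum(p * math.log(p / poisson_pmf(k, lam)) for k, p in pmf.items())

def golden_section_min(func, a, b, tol=1e-8):
    """Minimize a unimodal function on [a, b] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if func(c) < func(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return (a + b) / 2

data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]   # hypothetical counts
pmf = empirical_pmf(data)
lam_hat = golden_section_min(lambda lam: kl_to_poisson(pmf, lam), 0.01, 20)
print(lam_hat)
```

For the Poisson family, setting the derivative of this divergence to zero gives λ equal to the sample mean (2.3 for these data), which provides a convenient check on the numerical minimizer.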

Advanced Analytics | Machine Learning

Rick Wicklin | May 26, 2020

The Kullback–Leibler divergence between discrete probability distributions

If you have been learning about machine learning or mathematical statistics, you might have heard about the Kullback–Leibler divergence. The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. It measures how much one distribution differs from a reference distribution. This article explains the Kullback–Leibler divergence and shows

Read More
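A direct Python translation of the discrete definition, the sum of f(x) log(f(x)/g(x)), is sketched below. It assumes that f and g are lists of probabilities on the same ordered support. The zero-probability conventions in the comments are the standard ones, but the function name is hypothetical.

```python
import math

def kl_discrete(f, g):
    """KL(f || g) = sum of f(x) * log(f(x) / g(x)) for probability vectors
    f and g defined on the same ordered support."""
    if len(f) != len(g):
        raise ValueError("f and g must have the same support")
    total = 0.0
    for fx, gx in zip(f, g):
        if fx == 0:
            continue          # 0 * log(0 / g) is taken to be 0
        if gx == 0:
            return math.inf   # f puts mass where g puts none
        total += fx * math.log(fx / gx)
    return total

f = [0.5, 0.5]
g = [0.25, 0.75]
print(kl_discrete(f, g), kl_discrete(g, f))  # the two directions differ
```

Because the two directions generally give different values, the K-L divergence is not a distance metric; the reference distribution is the second argument.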

Programming Tips

Rick Wicklin | May 20, 2020

Bilinear interpolation in SAS

This article shows how to perform two-dimensional bilinear interpolation in SAS by using a SAS/IML function. It is assumed that you have observed the values of a response variable on a regular grid of locations. A previous article showed how to interpolate inside one rectangular cell. When you have a

Read More
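The following Python sketch (a stand-in for the SAS/IML function the article describes, not a translation of it) locates the grid cell that contains a query point by using a binary search and then applies the bilinear formula inside that cell. It assumes that the grid coordinates are strictly increasing and that z[i][j] holds the response at (xs[i], ys[j]); it also works when the spacing is not uniform. The function name is hypothetical.

```python
from bisect import bisect_right

def bilinear_grid(xs, ys, z, x, y):
    """Bilinear interpolation on a rectilinear grid.
    z[i][j] is the observed value at (xs[i], ys[j]); xs and ys are strictly increasing."""
    if not (xs[0] <= x <= xs[-1] and ys[0] <= y <= ys[-1]):
        raise ValueError("point is outside the grid")
    # lower-left corner of the containing cell; clamp so points on the
    # right or top boundary use the last cell
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    j = min(bisect_right(ys, y) - 1, len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * z[i][j] + tx * (1 - ty) * z[i + 1][j]
            + (1 - tx) * ty * z[i][j + 1] + tx * ty * z[i + 1][j + 1])

xs = [0.0, 1.0, 2.0]
ys = [0.0, 10.0]
z = [[x + 2 * y for y in ys] for x in xs]   # z[i][j] = f(xs[i], ys[j])
print(bilinear_grid(xs, ys, z, 1.5, 4.0))
```

Bilinear interpolation reproduces any function of the form a + b*x + c*y + d*x*y exactly, so a linear test surface like the one above is a quick way to check an implementation.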

Programming Tips

Rick Wicklin | May 18, 2020

What is bilinear interpolation?

I've previously written about linear interpolation in one dimension. Bilinear interpolation is a method for two-dimensional interpolation on a rectangle. If the value of a function is known at the four corners of a rectangle, an interpolation scheme gives you a way to estimate the function at any point in

Read More
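For a single rectangle, bilinear interpolation is two rounds of linear interpolation: first along x on the bottom and top edges, then along y between those two results. Here is a minimal Python sketch; the argument names for the four corner values are hypothetical.

```python
def bilinear_cell(x0, x1, y0, y1, q00, q10, q01, q11, x, y):
    """Interpolate inside the rectangle [x0, x1] x [y0, y1], where
    q00 = f(x0, y0), q10 = f(x1, y0), q01 = f(x0, y1), q11 = f(x1, y1)."""
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    bottom = (1 - tx) * q00 + tx * q10   # linear interpolation along the edge y = y0
    top = (1 - tx) * q01 + tx * q11      # linear interpolation along the edge y = y1
    return (1 - ty) * bottom + ty * top  # linear interpolation in y

# Unit square with corner values 0, 1, 2, 3
print(bilinear_cell(0, 1, 0, 1, 0, 1, 2, 3, 0.5, 0.5))
```

You get the same result if you interpolate along y first and then along x, because both orders expand to the same weighted average of the four corner values.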

Analytics | Data Visualization

Rick Wicklin | May 13, 2020

Find points where a regression curve has zero slope

This article shows how to find local minima and maxima on a regression curve, which means finding points where the slope of the curve is zero. An example appears at the right, which shows locations where the loess smoother in a scatter plot has local minima and maxima. Except for

Read More
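The excerpt does not show the article's SAS method. One simple numerical approach, sketched here in Python, assumes that the fitted curve (for example, a loess smoother) has already been evaluated on a fine grid of x values. The sketch computes finite-difference slopes, looks for sign changes, and uses linear interpolation of the slope to estimate where it crosses zero. The function name and the sine curve are stand-ins.

```python
import math

def zero_slope_points(xs, ys):
    """Approximate x locations where a curve sampled at (xs[k], ys[k]) has zero slope.
    Returns a list of (x, 'max' or 'min') pairs."""
    # finite-difference slope of each segment, assigned to the segment midpoint
    slopes = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
    mids = [(xs[k] + xs[k + 1]) / 2 for k in range(len(xs) - 1)]
    found = []
    for k in range(len(slopes) - 1):
        s0, s1 = slopes[k], slopes[k + 1]
        if s0 > 0 >= s1 or s0 < 0 <= s1:
            # the slope changes sign: interpolate to find where it equals zero
            t = s0 / (s0 - s1)
            x_star = mids[k] + t * (mids[k + 1] - mids[k])
            found.append((x_star, "max" if s0 > 0 else "min"))
    return found

xs = [i * 0.01 for i in range(301)]
ys = [math.sin(2 * x) for x in xs]   # stand-in for predicted values from a smoother
print(zero_slope_points(xs, ys))
```

A positive-to-negative change in slope marks a local maximum and a negative-to-positive change marks a local minimum; the accuracy depends on how finely the curve is evaluated.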

Analytics | Programming Tips

Rick Wicklin | May 11, 2020

Cubic spline interpolation in SAS

I recently showed how to use linear interpolation in SAS. Linear interpolation is a common way to interpolate between a set of planar points, but the interpolating function (the interpolant) is not smooth. If you want a smoother interpolant, you can use cubic spline interpolation. This article describes how to

Read More
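As a sketch of what a cubic spline interpolant involves (not the SAS routine that the article uses), the Python code below builds a natural cubic spline. It solves a tridiagonal system for the second derivatives at the interior knots, sets the second derivatives at the endpoints to zero, and returns a function that evaluates the piecewise cubic. It assumes that the x values are strictly increasing; the function name is hypothetical.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)                # second derivatives; natural ends: M[0] = M[n] = 0
    if n >= 2:
        # tridiagonal system for M[1], ..., M[n-1]
        a = [h[i - 1] for i in range(1, n)]               # sub-diagonal
        b = [2 * (h[i - 1] + h[i]) for i in range(1, n)]  # diagonal
        c = [h[i] for i in range(1, n)]                   # super-diagonal
        d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        m = n - 1
        for k in range(1, m):          # forward elimination (Thomas algorithm)
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * m
        sol[-1] = d[-1] / b[-1]
        for k in range(m - 2, -1, -1): # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        M[1:n] = sol

    def spline(t):
        # index of the interval [xs[i], xs[i+1]] that contains t (clamped at the ends)
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        hk = h[i]
        A = xs[i + 1] - t
        B = t - xs[i]
        return ((M[i] * A**3 + M[i + 1] * B**3) / (6 * hk)
                + (ys[i] / hk - M[i] * hk / 6) * A
                + (ys[i + 1] / hk - M[i + 1] * hk / 6) * B)
    return spline

s = natural_cubic_spline([0, 1, 2, 3], [0, 1, 0, 1])
print([round(s(t), 4) for t in (0.5, 1.5, 2.5)])
```

Unlike linear interpolation, the resulting curve has continuous first and second derivatives at the data points, which is what makes it look smooth.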

Analytics | Data Visualization

Rick Wicklin | May 6, 2020

What does 'flatten the curve' mean? To which curve does it apply?

During this coronavirus pandemic, there are many COVID-related graphs and curves in the news and on social media. The public, politicians, and pundits scrutinize each day's graphs to determine which communities are winning the fight against coronavirus. Interspersed among these many graphs is the oft-repeated mantra, "Flatten the curve!" As

Read More

Analytics | Programming Tips

Rick Wicklin | May 4, 2020

Linear interpolation in SAS

SAS programmers sometimes ask about ways to perform one-dimensional linear interpolation in SAS. This article shows three ways to perform linear interpolation in SAS: PROC IML (in SAS/IML software), PROC EXPAND (in SAS/ETS software), and PROC TRANSREG (in SAS/STAT software). Of these, PROC IML is the simplest to use and

Read More
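Whichever procedure performs it, the underlying computation is simple: find the pair of data points that bracket the query value and take a weighted average of their responses. Here is a minimal Python sketch that assumes sorted, distinct x values and refuses to extrapolate outside their range; the function name is hypothetical.

```python
from bisect import bisect_right

def linear_interp(xs, ys, x):
    """Piecewise linear interpolation through (xs[k], ys[k]); xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range of the data")
    # left endpoint of the bracketing interval; clamp so x == xs[-1] uses the last interval
    k = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    return ys[k] + t * (ys[k + 1] - ys[k])

xs = [0, 1, 3]
ys = [0, 10, 30]
print([linear_interp(xs, ys, v) for v in (0.25, 2, 3)])
```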

Analytics

Rick Wicklin | April 29, 2020

How many imputations are enough?

Recently I read an excellent blog post by Paul von Hippel entitled "How many imputations do you need?". It is based on a paper (von Hippel, 2018), which provides more details. Suppose you are faced with data that has many missing values. One way to address the missing values is

Read More

Analytics | Programming Tips

Rick Wicklin | April 27, 2020

The circular law for eigenvalues

I've previously written about how to generate points that are uniformly distributed in the unit disk. A seemingly unrelated topic is the distribution of eigenvalues (in the complex plane) of various kinds of random matrices. However, I recently learned that these topics are somewhat related! A mathematical result called the

Read More

Analytics | Data Visualization

Rick Wicklin | April 22, 2020

Visualize the case fatality rate for COVID-19 in US counties

A previous article describes the funnel plot (Spiegelhalter, 2005), which can identify samples that have rates or proportions that are much different than expected. The funnel plot is a scatter plot that plots the sample proportion of some quantity against the size of the sample. The variance of the sample

Read More
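The funnel shape comes from the fact that the standard error of a sample proportion, sqrt(p0(1 - p0)/n), shrinks as the sample size n grows. The Python sketch below computes control limits by using a normal approximation. This is a simplification: the article (or Spiegelhalter's paper) may use exact binomial limits instead. The overall rate p0 and the sample sizes are made-up values.

```python
import math
from statistics import NormalDist

def funnel_limits(p0, n, level=0.95):
    """Approximate control limits for a sample proportion based on n trials
    when the true proportion is p0 (normal approximation, clipped to [0, 1])."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - z * se), min(1.0, p0 + z * se)

p0 = 0.03            # hypothetical overall rate
for n in (50, 500, 5000):
    lo, hi = funnel_limits(p0, n)
    print(n, round(lo, 4), round(hi, 4))
```

Plotting these limits against n for a range of sample sizes traces out the two curves of the funnel; a unit whose observed proportion falls outside them differs from p0 by more than sampling variation would typically explain.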

Analytics | Data Visualization

Rick Wicklin | April 20, 2020

Use a funnel plot to visualize rates: The case fatality rate for COVID-19 in North Carolina counties

Death is always a difficult topic to discuss, and death has been in the news a lot during this tragic coronavirus pandemic. Many news stories focus on states, counties, or cities that have the most cases or the most deaths. A related statistic is the case fatality rate, which is

Read More

Data Visualization | Learn SAS

Rick Wicklin | April 15, 2020

Add diagonal reference lines to SAS graphs: The LINEPARM and VECTOR statements

I previously wrote about the advantages of adding horizontal and vertical reference lines to a graph. You can also add a diagonal reference line to a graph. The SGPLOT procedure in SAS supports two primary ways to add a diagonal reference line: The LINEPARM statement enables you to specify a

Read More

Data Visualization | Learn SAS

Rick Wicklin | April 13, 2020

Add horizontal and vertical reference lines to SAS graphs: The REFLINE statement

Data tell a story. A purpose of data visualization is to convey that story to the reader in a clear and impactful way. Sometimes you can let the data "speak for themselves" in an unadorned graphic, but sometimes it is helpful to add reference lines to a graph to emphasize

Read More

Data Visualization

Rick Wicklin | April 8, 2020

On reducing the spread of coronavirus

Every day we face risks. If we drive to work, we risk a fatal auto accident. If we eat red meat and fatty foods, we risk a heart attack. If we go out in public during a pandemic, we risk contracting a disease. A logical response to risk is to

Read More

Data Visualization | Learn SAS

Rick Wicklin | April 6, 2020

The geometric distribution in SAS

I have written several articles about how to work with continuous probability distributions in SAS. I always emphasize that it is important to be able to compute the four essential functions for working with a statistical distribution. Namely, you need to know how to generate random values, how to compute

Read More
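Here is a Python sketch of the four essential functions for the geometric distribution. It assumes one particular parameterization, X = number of trials until the first success (support 1, 2, 3, ...). Another common convention counts the failures before the first success, so check which convention a given SAS function uses before you compare results. The random-variate generator uses the inverse CDF method, and all function names are hypothetical.

```python
import math
import random

def geom_pmf(k, p):
    """P(X = k) for k = 1, 2, ...: k - 1 failures followed by a success."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def geom_cdf(k, p):
    """P(X <= k) = 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k if k >= 1 else 0.0

def geom_quantile(u, p):
    """Smallest k >= 1 with CDF(k) >= u, for 0 <= u < 1 and 0 < p < 1."""
    k = max(1, math.ceil(math.log1p(-u) / math.log1p(-p)))
    # correct possible floating-point error in the closed form
    while k > 1 and geom_cdf(k - 1, p) >= u:
        k -= 1
    while geom_cdf(k, p) < u:
        k += 1
    return k

def geom_rand(p, rng=random):
    """Random variate by the inverse CDF method."""
    return geom_quantile(rng.random(), p)

rng = random.Random(2020)
draws = [geom_rand(0.25, rng) for _ in range(10000)]
print(sum(draws) / len(draws))   # should be near 1/p = 4
```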

Analytics | Data Visualization

Rick Wicklin | April 1, 2020

Estimates of doubling time for exponential growth

During an epidemic, such as the coronavirus pandemic of 2020, the media often shows graphs of the cumulative numbers of confirmed cases for different countries. Often these graphs use a logarithmic scale for the vertical axis. In these graphs, a straight line indicates that new cases are increasing at an

Read More
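If counts grow exponentially, N(t) = N0 exp(r t), then log N(t) is linear in t with slope r, and the doubling time is ln(2)/r. The following Python sketch estimates r by an ordinary least-squares fit of the log counts on the day number. It is one simple estimator, not necessarily the one that the article uses, and it assumes that all counts are positive. The function name is hypothetical.

```python
import math

def doubling_time(days, counts):
    """Fit log(count) = a + r * day by ordinary least squares; return ln(2) / r."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    sxx = sum((x - mean_x) ** 2 for x in days)
    r = sxy / sxx              # estimated daily growth rate on the log scale
    return math.log(2) / r

# Synthetic counts that double every 3 days
days = list(range(11))
counts = [100 * 2 ** (d / 3) for d in days]
print(doubling_time(days, counts))
```

Fitting to a moving window of recent days, rather than the whole series, lets you see whether the doubling time is getting longer, which on a log-scale graph appears as the curve bending away from a straight line.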

Data Visualization | Learn SAS

Rick Wicklin | March 30, 2020

Smokestack plots: A visualization technique for comparing cumulative curves

A cumulative curve shows the total amount of some quantity at multiple points in time. Examples include: Total sales of songs, movies, or books, beginning when the item is released. Total views of blog posts, beginning when the post is published. Total cases of a disease for different countries, beginning

Read More

Data Visualization

Rick Wicklin | March 25, 2020

How to read a cumulative frequency graph

During an outbreak of a disease, such as the coronavirus (COVID-19) pandemic, the media shows daily graphs that convey the spread of the disease. The following two graphs appear frequently: New cases for each day (or week). This information is usually shown as a histogram or needle plot. The graph

Read More
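The two kinds of graphs encode the same information. The cumulative curve is the running sum of the daily new cases, and the daily counts are the successive differences of the cumulative curve. Here is a short Python sketch of both conversions; the function names and counts are illustrative.

```python
from itertools import accumulate

def cumulative(new_cases):
    """Running total of the daily new cases."""
    return list(accumulate(new_cases))

def new_from_cumulative(cum):
    """Daily new cases recovered as differences of the cumulative counts."""
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]

daily = [1, 3, 0, 5, 8]
total = cumulative(daily)
print(total, new_from_cumulative(total))
```

Because the daily count is the slope of the cumulative curve, a cumulative curve that becomes less steep means that fewer new cases are being added each day, even though the total is still increasing.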

Data Visualization | Learn SAS

Rick Wicklin | March 23, 2020

Add custom tick marks to a SAS graph

When you create a graph by using the SGPLOT procedure in SAS, usually the default tick locations are acceptable. Sometimes, however, you might want to specify a set of custom tick values for one or both axes. This article shows three examples: Specify evenly spaced values. Specify tick values that

Read More

Learn SAS | Programming Tips

Rick Wicklin | March 18, 2020

Print SAS/IML variables with formats

A SAS/IML programmer asked about the best way to print multiple SAS/IML variables when each variable needs a different format. He wanted the output to resemble the "Parameter Estimates" table that is produced by PROC REG and other SAS/STAT procedures. This article shows four ways to print SAS/IML vectors in

Read More

Analytics | Programming Tips

Rick Wicklin | March 16, 2020

Predict a random integer: The tradeoff between bias and variance

Books about statistics and machine learning often discuss the tradeoff between bias and variance for an estimator. These discussions are often motivated by a sophisticated predictive model such as a regression or a decision tree. But the basic idea can be seen in much simpler situations. This article presents a

Read More

Analytics

[Image: Regular polygons approximate a circle]

Rick Wicklin | March 11, 2020

Polygons, pi, and linear approximations

Recently, I saw a graphic on Twitter by @neilrkaye that showed the rapid convergence of a regular polygon to a circle as you increase the number of sides for the polygon. The author remarked that polygons that have 40 or more sides "all look like circles to me." That is,

Read More
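A regular n-gon inscribed in a circle of radius 1 has n sides of length 2 sin(π/n), so its perimeter divided by the diameter is n sin(π/n). Because sin(x) ≈ x for small x, this ratio approaches π quickly, with a relative error of roughly (π/n)²/6. The Python sketch below computes that ratio and also shows Archimedes' side-doubling recurrence, which approximates π without calling math.pi. The function names are hypothetical.

```python
import math

def polygon_pi(n):
    """Perimeter / diameter of a regular n-gon inscribed in a unit circle: n * sin(pi / n)."""
    return n * math.sin(math.pi / n)

def archimedes_pi(doublings):
    """Start from an inscribed hexagon (side length 1) and repeatedly double the
    number of sides, updating the side length without using math.pi."""
    n, s = 6, 1.0
    for _ in range(doublings):
        # s_2n = sqrt(2 - sqrt(4 - s^2)), rewritten to avoid cancellation
        s = s / math.sqrt(2 + math.sqrt(4 - s * s))
        n *= 2
    return n * s / 2

for n in (6, 12, 40, 100):
    rel_err = 1 - polygon_pi(n) / math.pi
    print(n, polygon_pi(n), f"{rel_err:.3%}")
print(archimedes_pi(10))
```

For 40 sides, the perimeter differs from the circumference by only about 0.1%, which is consistent with the observation that such polygons are hard to distinguish from a circle.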

Advanced Analytics | Data Visualization | Programming Tips

Rick Wicklin | March 9, 2020

ROC curves for a binormal sample

In a previous article, I discussed the binormal model for a binary classification problem. This model assumes a set of scores that are normally distributed for each population, and the mean of the scores for the Negative population is less than the mean of scores for the Positive population. I

Read More
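Under the binormal model, suppose the Negative scores follow N(mu_neg, sd_neg) and the Positive scores follow N(mu_pos, sd_pos). Sweeping a threshold over the scores gives a closed-form ROC curve, TPR = Φ(a + b Φ⁻¹(FPR)), where a = (mu_pos - mu_neg)/sd_pos and b = sd_neg/sd_pos. The area under the curve is Φ((mu_pos - mu_neg)/sqrt(sd_neg² + sd_pos²)). The Python sketch below evaluates both by using statistics.NormalDist. It illustrates the derivation and is not the article's SAS program; the parameter values are made up.

```python
import math
from statistics import NormalDist

STD = NormalDist()   # standard normal: cdf is Phi, inv_cdf is the inverse of Phi

def binormal_roc(mu_neg, sd_neg, mu_pos, sd_pos, fprs):
    """True positive rates at the given false positive rates (each strictly between 0 and 1)."""
    a = (mu_pos - mu_neg) / sd_pos
    b = sd_neg / sd_pos
    return [STD.cdf(a + b * STD.inv_cdf(fpr)) for fpr in fprs]

def binormal_auc(mu_neg, sd_neg, mu_pos, sd_pos):
    """Area under the binormal ROC curve."""
    return STD.cdf((mu_pos - mu_neg) / math.sqrt(sd_neg**2 + sd_pos**2))

fprs = [i / 100 for i in range(1, 100)]
tprs = binormal_roc(0, 1, 1.5, 1.2, fprs)
print(binormal_auc(0, 1, 1.5, 1.2))
```

When the two populations have the same distribution, a = 0 and b = 1, so the ROC curve reduces to the diagonal TPR = FPR and the AUC is 0.5.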

Learn SAS | Programming Tips

Rick Wicklin | March 4, 2020

Store pre-computed matrices in a list

Suppose that a data set contains a set of parameter values. For each row of parameters, you need to perform some computation. A recent discussion on the SAS Support Communities mentions an important point: if there are duplicate rows in the data, a program might repeat the same computation several

Read More
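The article stores results in a SAS/IML list. The same idea in Python is memoization: key a dictionary on each row's parameter values so that duplicate rows reuse the stored result instead of repeating the computation. Here is a minimal sketch; the function name and the stand-in computation are hypothetical.

```python
def compute_all(rows, func):
    """Apply func to each row of parameters, computing each distinct row only once."""
    cache = {}
    results = []
    for row in rows:
        key = tuple(row)                 # parameter values identify the computation
        if key not in cache:
            cache[key] = func(*row)      # expensive work happens only for new keys
        results.append(cache[key])
    return results

def expensive(a, b):
    """Stand-in for a costly computation."""
    return a * b

print(compute_all([(1, 2), (3, 4), (1, 2)], expensive))
```

The savings depend on how many rows are duplicated; if every row is unique, the cache only adds a small lookup cost per row.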

Data Visualization | Learn SAS

Rick Wicklin | March 2, 2020

Create a deviation plot to visualize values relative to a baseline

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65°F (18°C). Days warmer

Read More
