There are many statistics that measure whether two continuous random variables are independent or whether they are related to each other in some way. The most well-known statistic is Pearson's correlation, which is a parametric measure of the linear relationship between two variables. A related measure is Spearman's rank correlation,

# Author

SAS/IML programmers often create and call user-defined modules. Recall that a module is a user-defined subroutine or function. A function returns a value; a subroutine can change one or more of its input arguments. I have written a complete guide to understanding SAS/IML modules, which contains many tips for working

Ranking is a fundamental concept in statistics. Ranks of univariate data are used by statisticians to estimate statistics such as percentiles (quantiles) and empirical distributions. A more advanced use is to compute various rank-based measures of correlation or association between pairs of variables. For example, ranks are used to compute

The ranks of a set of data values are used in many nonparametric statistics and statistical tests. When you request a statistic or nonparametric test in SAS, the procedure will automatically compute the ranks that are needed. However, sometimes it is useful to know how to compute the ranks yourself.

It can be frustrating to receive an error message from statistical software. In the early days of the SAS statistical graphics (SG) procedures, an error message that I dreaded was ERROR: Attempting to overlay incompatible plot or chart types. This error message appears when you attempt to use PROC SGPLOT

Most introductory statistics courses introduce the bar chart as a way to visualize the frequency (counts) for a categorical variable. A vertical bar chart places the categories along the horizontal (X) axis and shows the counts (or percentages) on the vertical (Y) axis. The vertical bar chart is a precursor

As mentioned in my article about Monte Carlo estimate of (one-dimensional) integrals, one of the advantages of Monte Carlo integration is that you can perform multivariate integrals on complicated regions. This article demonstrates how to use SAS to obtain a Monte Carlo estimate of a double integral over rectangular and

A previous article shows how to use Monte Carlo simulation to estimate a one-dimensional integral on a finite interval. A larger random sample will (on average) result in an estimate that is closer to the true value of the integral than a smaller sample. This article shows how you can

Numerical integration is important in many areas of applied mathematics and statistics. For one-dimensional integrals on the interval (a, b), SAS software provides two important tools for numerical integration: For common univariate probability distributions, you can use the CDF function to integrate the density, thus obtaining the probability that a

A previous article discusses how to interpret regression diagnostic plots that are produced by SAS regression procedures such as PROC REG. In that article, two of the plots indicate influential observations and outliers. Intuitively, an observation is influential if its presence changes the parameter estimates for the regression by "more

When you fit a regression model, it is useful to check diagnostic plots to assess the quality of the fit. SAS, like most statistical software, makes it easy to generate regression diagnostics plots. Most SAS regression procedures support the PLOTS= option, which you can use to generate a panel of

This article shows how to use PROC SGPLOT in SAS to create the scatter plot shown to the right. The scatter plot has the following features: The colors of markers are determined by the value of a third variable. The outline of each marker is the same color (such as

Here is an interesting math question: How many reduced fractions in the interval (0, 1) have a denominator less than 100? The question is difficult is because of the word "reduced." If we only care about the total number of fractions in (0,1) whose denominator is less than 100, we

A SAS customer wanted to compute the cumulative distribution function (CDF) of the generalized gamma distribution. For any continuous distribution, the CDF is the integral of the probability density function (PDF), which usually has an explicit formula. Accordingly, he wanted to compute the CDF by using the QUAD function in

This is my Pi Day post for 2021. Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. Most years I write about lower-case pi (π), which is the ratio of a circle's

I recently learned about a new feature in PROC QUANTREG that was added in SAS/STAT 15.1 (part of SAS 9.4M6). Recall that PROC QUANTREG enables you to perform quantile regression in SAS. (If you are not familiar with quantile regression, see an earlier article that describes quantile regression and provides

I have previously shown how you can use the FRACTw. format in SAS to display numbers as fractions. But did you know that you can also use the format to obtain the numerator and denominator of the fraction as numbers in a program? All you need to do is to

I recently wrote about a simple statistical formula that approximates the wind chill temperature, which is the cumulative effect of air temperature and wind on the human body. The formula uses two independent variables (air temperature and wind speed) to predict the wind chill temperature. This article describes how to

In cold and blustery conditions, the weather forecast often includes two temperatures: the actual air temperature and the wind chill temperature. The wind chill temperature conveys the cumulative effect of air temperature and wind on the human body. The goal of the wind-chill scale is to communicate the effect of

A previous article describes how to use the SGPANEL procedure to visualize subgroups of data. It focuses on using headers to display information about each graph. In the example, the data are time series for the price of several stocks, and the headers include information about whether the stock price

Many characteristics of a graph are determined by the underlying data at run time. A familiar example is when you use colors to indicate different groups in the data. If the data have three groups, you see three colors. If the data have four groups, you see four colors. The

I've previously written about how to generate all pairwise interactions for a regression model in SAS. For a model that contains continuous effects, the easiest way is to use the EFFECT statement in PROC GLMSELECT to generate second-degree "polynomial effects." However, a SAS programmer was running a simulation study and

One of the first things I learned in SAS was how to use PROC PRINT to display parts of a data set. I usually do not need to see all the data, so my favorite way to use PROC PRINT is to use the OBS= data set option to display

Look at the following matrices. Do you notice anything that these matrices have in common? If you noticed that the rows of each matrix are arithmetic progressions, good for you. For each row, there is a constant difference (also called the "increment") between adjacent elements. For these examples: In the

In a previous article, I showed how to generate random points uniformly inside a d-dimensional sphere. In that article, I stated the following fact: If Y is drawn from the uncorrelated multivariate normal distribution, then S = Y / ||Y|| has the uniform distribution on the unit sphere. I was

Imagine an animal that is searching for food in a vast environment where food is scarce. If no prey is nearby, the animal's senses (such as smell and sight) are useless. In that case, a reasonable search strategy is a random walk. The animal can choose a random direction, walk/swim/fly

The inverse gamma distribution is a continuous probability distribution that is used in Bayesian analysis and in some statistical models. The inverse gamma distribution is closely related to the gamma distribution. For any probability distribution, it is essential to know how to compute four functions: the PDF function, which returns

Years ago, I wrote about how to compute the incomplete beta function in SAS. Recently, a SAS programmer asked about a similar function, called the incomplete gamma function. The incomplete gamma function is a "special function" that arises in applied math, physics, and statistics. You should not confuse the gamma

This is the third and last introductory article about how to bootstrap time series in SAS. In the first article, I presented the simple block bootstrap and discussed why bootstrapping a time series is more complicated than for regression models that assume independent errors. Briefly, when you perform residual resampling

Have you ever heard of the DOLIST syntax? You might know the syntax even if you are not familiar with the name. The DOLIST syntax is a way to specify a list of numerical values to an option in a SAS procedure. Applications include: Specify the end points for bins