This article shows how to simulate beta-binomial data in SAS and how to compute the density function (PDF). The beta-binomial distribution is a discrete compound distribution. The "binomial" part of the name means that the discrete random variable X follows a binomial distribution with parameters N (number of trials) and

# Author

Did you know that a SAS/IML function can recover from a run-time error? You can specify how to handle run-time errors by using a programming technique that is similar to the modern "try-catch" technique, although the SAS/IML technique is an older implementation. Preventing errors versus handling errors In general, SAS/IML

Debugging is the bane of every programmer. SAS supports a DATA step debugger, but that debugger can't be used for debugging SAS/IML programs. In lieu of a formal debugger, many SAS/IML programmers resort to inserting multiple PRINT statements into a function definition. However, there is an easier way to query

A SAS programmer wanted to display a table in which the rows have different formats. An example is shown below. The programmer wanted columns that represent statistics and rows that represent variables. She wanted to display formats (such as DOLLAR) for some variables—but only for certain statistics. For example, the

Have you ever seen the "Fit Summary" table from PROC LOESS, as shown to the right? Or maybe you've seen the "Model Information" table that is displayed by some SAS analytical procedures? These tables provide brief interesting facts about a statistical procedure, hence they are called factoids. In SAS, a

A SAS/IML programmer asked whether you can pass the name of a function as an argument to a SAS/IML module and have the module call the function that is passed in. The answer is "yes." The basic idea is to create a string that represents the function call and then

This article demonstrates a SAS programming technique that I call Kuhfeld's template modification technique. The technique enables you to dynamically modify an ODS template and immediately call the modified template to produce a new graph or table. By following the five steps in this article, you can implement the technique

This article describes the advantages and disadvantages of principal component regression (PCR). This article also presents alternative techniques to PCR. In a previous article, I showed how to compute a principal component regression in SAS. Recall that principal component regression is a technique for handling near collinearities among the regression

A common question on discussion forums is how to compute a principal component regression in SAS. One reason people give for wanting to run a principal component regression is that the explanatory variables in the model are highly correlated which each other, a condition known as multicollinearity. Although principal component

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

Last week Warren Kuhfeld wrote about a graph called the "lines plot" that is produced by SAS/STAT procedures in SAS 9.4M5. (Notice that the "lines plot" has an 's'; it is not a line plot!) The lines plot is produced as part of an analysis that performs multiple comparisons of

The article "Fisher's transformation of the correlation coefficient" featured a Monte Carlo simulation that generated sample correlations from bivariate normal data. The simulation used three steps: Simulate B samples of size N from a bivariate normal distribution with correlation ρ. Use PROC CORR to compute the sample correlation matrix for

Correlations between variables are typically displayed in a matrix. Because the correlation matrix is determined by the order of the variables, it is difficult to find the largest and smallest correlations, which is why analysts sometimes use colors to visualize the correlation matrix. Another visualization option is the pairwise correlation

If you perform a weighted statistical analysis, it can be useful to produce a statistical graph that also incorporates the weights. This article shows how to construct and interpret a weighted histogram in SAS. How to construct a weighted histogram Before constructing a weighted histogram, let's review the construction of

How can you specify weights for a statistical analysis? Hmmm, that's a "weighty" question! Many people on discussion forums ask "What is a weight variable?" and "How do you choose a weight for each observation?" This article gives a brief overview of weight variables in statistics and includes examples of

In a large simulation study, it can be convenient to have a "control file" that contains the parameters for the study. My recent article about how to simulate multivariate normal clusters demonstrates a simple example of this technique. The simulation in that article uses an input data set that contains

My article about Fisher's transformation of the Pearson correlation contained a simulation. The simulation uses the RANDNORMAL function in SAS/IML software to simulate multivariate normal data. If you are a SAS programmer who does not have access to SAS/IML software, you can use the SIMNORMAL procedure in SAS/STAT software to

Pearson's correlation measures the linear association between two variables. Because the correlation is bounded between [-1, 1], the sampling distribution for highly correlated variables is highly skewed. Even for bivariate normal data, the skewness makes it challenging to estimate confidence intervals for the correlation, to run one-sample hypothesis tests ("Is

Toe bone connected to the foot bone, Foot bone connected to the leg bone, Leg bone connected to the knee bone,... — American Spiritual, "Dem Bones" Last week I read an interesting article on Robert Kosara's data visualization blog. Kosara connected the geographic centers of the US zip codes in

This article shows how to simulate data from a mixture of multivariate normal distributions, which is also called a Gaussian mixture. You can use this simulation to generate clustered data. The adjacent graph shows three clusters, each simulated from a four-dimensional normal distribution. Each cluster has its own within-cluster covariance,

Did you know that you can get SAS to compute symbolic (analytical) derivatives of simple functions, including applying the product rule, quotient rule, and chain rule? SAS can form the symbolic derivatives of single-variable functions and partial derivatives of multivariable functions. Furthermore, the derivatives are output in a form that

If you use SAS regression procedures, you are probably familiar with the "stars and bars" notation, which enables you to construct interaction effects in regression models. Although you can construct many regression models by using that classical notation, a friend recently reminded me that the EFFECT statement in SAS provides

Correlation is a fundamental statistical concept that measures the linear association between two variables. There are multiple ways to think about correlation: geometrically, algebraically, with matrices, with vectors, with regression, and more. To paraphrase the great songwriter Paul Simon, there must be 50 ways to view your correlation! But don't

A previous article discussed the mathematical properties of the singular value decomposition (SVD) and showed how to use the SVD subroutine in SAS/IML software. This article uses the SVD to construct a low-rank approximation to an image. Applications include image compression and denoising an image. Construct a grayscale image The

The singular value decomposition (SVD) could be called the "billion-dollar algorithm" since it provides the mathematical basis for many modern algorithms in data science, including text mining, recommender systems (think Netflix and Amazon), image processing, and classification problems. Although the SVD was mathematically discovered in the late 1800s, computers have

All statisticians are familiar with the classical arithmetic mean. Some statisticians are also familiar with the geometric mean. Whereas the arithmetic mean of n numbers is the sum divided by n, the geometric mean of n nonnegative numbers is the n_th root of the product of the numbers. The geometric

When you implement a statistical algorithm in a vector-matrix language such as SAS/IML, R, or MATLAB, you should measure the performance of your implementation, which means that you should time how long a program takes to analyze data of varying sizes and characteristics. There are some general tips that can

Visualizing the correlations between variables often provides insight into the relationships between variables. I've previously written about how to use a heat map to visualize a correlation matrix in SAS/IML, and Chris Hemedinger showed how to use Base SAS to visualize correlations between variables. Recently a SAS programmer asked how

When someone refers to the correlation between two variables, they are probably referring to the Pearson correlation, which is the standard statistic that is taught in elementary statistics courses. Elementary courses do not usually mention that there are other measures of correlation. Why would anyone want a different estimate of