I previously wrote about one way to solve the partition problem in SAS. In the partition problem, you divide (or partition) a set of N items into two groups of size k and N-k such that the sum of the items' weights is the same in each group. For example,

# Author

The partition problem has many variations, but recently I encountered it as an interactive puzzle on a computer. (Try a similar game yourself!) The player is presented with an old-fashioned pan-balance scale and a set of objects of different weights. The challenge is to divide (or partition) the objects into

A statistical programmer asked how to simulate event-trials data for groups. The subjects in each group have a different probability of experiencing the event. This article describes one way to simulate this scenario. The simulation is similar to simulating from a mixture distribution. This article also shows three different ways

A colleague spent a lot of time creating a panel of graphs to summarize some data. She did not use SAS software to create the graph, but I used SAS to create a simplified version of her graph, which is shown to the right. (The colors are from her graph.)

A colleague spent a lot of time creating a panel of graphs to summarize some data. She did not use SAS software to create the graph, but I used SAS to create a simplified version of her graph, which is shown to the right. (The colors are from her graph.)

The number of possible bootstrap samples for a sample of size N is big. Really big. Recall that the bootstrap method is a powerful way to analyze the variation in a statistic. To implement the standard bootstrap method, you generate B random bootstrap samples. A bootstrap sample is a sample

You can use the bootstrap method to estimate confidence intervals. Unlike formulas, which assume that the data are drawn from a specified distribution (usually the normal distribution), the bootstrap method does not assume a distribution for the data. There are many articles about how to use SAS to bootstrap statistics

For graphing multivariate data, it is important to be able to convert the data between "wide form" (a separate column for each variable) and "long form" (which contains an indicator variable that assigns a group to each observation). If the data are numeric, the wide data can be represented as

This article shows how to create a "sliced survival plot" for proportional-hazards models that are created by using PROC PHREG in SAS. Graphing the result of a statistical regression model is a valuable way to communicate the predictions of the model. Many SAS procedures use ODS graphics to produce graphs

A previous article discusses the geometry of weighted averages and shows how choosing different weights can lead to different rankings of the subjects. As an example, I showed how college programs might rank applicants by using a weighted average of factors such as test scores. "The best" applicant is determined

People love rankings. You've probably seen articles about the best places to live, the best colleges to attend, the best pizza to order, and so on. Each of these is an example of a ranking that is based on multiple characteristics. For example, a list of the best places to

One of the benefits of using the SWEEP operator is that it enables you to "sweep in" columns (add effects to a model) in any order. This article shows that if you use the SWEEP operator, you can compute a SSCP matrix and use it repeatedly to estimate any linear

Do you ever use a permutation matrix to change the order of rows or columns in a matrix? Did you know that there is a more efficient way in matrix-oriented languages such as SAS/IML, MATLAB, and R? Remember the following tip: Never multiply with a large permutation matrix! Instead, use

In a previous article, I discussed a beautiful painting called "Phantomâ€™s Shadow, 2018" by the Nigerian-born artist, Odili Donald Odita. I noted that if you overlay a 4 x 4 grid on the painting, then each cell contains a four-bladed pinwheel shape. The cells display rotations and reflections of the pinwheel. The

Art evokes an emotional response in the viewer, but sometimes art also evokes a cerebral response. When I see patterns and symmetries in art, I think about a related mathematical object or process. Recently, a Twitter user tweeted about a painting called "Phantomâ€™s Shadow, 2018" by the Nigerian-born artist, Odili

A SAS programmer recently asked why his SAS program and his colleague's R program display different estimates for the quantiles of a very small data set (less than 10 observations). I pointed the programmer to my article that compares the nine common definitions for sample quantiles. The article has a

To get better at something, you need to practice. That maxim applies to sports, music, and programming. If you want to be a better programmer, you need to write many programs. This article provides an example of forming the intersection of items in a SAS/IML list. It then provides several

After my recent articles on simulating data by using copulas, many readers commented about the power of copulas. Yes, they are powerful, and the geometry of copulas is beautiful. However, it is important to be aware of the limitations of copulas. This article creates a bizarre example of bivariate data,

In a previous article, I discussed various ways to solve a least-square linear regression model. I discussed the SWEEP operator (used by many SAS regression routines), the LU-based methods (SOLVE and INV in SAS/IML), and the QR decomposition (CALL QR in SAS/IML). Each method computes the estimates for the regression

In computational statistics, there are often several ways to solve the same problem. For example, there are many ways to solve for the least-squares solution of a linear regression model. A SAS programmer recently mentioned that some open-source software uses the QR algorithm to solve least-squares regression problems and asked

In general, it is hard to simulate multivariate data that has a specified correlation structure. Copulas make that task easier for continuous distributions. A previous article presented the geometry behind a copula and explained copulas in an intuitive way. Although I strongly believe that statistical practitioners should be familiar with

Do you know what a copula is? It is a popular way to simulate multivariate correlated data. The literature for copulas is mathematically formidable, but this article provides an intuitive introduction to copulas by describing the geometry of the transformations that are involved in the simulation process. Although there are

A recent article about how to estimate a two-dimensional distribution function in SAS inspired me to think about a related computation: a 2-D cumulative sum. Suppose you have numbers in a matrix, X. A 2-D cumulative sum is a second matrix, C, such that the C[p,q] gives the sum of

This article shows how to estimate and visualize a two-dimensional cumulative distribution function (CDF) in SAS. SAS has built-in support for this computation. Although the bivariate CDF is not used as much as the univariate CDF, the bivariate version is still a useful tool in understanding the probable values of

This article uses simulation to demonstrate the fact that any continuous distribution can be transformed into the uniform distribution on (0,1). The function that performs this transformation is a familiar one: it is the cumulative distribution function (CDF). A continuous CDF is defined as an integral, so the transformation is

A SAS programmer noticed that his SAS output was not displaying multiple blanks in his strings. He had some strings with leading blanks, others with trailing blanks, and others with multiple blanks in the middle. Yet, every time he used SAS to print the strings to the HTML destination, something

A previous article showed how to simulate multivariate correlated data by using the Iman-Conover transformation (Iman and Conover, 1982). The transformation preserves the marginal distributions of the original data but permutes the values (columnwise) to induce a new correlation among the variables. When I first read about the Iman-Conover transformation,

Simulating univariate data is relatively easy. Simulating multivariate data is much harder. The main difficulty is to generate variables that have given univariate distributions but also are correlated with each other according to a specified correlation matrix. However, Iman and Conover (1982, "A distribution-free approach to inducing rank correlation among

Many nonparametric statistical methods use the ranks of observations to compute distribution-free statistics. In SAS, two procedures that use ranks are PROC NPAR1WAY and PROC CORR. Whereas the SPEARMAN option in PROC CORR (which computes rank correlation) uses only the "raw" tied ranks, PROC NPAR1WAY uses transformations of the ranks,

For many univariate statistics (mean, median, standard deviation, etc.), the order of the data is unimportant. If you sort univariate data, the mean and standard deviation do not change. However, you cannot sort an individual variable (independently) if you want to preserve its relationship with other variables. This statement is