## Tag: Statistical Programming

0
A principal component analysis of color palettes

In a previous article, I visualized seven Christmas-themed palettes of colors, as shown to the right. You can see that the palettes include many red, green, and golden colors. Clearly, the colors in the Christmas palettes are not a random sample from the space of RGB colors. Rather, they represent

0
The expected number of points on a convex hull

While discussing how to compute convex hulls in SAS with a colleague, we wondered how the size of the convex hull compares to the size of the sample. For most distributions of points, I claimed that the size of the convex hull is much less than the size of the

Learn SAS
0
What is a CAS-enabled procedure?

I attended a seminar last week whose purpose was to inform SAS 9 programmers about SAS Viya. I could tell from the programmer's questions that some programmers were confused about three basic topics: What are the computing environments in Viya, and how should a programmer think about them? What procedures

0
The order of vertices on a convex polygon

In a previous article, I showed how to use theCVEXHULL function in SAS/IML to compute the convex hull of a finite set of planar points. The convex hull is a convex polygon, which is defined by its vertices. To visualize the polygon, you need to know the vertices in sequential

0
Two-dimensional convex hulls in SAS

Given a cloud of points in the plane, it can be useful to identify the convex hull of the points. The convex hull is the smallest convex set that contains the observations. For a finite set of points, it is a convex polygon that has some of the points as

0
Fit a mixture of Weibull distributions in SAS

A previous article discusses how to use SAS regression procedures to fit a two-parameter Weibull distribution in SAS. The article shows how to convert the regression output into the more familiar scale and shape parameters for the Weibull probability distribution, which are fit by using PROC UNIVARIATE. Although PROC UNIVARIATE

0
Crossover and mutation: An introduction to two operations in genetic algorithms

This article uses an example to introduce to genetic algorithms (GAs) for optimization. It discusses two operators (mutation and crossover) that are important in implementing a genetic algorithm. It discusses choices that you must make when you implement these operations. Some programmers love using genetic algorithms. Genetic algorithms are heuristic

0
The partition problem

The partition problem has many variations, but recently I encountered it as an interactive puzzle on a computer. (Try a similar game yourself!) The player is presented with an old-fashioned pan-balance scale and a set of objects of different weights. The challenge is to divide (or partition) the objects into

0
Bootstrap correlation coefficients in SAS

You can use the bootstrap method to estimate confidence intervals. Unlike formulas, which assume that the data are drawn from a specified distribution (usually the normal distribution), the bootstrap method does not assume a distribution for the data. There are many articles about how to use SAS to bootstrap statistics

0
Convert a symmetric matrix from wide to long form

For graphing multivariate data, it is important to be able to convert the data between "wide form" (a separate column for each variable) and "long form" (which contains an indicator variable that assigns a group to each observation). If the data are numeric, the wide data can be represented as

0
More on the SWEEP operator for least-square regression models

One of the benefits of using the SWEEP operator is that it enables you to "sweep in" columns (add effects to a model) in any order. This article shows that if you use the SWEEP operator, you can compute a SSCP matrix and use it repeatedly to estimate any linear

0
Never multiply with a large permutation matrix

Do you ever use a permutation matrix to change the order of rows or columns in a matrix? Did you know that there is a more efficient way in matrix-oriented languages such as SAS/IML, MATLAB, and R? Remember the following tip: Never multiply with a large permutation matrix! Instead, use

Programming Tips
0
Compare the default definitions for sample quantiles in SAS, R, and Python

A SAS programmer recently asked why his SAS program and his colleague's R program display different estimates for the quantiles of a very small data set (less than 10 observations). I pointed the programmer to my article that compares the nine common definitions for sample quantiles. The article has a

Learn SAS
0
Operations on lists in SAS/IML

To get better at something, you need to practice. That maxim applies to sports, music, and programming. If you want to be a better programmer, you need to write many programs. This article provides an example of forming the intersection of items in a SAS/IML list. It then provides several

0
Copulas and multivariate distributions with normal marginals

After my recent articles on simulating data by using copulas, many readers commented about the power of copulas. Yes, they are powerful, and the geometry of copulas is beautiful. However, it is important to be aware of the limitations of copulas. This article creates a bizarre example of bivariate data,

0
An introduction to simulating correlated data by using copulas

Do you know what a copula is? It is a popular way to simulate multivariate correlated data. The literature for copulas is mathematically formidable, but this article provides an intuitive introduction to copulas by describing the geometry of the transformations that are involved in the simulation process. Although there are

0
Compute 2-D cumulative sums and ogives

A recent article about how to estimate a two-dimensional distribution function in SAS inspired me to think about a related computation: a 2-D cumulative sum. Suppose you have numbers in a matrix, X. A 2-D cumulative sum is a second matrix, C, such that the C[p,q] gives the sum of

0
Estimate a bivariate CDF in SAS

This article shows how to estimate and visualize a two-dimensional cumulative distribution function (CDF) in SAS. SAS has built-in support for this computation. Although the bivariate CDF is not used as much as the univariate CDF, the bivariate version is still a useful tool in understanding the probable values of

Programming Tips
0
The probability integral transform

This article uses simulation to demonstrate the fact that any continuous distribution can be transformed into the uniform distribution on (0,1). The function that performs this transformation is a familiar one: it is the cumulative distribution function (CDF). A continuous CDF is defined as an integral, so the transformation is

0
The geometry of the Iman-Conover transformation

A previous article showed how to simulate multivariate correlated data by using the Iman-Conover transformation (Iman and Conover, 1982). The transformation preserves the marginal distributions of the original data but permutes the values (columnwise) to induce a new correlation among the variables. When I first read about the Iman-Conover transformation,

0
Simulate correlated variables by using the Iman-Conover transformation

Simulating univariate data is relatively easy. Simulating multivariate data is much harder. The main difficulty is to generate variables that have given univariate distributions but also are correlated with each other according to a specified correlation matrix. However, Iman and Conover (1982, "A distribution-free approach to inducing rank correlation among

0
Rank-based scores and tied values

Many nonparametric statistical methods use the ranks of observations to compute distribution-free statistics. In SAS, two procedures that use ranks are PROC NPAR1WAY and PROC CORR. Whereas the SPEARMAN option in PROC CORR (which computes rank correlation) uses only the "raw" tied ranks, PROC NPAR1WAY uses transformations of the ranks,

0
Permutation tests and independent sorting of data

For many univariate statistics (mean, median, standard deviation, etc.), the order of the data is unimportant. If you sort univariate data, the mean and standard deviation do not change. However, you cannot sort an individual variable (independently) if you want to preserve its relationship with other variables. This statement is

0
The running median as a time series smoother

When data contain outliers, medians estimate the center of the data better than means do. In general, robust estimates of location and sale are preferred over classical moment-based estimates when the data contain outliers or are from a heavy-tailed distribution. Thus, instead of using the mean and standard deviation of

0
Standardized regression coefficients in PROC GLIMMIX

I previously wrote about how to understand standardized regression coefficients in PROC REG in SAS. You can obtain the standardized estimates by using the STDB option on the MODEL statement in PROC REG. Several readers have written to ask whether I could write a similar article about the STDCOEF option

0
Nonstandard ways to standardize variables

You can standardize a numerical variable by subtracting a location parameter from each observation and then dividing by a scale parameter. Often, the parameters depend on the data that you are standardizing. For example, the most common way to standardize a variable is to subtract the sample mean and divide

0
Odani's truism for fractions that are near each other

Odani's truism is a mathematical result that says that if you want to compare the fractions a/b and c/d, it often is sufficient to compare the sums (a+d) and (b+c) rather than the products a*d and b*c. (All of the integers a, b, c, and d are positive.) If you

0
Compute the Hoeffding D statistic in SAS

There are many statistics that measure whether two continuous random variables are independent or whether they are related to each other in some way. The most well-known statistic is Pearson's correlation, which is a parametric measure of the linear relationship between two variables. A related measure is Spearman's rank correlation,

0
The generalized gamma distribution

A SAS customer wanted to compute the cumulative distribution function (CDF) of the generalized gamma distribution. For any continuous distribution, the CDF is the integral of the probability density function (PDF), which usually has an explicit formula. Accordingly, he wanted to compute the CDF by using the QUAD function in

0
Generate all quadratic interactions in a regression model

I've previously written about how to generate all pairwise interactions for a regression model in SAS. For a model that contains continuous effects, the easiest way is to use the EFFECT statement in PROC GLMSELECT to generate second-degree "polynomial effects." However, a SAS programmer was running a simulation study and