# Author Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

0
Compute moments of probability distributions in SAS

A previous article discusses the definitions of three kinds of moments for a continuous probability distribution: raw moments, central moments, and standardized moments. These are defined in terms of integrals over the support of the distribution. Moments are connected to the familiar shape features of a distribution: the mean, variance,

Analytics
0
Definitions of moments in probability and statistics

The moments of a continuous probability distribution are often used to describe the shape of the probability density function (PDF). The first four moments (if they exist) are well known because they correspond to familiar descriptive statistics: The first raw moment is the mean of a distribution. For a random

0
Display correlations in a list format

The correlations between p variables are usually displayed by using a symmetric p x p matrix of correlations. However, sometimes you might prefer to see the correlations listed in "long form" as a three-column table, as shown to the right. In this table, each row shows a pair of variables and the

0
The noncentral t distribution in SAS

The noncentral t distribution is a probability distribution that is used in power analysis and hypothesis testing. The distribution generalizes the Student t distribution by adding a noncentrality parameter, δ. When δ=0, the noncentral t distribution is the usual (central) t distribution, which is a symmetric distribution. When δ >

0
Generate random ID values for subjects in SAS

A common question on SAS discussion forums is how to use SAS to generate random ID values. The use case is to generate a set of random strings to assign to patients in a clinical study. If you assign each patient a unique ID and delete the patients' names, you

0
Base 26: A mapping from integers to strings

I recently showed how to represent positive integers in any base and gave examples of base 2 (binary), base 8 (octal), and base 16 (hexadecimal). One fun application is that you can use base 26 to associate a positive integer to every string of English characters. This article shows how

0
Convert integers from base 10 to another base

An integer can be represented in many ways. This article shows how to represent a positive integer in any base b. The most common base is b=10, but other popular bases are b=2 (binary numbers), b=8 (octal), and b=16 (hexadecimal). Each base represents integers in different ways. Think of a

0
A test for monotonic sequences and functions

Monotonic transformations occur frequently in math and statistics. Analysts use monotonic transformations to transform variable values, with Tukey's ladder of transformations and the Box-Cox transformations being familiar examples. Monotonic distributions figure prominently in probability theory because the cumulative distribution is a monotonic increasing function. For a continuous distribution that is

0
Two types of syntax for the SELECT-WHEN statement in SAS

The SELECT-WHEN statement in the SAS DATA step is an alternative to using a long sequence of IF-THEN/ELSE statements. Although logically equivalent to IF-THEN/ELSE statements, the SELECT-WHEN statement can be easier to read. This article discusses the two distinct ways to specify the SELECT-WHEN statement. You can use the first

0
Order the bars in a bar chart with PROC SGPLOT

A SAS programmer was trying to understand how PROC SGPLOT orders categories and segments in a stacked bar chart. As with all problems, it is often useful to start with a simpler version of the problem. After you understand the simpler situation, you can apply that understanding to the more

0
How to stagger labels on an axis in PROC SGPLOT

A SAS programmer asked how to display long labels at irregular locations along the horizontal axis of scatter plot. The labels indicate various phases of a clinical study. This article discusses the problem and shows how to use the FITPOLICY=STAGGER option on the XAXIS or X2AXIS statement to avoid collisions

0
The univariate Box-Cox transformation

A SAS customer asked how to use the Box-Cox transformation to normalize a single variable. Recall that a normalizing transformation is a function that attempts to convert a set of data to be as nearly normal as possible. For positive-valued data, introductory statistics courses often mention the log transformation or

0
The Box-Cox transformation for a dependent variable in a regression

In the 1960s and '70s, before nonparametric regression methods became widely available, it was common to apply a nonlinear transformation to the dependent variable before fitting a linear regression model. This is still done today, with the most common transformation being a logarithmic transformation of the dependent variable, which fits

0
Tukey's ladder of variable transformations

John Tukey was an influential statistician who proposed many statistical concepts. In the 1960s and 70s, he was fundamental in the discovery and exposition of robust statistical methods, and he was an ardent proponent of exploratory data analysis (EDA). In his 1977 book, Exploratory Data Analysis, he discussed a small

0
Means and medians as minimizers of a loss function

On Twitter, I saw a tweet from @DataSciFact that read, "The sum of (x_i - x)^2 over a set of data points x_i is minimized when x is the sample mean." I (@RickWicklin) immediately tweeted out a reply: "And the sum of |x_i - x| is minimized by the sample

Learn SAS
0
How to examine the status of ODS destinations in SAS

A SAS programmer asked for help on a discussion forum: "My SAS session will not display any tables or graphs! I try to use PROC PRINT and other procedures, but no output is displayed! What can I do?" The most common reasons why you might not see any output when

0
Introductory examples of Monte Carlo simulation in SAS

When I was writing Simulating Data with SAS (Wicklin, 2013), I read a lot of introductory textbooks about Monte Carlo simulation. One of my favorites is Sheldon Ross's book Simulation. (I read the 4th Edition (2006); the 5th Edition was published in 2013.) I love that the book brings together

0
Monte Carlo estimates of area

I've previously shown how to use Monte Carlo simulation to estimate probabilities and areas. I illustrated the Monte Carlo method by estimating π ≈ 3.14159... by generating points uniformly at random in a unit square and computing the proportion of those points that were inside the unit circle. The previous

0
A tip for visualizing the density function of a distribution

It isn't easy to draw the graph of a function when you don't know what the graph looks like. To draw the graph by using a computer, you need to know the domain of the function for the graph: the minimum value (xMin) and the maximum value (xMax) for plotting

0
Tips for computing right-tail probabilities and quantiles

A colleague was struggling to compute a right-tail probability for a distribution. Recall that the cumulative distribution function (CDF) is defined as a left-tail probability. For a continuous random variable, X, with density function f, the CDF at the value x is     F(x) = Pr(X ≤ x) = ∫

0
Use ODS to arrange graphs in a panel

A SAS programmer wanted to create a panel that contained two of the graphs side-by-side. The graphs were created by using calls to two different SAS procedures. This article shows how to select the graphs and arrange them side-by-side by using the ODS LAYOUT GRIDDED statement. The end of the

0
Confidence bands for partial leverage regression plots

I previously wrote about partial leverage plots for regression diagnostics and why they are useful. You can generate a partial leverage plot in SAS by using the PLOTS=PARTIALPLOT option in PROC REG. One useful property of partial leverage plots is the ability to graphically represent the null hypothesis that a

0
Use functions in a WHERE statement to filter observations

Many people know that you can use "WHERE processing" in SAS to filter observations. A typical use is to process only observations that match some criterion. For example, the following WHERE statement processes only observations for male patients who have high blood pressure: WHERE Sex='Male' & Systolic > 140; In

0
Compute the multivariate t density function

A previous article shows how to compute the probability density function (PDF) for the multivariate normal distribution. In a similar way, you can compute the density function for the multivariate t distribution. This article discusses the density function for the multivariate t distribution, shows how to compute it, and visualizes

0
The derivative of a quantile function

Recently, I needed to solve an optimization problem in which the objective function included a term that involved the quantile function (inverse CDF) of the t distribution, which is shown to the right for DF=5 degrees of freedom. I casually remarked to my colleague that the optimizer would have to

0
Partial leverage plots

For a linear regression model, a useful but underutilized diagnostic tool is the partial regression leverage plot. Also called the partial regression plot, this plot visualizes the parameter estimates table for the regression. For each effect in the model, you can visualize the following statistics: The estimate for each regression

0
PUSH, POP, and reset options for ODS graphics

The ODS GRAPHICS statement in SAS supports more than 30 options that enable you to configure the attributes of graphs that you create in SAS. Did you know that you can display the current set of graphical options? Furthermore, did you know that you can temporarily set certain options and

0
Detect palindromes and rotational palindromes in SAS

A palindrome is a sequence of letters that is the same when read forward and backward. In brief, if you reverse the sequence of letters, the word is unchanged. For example, 'mom' and 'racecar' are palindromes. You can extend the definition to phrases by removing all spaces and punctuation marks

0
The effect of weight functions in a robust regression method

M estimation is a robust regression technique that assigns a weight to each observation based on the magnitude of the residual for that observation. Large residuals are downweighted (assigned weights less than 1) whereas observations with small residuals are given weights close to 1. By iterating the reweighting and fitting

0
Weights for residuals in robust regression

An early method for robust regression was iteratively reweighted least-squares regression (Huber, 1964). This is an iterative procedure in which each observation is assigned a weight. Initially, all weights are 1. The method fits a least-squares model to the weighted data and uses the size of the residuals to determine

1 2 3 46