Blogs

Blogs

Author

Rick Wicklin

Rick Wicklin RSS
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Analytics | Learn SAS

Diffogram for multiple comparisons of means in SAS

Rick WicklinOctober 18, 2017 0

The diffogram and other graphs for multiple comparisons of means

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

Read More

Analytics | Data Visualization | Learn SAS

Lines plot for multiple comparison of means in SAS

Rick WicklinOctober 16, 2017 0

Graphs for multiple comparisons of means: The lines plot

Last week Warren Kuhfeld wrote about a graph called the "lines plot" that is produced by SAS/STAT procedures in SAS 9.4M5. (Notice that the "lines plot" has an 's'; it is not a line plot!) The lines plot is produced as part of an analysis that performs multiple comparisons of

Read More

Programming Tips

Rick WicklinOctober 11, 2017 0

Simulate correlations by using the Wishart distribution

The article "Fisher's transformation of the correlation coefficient" featured a Monte Carlo simulation that generated sample correlations from bivariate normal data. The simulation used three steps: Simulate B samples of size N from a bivariate normal distribution with correlation ρ. Use PROC CORR to compute the sample correlation matrix for

Read More

Programming Tips

Heat map of correlations between variables

Rick WicklinOctober 9, 2017 0

Order correlations by magnitude

Correlations between variables are typically displayed in a matrix. Because the correlation matrix is determined by the order of the variables, it is difficult to find the largest and smallest correlations, which is why analysts sometimes use colors to visualize the correlation matrix. Another visualization option is the pairwise correlation

Read More

Analytics | Data Visualization | Learn SAS

Weighted histogram in SAS

Rick WicklinOctober 4, 2017 0

Create and interpret a weighted histogram

If you perform a weighted statistical analysis, it can be useful to produce a statistical graph that also incorporates the weights. This article shows how to construct and interpret a weighted histogram in SAS. How to construct a weighted histogram Before constructing a weighted histogram, let's review the construction of

Read More

Analytics | Learn SAS

Visualization of regression that uses a weight variable in SAS

Rick WicklinOctober 2, 2017 0

How to understand weight variables in statistical analyses

How can you specify weights for a statistical analysis? Hmmm, that's a "weighty" question! Many people on discussion forums ask "What is a weight variable?" and "How do you choose a weight for each observation?" This article gives a brief overview of weight variables in statistics and includes examples of

Read More

Learn SAS | Programming Tips

Results of a data-driven simulation in which parameters are stored in a file and processed by a SAS program

Rick WicklinSeptember 27, 2017 0

Data-driven simulation

In a large simulation study, it can be convenient to have a "control file" that contains the parameters for the study. My recent article about how to simulate multivariate normal clusters demonstrates a simple example of this technique. The simulation in that article uses an input data set that contains

Read More

Analytics | Programming Tips

Rick WicklinSeptember 25, 2017 0

Simulate multivariate normal data in SAS by using PROC SIMNORMAL

My article about Fisher's transformation of the Pearson correlation contained a simulation. The simulation uses the RANDNORMAL function in SAS/IML software to simulate multivariate normal data. If you are a SAS programmer who does not have access to SAS/IML software, you can use the SIMNORMAL procedure in SAS/STAT software to

Read More

Analytics | Learn SAS

Rick WicklinSeptember 20, 2017 0

Fisher's transformation of the correlation coefficient

Pearson's correlation measures the linear association between two variables. Because the correlation is bounded between [-1, 1], the sampling distribution for highly correlated variables is highly skewed. Even for bivariate normal data, the skewness makes it challenging to estimate confidence intervals for the correlation, to run one-sample hypothesis tests ("Is

Read More

Data Visualization | Learn SAS

Rick WicklinSeptember 18, 2017 0

The path of zip codes

Toe bone connected to the foot bone, Foot bone connected to the leg bone, Leg bone connected to the knee bone,... — American Spiritual, "Dem Bones" Last week I read an interesting article on Robert Kosara's data visualization blog. Kosara connected the geographic centers of the US zip codes in

Read More

Advanced Analytics | Learn SAS | Programming Tips

Simulate clustered data from a Gaussian mixture distribution

Rick WicklinSeptember 13, 2017 0

Simulate multivariate clusters in SAS

This article shows how to simulate data from a mixture of multivariate normal distributions, which is also called a Gaussian mixture. You can use this simulation to generate clustered data. The adjacent graph shows three clusters, each simulated from a four-dimensional normal distribution. Each cluster has its own within-cluster covariance,

Read More

Analytics | Programming Tips

Rick WicklinSeptember 11, 2017 0

Symbolic derivatives in SAS

Did you know that you can get SAS to compute symbolic (analytical) derivatives of simple functions, including applying the product rule, quotient rule, and chain rule? SAS can form the symbolic derivatives of single-variable functions and partial derivatives of multivariable functions. Furthermore, the derivatives are output in a form that

Read More

Learn SAS | Programming Tips

Rick WicklinSeptember 7, 2017 0

Construct polynomial effects in SAS regression models

If you use SAS regression procedures, you are probably familiar with the "stars and bars" notation, which enables you to construct interaction effects in regression models. Although you can construct many regression models by using that classical notation, a friend recently reminded me that the EFFECT statement in SAS provides

Read More

Analytics

Rick WicklinSeptember 5, 2017 0

7 ways to view correlation

Correlation is a fundamental statistical concept that measures the linear association between two variables. There are multiple ways to think about correlation: geometrically, algebraically, with matrices, with vectors, with regression, and more. To paraphrase the great songwriter Paul Simon, there must be 50 ways to view your correlation! But don't

Read More

Advanced Analytics

Rick WicklinAugust 30, 2017 0

The singular value decomposition and low-rank approximations

A previous article discussed the mathematical properties of the singular value decomposition (SVD) and showed how to use the SVD subroutine in SAS/IML software. This article uses the SVD to construct a low-rank approximation to an image. Applications include image compression and denoising an image. Construct a grayscale image The

Read More

Advanced Analytics

Geometric interpretation of the singular value decomposition (SVD) as the product of a rotation/reflection, followed by a scaling, followed by another rotation/reflection.

Rick WicklinAugust 28, 2017 0

The singular value decomposition: A fundamental technique in multivariate data analysis

The singular value decomposition (SVD) could be called the "billion-dollar algorithm" since it provides the mathematical basis for many modern algorithms in data science, including text mining, recommender systems (think Netflix and Amazon), image processing, and classification problems. Although the SVD was mathematically discovered in the late 1800s, computers have

Read More

Programming Tips

Rick WicklinAugust 23, 2017 0

The arithmetic-geometric mean

All statisticians are familiar with the classical arithmetic mean. Some statisticians are also familiar with the geometric mean. Whereas the arithmetic mean of n numbers is the sum divided by n, the geometric mean of n nonnegative numbers is the n_th root of the product of the numbers. The geometric

Read More

Programming Tips

Rick WicklinAugust 21, 2017 0

6 tips for timing the performance of algorithms

When you implement a statistical algorithm in a vector-matrix language such as SAS/IML, R, or MATLAB, you should measure the performance of your implementation, which means that you should time how long a program takes to analyze data of varying sizes and characteristics. There are some general tips that can

Read More

Analytics | Data Visualization

Bar chart of pairwise correlations between variables

Rick WicklinAugust 16, 2017 0

Use a bar chart to visualize pairwise correlations

Visualizing the correlations between variables often provides insight into the relationships between variables. I've previously written about how to use a heat map to visualize a correlation matrix in SAS/IML, and Chris Hemedinger showed how to use Base SAS to visualize correlations between variables. Recently a SAS programmer asked how

Read More

Analytics | Learn SAS

Rick WicklinAugust 14, 2017 0

What is rank correlation?

When someone refers to the correlation between two variables, they are probably referring to the Pearson correlation, which is the standard statistic that is taught in elementary statistics courses. Elementary courses do not usually mention that there are other measures of correlation. Why would anyone want a different estimate of

Read More

Advanced Analytics

Classical and robust principal component scores for crime data, computed in SAS

Rick WicklinAugust 9, 2017 0

Robust principal component analysis in SAS

Recently, I was asked whether SAS can perform a principal component analysis (PCA) that is robust to the presence of outliers in the data. A PCA requires a data matrix, an estimate for the center of the data, and an estimate for the variance/covariance of the variables. Classically, these estimates

Read More

Programming Tips

Rick WicklinAugust 7, 2017 0

The curse of non-unique eigenvectors

A SAS customer asked, "I computed the eigenvectors of a matrix in SAS and in another software package. I got different answers? How do I know which answer is correct?" I've been asked variations of this question dozens of times. The answer is usually "both answers are correct." The mathematical

Read More

Programming Tips

Broken-stick method for retaining principal components

Rick WicklinAugust 2, 2017 0

Dimension reduction: Guidelines for retaining principal components

Last week I blogged about the broken-stick problem in probability, which reminded me that the broken-stick model is one of the many techniques that have been proposed for choosing the number of principal components to retain during a principal component analysis. Recall that for a principal component analysis (PCA) of

Read More

Learn SAS | Programming Tips

Rick WicklinJuly 31, 2017 0

Flip it. Flip it good.

A SAS user needed to convert a program from MATLAB into the SAS/IML matrix language and asked whether there is a SAS/IML equivalent to the fliplr and flipud functions in MATLAB. These functions flip the columns or rows (respectively) of a matrix; "LR" stands for "left-right" and "UD" stands for

Read More

Programming Tips

Broken stick problem: What is the probability that three randomly chosen points will break a segment into a triangle?

Rick WicklinJuly 26, 2017 0

Random segments and broken sticks

A classical problem in elementary probability asks for the expected lengths of line segments that result from randomly selecting k points along a segment of unit length. It is both fun and instructive to simulate such problems. This article uses simulation in the SAS/IML language to estimate solutions to the

Read More

Programming Tips

Rick WicklinJuly 24, 2017 0

Difference operators as matrices

For a time series { y1, y2, ..., yN }, the difference operator computes the difference between two observations. The kth-order difference is the series { yk+1 - y1, ..., yN - yN-k }. In SAS, the DIF function in the DATA step computes differences between observations. The DIF function

Read More

Analytics

Rick WicklinJuly 19, 2017 0

A quantile definition for skewness

Skewness is a measure of the asymmetry of a univariate distribution. I have previously shown how to compute the skewness for data distributions in SAS. The previous article computes Pearson's definition of skewness, which is based on the standardized third central moment of the data. Moment-based statistics are sensitive to

Read More

Advanced Analytics | Data Visualization

Prediction regions for a classification problem with two outcomes. Graph created in SAS.

Rick WicklinJuly 17, 2017 0

3 ways to visualize prediction regions for classification problems

An important problem in machine learning is the "classification problem." In this supervised learning problem, you build a statistical model that predicts a set of categorical outcomes (responses) based on a set of input features (explanatory variables). You do this by training the model on data for which the outcomes

Read More

Advanced Analytics | Programming Tips

Rick WicklinJuly 12, 2017 0

The bias-corrected and accelerated (BCa) bootstrap interval

I recently showed how to compute a bootstrap percentile confidence interval in SAS. The percentile interval is a simple "first-order" interval that is formed from quantiles of the bootstrap distribution. However, it has two limitations. First, it does not use the estimate for the original data; it is based only

Read More

Programming Tips

Rick WicklinJuly 10, 2017 0

Bootstrap estimates in SAS/IML

I previously wrote about how to compute a bootstrap confidence interval in Base SAS. As a reminder, the bootstrap method consists of the following steps: Compute the statistic of interest for the original data Resample B times from the data to form B bootstrap samples. B is usually a large

Read More

Previous 1 … 22 23 24 25 26 … 53 Next