The DO Loop

Graphical comparison of two methods for estimating confidence intervals of eigenvalues of a correlation matrix

Rick WicklinOctober 26, 2020 3

Confidence intervals for eigenvalues of a correlation matrix

A fundamental principle of data analysis is that a statistic is an estimate of a parameter for the population. A statistic is calculated from a random sample. This leads to uncertainty in the estimate: a different random sample would have produced a different statistic. To quantify the uncertainty, SAS procedures

English

Analytics | Programming Tips

Rick WicklinSeptember 10, 2020 0

Iterative proportional fitting in SAS

I previously wrote about the RAS algorithm, which is a simple algorithm that performs matrix balancing. Matrix balancing refers to adjusting the cells of a frequency table to match known values of the row and column sums. Ideally, the balanced matrix will reflect the structural relationships in the original matrix.

English

Programming Tips

Rick WicklinAugust 10, 2020 2

4 ways to standardize data in SAS

A common operation in statistical data analysis is to center and scale a numerical variable. This operation is conceptually easy: you subtract the mean of the variable and divide by the variable's standard deviation. Recently, I wanted to perform a slight variation of the usual standardization: Perform a different standardization

English

Learn SAS | Programming Tips

Rick WicklinJuly 20, 2020 0

Compute within-group multivariate statistics and store them in a list

I recently showed how to compute within-group multivariate statistics by using the SAS/IML language. However, a principal of good software design is to encapsulate functionality and write self-contained functions that compute and return the results. What is the best way to return multiple statistics from a SAS/IML module? A convenient

English

Analytics | Data Visualization | Learn SAS

Rick WicklinJuly 1, 2020 4

Pooled, within-group, and between-group covariance matrices

A previous article discusses the pooled variance for two or groups of univariate data. The pooled variance is often used during a t test of two independent samples. For multivariate data, the analogous concept is the pooled covariance matrix, which is an average of the sample covariance matrices of the

English

Programming Tips

Rick WicklinJune 29, 2020 4

What is a pooled variance?

The first time I saw a formula for the pooled variance, I was quite confused. It looked like Frankenstein's monster, assembled from bits and pieces of other quantities and brought to life by a madman. However, the pooled variance does not have to be a confusing monstrosity. The verb "to

English

Analytics

Rick WicklinJune 8, 2020 0

Interactions with spline effects in regression models

A SAS customer asked how to specify interaction effects between a classification variable and a spline effect in a SAS regression procedure. There are at least two ways to do this. If the SAS procedure supports the EFFECT statement, you can build the interaction term in the MODEL statement. For

English

Analytics | Learn SAS

Rick WicklinJune 3, 2020 3

How to estimate the difference between percentiles

I recently read an article that describes ways to compute confidence intervals for the difference in a percentile between two groups. In Eaton, Moore, and MacKenzie (2019), the authors describe a problem in hydrology. The data are the sizes of pebbles (grains) in rivers at two different sites. The authors

English

Advanced Analytics | Machine Learning

Rick WicklinMay 28, 2020 1

Minimizing the Kullback–Leibler divergence

The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. An application in machine learning is to measure how distributions in a parametric family differ from a data distribution. This article shows that if you minimize the Kullback–Leibler divergence over a set of parameters, you can find a

English

Analytics | Data Visualization

Rick WicklinMay 6, 2020 3

What does 'flatten the curve' mean? To which curve does it apply?

During this coronavirus pandemic, there are many COVID-related graphs and curves in the news and on social media. The public, politicians, and pundits scrutinize each day's graphs to determine which communities are winning the fight against coronavirus. Interspersed among these many graphs is the oft-repeated mantra, "Flatten the curve!" As

English

Blogs

Blogs

Tag: Data Analysis