The DO Loop

Rick Wicklin | July 15, 2020 | 1

How to evaluate the multivariate normal log likelihood

The multivariate normal distribution is used frequently in multivariate statistics and machine learning. In many applications, you need to evaluate the log-likelihood function in order to compare how well different models fit the data. The log-likelihood for a vector x is the natural logarithm of the multivariate normal (MVN) density

English

Analytics | Data Visualization | Learn SAS

Rick Wicklin | July 1, 2020 | 4

Pooled, within-group, and between-group covariance matrices

A previous article discusses the pooled variance for two or more groups of univariate data. The pooled variance is often used during a t test of two independent samples. For multivariate data, the analogous concept is the pooled covariance matrix, which is an average of the sample covariance matrices of the

Analytics | Programming Tips

Rick Wicklin | June 24, 2020 | 2

The Kolmogorov D distribution and exact critical values

If you have ever run a Kolmogorov-Smirnov test for normality, you have encountered the Kolmogorov D statistic. The Kolmogorov D statistic is used to assess whether a random sample was drawn from a specified distribution. Although it is frequently used to test for normality, the statistic is "distribution free" in

Advanced Analytics | Machine Learning

Rick Wicklin | May 26, 2020 | 0

The Kullback–Leibler divergence between discrete probability distributions

If you have been learning about machine learning or mathematical statistics, you might have heard about the Kullback–Leibler divergence. The Kullback–Leibler divergence is a measure of dissimilarity between two probability distributions. It measures how much one distribution differs from a reference distribution. This article explains the Kullback–Leibler divergence and shows

Learn SAS | Programming Tips

Rick Wicklin | March 18, 2020 | 4

Print SAS/IML variables with formats

A SAS/IML programmer asked about the best way to print multiple SAS/IML variables when each variable needs a different format. He wanted the output to resemble the "Parameter Estimates" table that is produced by PROC REG and other SAS/STAT procedures. This article shows four ways to print SAS/IML vectors in

Analytics | Programming Tips

Rick Wicklin | March 16, 2020 | 2

Predict a random integer: The tradeoff between bias and variance

Books about statistics and machine learning often discuss the tradeoff between bias and variance for an estimator. These discussions are often motivated by a sophisticated predictive model such as a regression or a decision tree. But the basic idea can be seen in much simpler situations. This article presents a

Advanced Analytics | Data Visualization | Programming Tips

Rick Wicklin | March 9, 2020 | 2

ROC curves for a binormal sample

In a previous article, I discussed the binormal model for a binary classification problem. This model assumes a set of scores that are normally distributed for each population, and the mean of the scores for the Negative population is less than the mean of scores for the Positive population. I

Learn SAS | Programming Tips

Rick Wicklin | March 4, 2020 | 0

Store pre-computed matrices in a list

Suppose that a data set contains a set of parameter values. For each row of parameters, you need to perform some computation. A recent discussion on the SAS Support Communities mentions an important point: if there are duplicate rows in the data, a program might repeat the same computation several

Analytics | Data Visualization

Rick Wicklin | February 26, 2020 | 0

The binormal model for ROC curves

The ROC curve is a graphical method that summarizes how well a binary classifier can discriminate between two populations, often called the "negative" population (individuals who do not have a disease or characteristic) and the "positive" population (individuals who do have it). As shown in a previous article, there is

Learn SAS | Programming Tips

Rick Wicklin | February 19, 2020 | 3

A list of SAS DATA step functions that do not run in CAS

Are you a statistical programmer whose company has adopted SAS Viya? If so, you probably know that the DATA step can run in parallel in SAS Cloud Analytic Services (CAS). As Sekosky (2017) says, "running in a single thread in SAS is different from running in many threads in CAS."

Tag: Statistical Programming