Correlation is a statistic that measures how closely two variables are related to each other. The most popular definition of correlation is the Pearson product-moment correlation, which is a measurement of the linear relationship between two variables. Many textbooks stress the linear nature of the Pearson correlation and emphasize that
Tag: Data Analysis
One of my favorite magazines, Significance, printed an intriguing image of a symmetric matrix that shows repetition in a song's lyrics. The image was created by Colin Morris, who has created many similar images. When I saw these images, I knew that I wanted to duplicate the analysis in SAS!
When I first learned to program in SAS, I remember being confused about the difference between CLASS statements and BY statements. A novice SAS programmer recently asked when to use one instead of the other, so this article explains the difference between the CLASS statement and BY variables in SAS
Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not
A SAS programmer asked how to label multiple regression lines that are overlaid on a single scatter plot. Specifically, he asked to label the curves that are produced by using the REG statement with the GROUP= option in PROC SGPLOT. He wanted the labels to be the slope and intercept
I wrote more than 100 posts for The DO Loop blog in 2017. The most popular articles were about SAS programming tips, statistical data analysis, and simulation and bootstrap methods. Here are the most popular articles from 2017 in each category. General SAS programming techniques INTCK and INTNX: Do you
I previously showed an easy way to visualize a regression model that has several continuous explanatory variables: use the SLICEFIT option in the EFFECTPLOT statement in SAS to create a sliced fit plot. The EFFECTPLOT statement is directly supported by the syntax of the GENMOD, LOGISTIC, and ORTHOREG procedures in
Slice, slice, baby! You've got to slice, slice, baby! When you fit a regression model that has multiple explanatory variables, it is a challenge to effectively visualize the predicted values. This article describes how to visualize the regression model by slicing the explanatory variables. In SAS, you can use the
In a previous article, I showed how to use SAS to perform mean imputation. However, there are three problems with using mean-imputed variables in statistical analyses: Mean imputation reduces the variance of the imputed variables. Mean imputation shrinks standard errors, which invalidates most hypothesis tests and the calculation of confidence
Missing values present challenges for the statistical analyst and data scientist. Many modeling techniques (such as regression) exclude observations that contain missing values, which can reduce the sample size and reduce the power of a statistical analysis. Before you try to deal with missing values in an analysis (for example,