Blogs

Tag: Data Analysis

Analytics | Data Visualization

[Image: Decile calibration curve for a misspecified logistic regression model]

Rick Wicklin, May 16, 2018

Decile calibration plots in SAS

In my article about how to construct calibration plots for logistic regression models in SAS, I mentioned that there are several popular variations of the calibration plot. The previous article showed how to construct a loess-based calibration curve. Austin and Steyerberg (2013) recommend the loess-based curve on the basis of

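The decile variation of a calibration plot works as follows. Sort the observations by predicted probability and split them into ten groups of roughly equal size. For each group, compare the mean predicted probability with the observed proportion of events. For a well-calibrated model, the ten points lie near the diagonal. The following minimal sketch is written in Python rather than SAS and is not the code from the article. The function name `decile_calibration` and the equal-count grouping rule are illustrative assumptions.

```python
def decile_calibration(pred, y, n_groups=10):
    """Return (mean predicted probability, observed event rate) for each group.

    Observations are sorted by predicted probability and split into n_groups
    bins of (nearly) equal size; n_groups=10 gives deciles.
    """
    if len(pred) != len(y):
        raise ValueError("pred and y must have the same length")
    pairs = sorted(zip(pred, y))  # sort observations by predicted probability
    n = len(pairs)
    points = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:  # possible when n < n_groups
            continue
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_rate = sum(outcome for _, outcome in group) / len(group)
        points.append((mean_pred, obs_rate))
    return points

# Hypothetical toy data: 20 predicted probabilities and 0/1 outcomes.
pred = [i / 20 for i in range(20)]
y = [0] * 10 + [1] * 10
for mean_pred, obs_rate in decile_calibration(pred, y):
    print(f"{mean_pred:.3f}  {obs_rate:.2f}")
```

A scatter plot of the returned pairs, with the identity line as a reference, gives the decile calibration plot.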

Analytics | Data Visualization

[Image: Calibration plot for a misspecified logistic model]

Rick Wicklin, May 14, 2018

Calibration plots in SAS

A logistic regression model is a way to predict the probability of a binary response based on values of explanatory variables. It is important to be able to assess the accuracy of a predictive model. This article shows how to construct a calibration plot in SAS. A calibration plot is

Analytics | Data Visualization

Rick Wicklin, May 2, 2018

Order variables in a heat map or scatter plot matrix

Order matters. When you create a graph that has a categorical axis (such as a bar chart), it is important to consider the order in which the categories appear. Most software defaults to alphabetical order, which typically gives no insight into how the categories relate to each other. Alphabetical order

Analytics | Data Visualization

Rick Wicklin, April 30, 2018

Assign colors in heat maps: A study of married couples and college majors

Some say that opposites attract. Others say that birds of a feather flock together. Which is it? Phillip N. Cohen, a professor of sociology at the University of Maryland, recently posted an interesting visualization that indicates that married couples who are college graduates tend to be birds of a feather.

Analytics | Programming Tips

Rick Wicklin, April 25, 2018

An easier way to run thousands of regressions

SAS programmers on SAS discussion forums sometimes ask how to run thousands of regressions of the form Y = B0 + B1*X_i, where i=1,2,.... A similar question asks how to solve thousands of regressions of the form Y_i = B0 + B1*X for thousands of response variables. I have previously

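Each of these simple regressions has a closed-form least squares solution: B1 = Sxy/Sxx and B0 = mean(Y) - B1*mean(X). That means each model costs only a few sums. The following Python sketch loops over the explanatory variables and applies those formulas. It is not the SAS technique that the article describes. The name `simple_regressions` and the dictionary input format are hypothetical choices for illustration.

```python
def simple_regressions(y, xs):
    """Fit Y = B0 + B1*X_i separately for each explanatory variable X_i.

    y  : list of response values
    xs : dict that maps a variable name to a list of values (same length as y)
    Returns a dict that maps each name to (B0, B1). Assumes no X_i is constant.
    """
    n = len(y)
    ybar = sum(y) / n
    estimates = {}
    for name, x in xs.items():
        xbar = sum(x) / n
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx = sum((xi - xbar) ** 2 for xi in x)
        b1 = sxy / sxx          # least squares slope
        b0 = ybar - b1 * xbar   # the fitted line passes through (xbar, ybar)
        estimates[name] = (b0, b1)
    return estimates

fits = simple_regressions([1, 3, 5, 7], {"X_1": [0, 1, 2, 3], "X_2": [3, 2, 1, 0]})
print(fits)
```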

Data Visualization

Rick Wicklin, April 23, 2018

The 80-20 rule for blogs

You've probably heard about the "80-20 Rule," which describes many natural and manmade phenomena. This rule is sometimes called the "Pareto Principle" because it was discovered by Vilfredo Pareto (1848–1923), who used it to describe the unequal distribution of wealth. Specifically, in his study, 80% of the wealth was held

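In its general form, the rule says that roughly 80% of an effect comes from roughly 20% of the causes. A quick way to check the rule for a list of nonnegative amounts, such as hypothetical page views per blog post, is to compute the share of the total that belongs to the largest 20% of items. This Python sketch is only an illustration. The function name `top_share` and the rounding of the group size are assumptions.

```python
def top_share(amounts, top_fraction=0.2):
    """Fraction of the total contributed by the largest top_fraction of items."""
    ordered = sorted(amounts, reverse=True)
    k = max(1, round(top_fraction * len(ordered)))  # size of the "top" group
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical view counts for ten posts.
views = [500, 300, 60, 40, 30, 25, 20, 15, 6, 4]
print(top_share(views))
```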

Programming Tips

Rick Wicklin, April 4, 2018

Distance correlation

Correlation is a statistic that measures how closely two variables are related to each other. The most popular definition of correlation is the Pearson product-moment correlation, which is a measurement of the linear relationship between two variables. Many textbooks stress the linear nature of the Pearson correlation and emphasize that

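Distance correlation (Székely, Rizzo, and Bakirov, 2007) is built from the double-centered matrices of pairwise distances within each variable. In the population it is zero only when the variables are independent, so, unlike the Pearson correlation, it can detect nonlinear relationships. The following Python sketch computes the sample (V-statistic) version for two numeric sequences. It is an O(n^2) illustration, not the SAS program from the article, and the function names are hypothetical.

```python
import math

def _double_centered_distances(v):
    """Pairwise distances |v_j - v_k|, double-centered (row, column, and grand means)."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    means = [sum(row) / n for row in d]  # row means; equal to column means because d is symmetric
    grand = sum(means) / n
    return [[d[j][k] - means[j] - means[k] + grand for k in range(n)] for j in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of two equal-length sequences."""
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2
    dvar_x = sum(t * t for row in a for t in row) / n**2
    dvar_y = sum(t * t for row in b for t in row) / n**2
    if dvar_x == 0 or dvar_y == 0:  # a constant variable
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))

# y = x**2 on a symmetric grid: the Pearson correlation is 0, but the variables are dependent.
x = [-1, 0, 1]
y = [1, 0, 1]
print(distance_correlation(x, y))                                 # positive
print(distance_correlation([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))    # 1 (up to rounding) for a linear relationship
```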

Data Visualization

Rick Wicklin, March 14, 2018

Visualize repetition in song lyrics

One of my favorite magazines, Significance, printed an intriguing image of a symmetric matrix that shows repetition in a song's lyrics. The image was created by Colin Morris, who has created many similar images. When I saw these images, I knew that I wanted to duplicate the analysis in SAS!

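As I understand the technique, the image is a self-similarity matrix. If a song has n words, cell (i, j) is colored when the i-th word equals the j-th word. A repeated phrase then appears as a diagonal line segment, and a word repeated several times in a row appears as a solid square. The following Python sketch builds that 0/1 matrix. The tokenizer (lowercase letters and apostrophes only) and the function name are simplifying assumptions, not choices from the article.

```python
import re

def repetition_matrix(lyrics):
    """0/1 matrix M where M[i][j] == 1 exactly when word i equals word j."""
    words = re.findall(r"[a-z']+", lyrics.lower())  # crude tokenizer (an assumption)
    n = len(words)
    return [[1 if words[i] == words[j] else 0 for j in range(n)] for i in range(n)]

m = repetition_matrix("Row, row, row your boat")
for row in m:
    print(" ".join("#" if cell else "." for cell in row))
```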

Analytics | Learn SAS

Rick Wicklin, February 14, 2018

The difference between CLASS statements and BY statements in SAS

When I first learned to program in SAS, I remember being confused about the difference between CLASS statements and BY statements. A novice SAS programmer recently asked when to use one instead of the other, so this article explains the difference between the CLASS statement and BY variables in SAS

Learn SAS | Programming Tips

Rick Wicklin, January 10, 2018

10 posts from 2017 that deserve a second look

Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, January 8, 2018

Label multiple regression lines in SAS

A SAS programmer asked how to label multiple regression lines that are overlaid on a single scatter plot. Specifically, he asked to label the curves that are produced by using the REG statement with the GROUP= option in PROC SGPLOT. He wanted the labels to be the slope and intercept

Learn SAS | Programming Tips

Rick Wicklin, January 3, 2018

The top 10 posts from The DO Loop in 2017

I wrote more than 100 posts for The DO Loop blog in 2017. The most popular articles were about SAS programming tips, statistical data analysis, and simulation and bootstrap methods. Here are the most popular articles from 2017 in each category.

General SAS programming techniques

INTCK and INTNX: Do you

Analytics | Data Visualization | Learn SAS

Rick Wicklin, December 20, 2017

How to create a sliced fit plot in SAS

I previously showed an easy way to visualize a regression model that has several continuous explanatory variables: use the SLICEFIT option in the EFFECTPLOT statement in SAS to create a sliced fit plot. The EFFECTPLOT statement is directly supported by the syntax of the GENMOD, LOGISTIC, and ORTHOREG procedures in

Analytics | Data Visualization | Learn SAS

[Image: Visualize a multivariate regression model by slicing the continuous variables. Graph created by using the EFFECTPLOT SLICEFIT statement in SAS.]

Rick Wicklin, December 18, 2017

Visualize multivariate regression models by slicing continuous variables

Slice, slice, baby! You've got to slice, slice, baby! When you fit a regression model that has multiple explanatory variables, it is a challenge to effectively visualize the predicted values. This article describes how to visualize the regression model by slicing the explanatory variables. In SAS, you can use the

Analytics

[Image: Bias in regression for mean-imputed explanatory variables]

Rick Wicklin, December 6, 2017

3 problems with mean imputation

In a previous article, I showed how to use SAS to perform mean imputation. However, there are three problems with using mean-imputed variables in statistical analyses: Mean imputation reduces the variance of the imputed variables. Mean imputation shrinks standard errors, which invalidates most hypothesis tests and the calculation of confidence

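The first problem is easy to demonstrate. Every imputed value equals the mean of the observed values, so it adds nothing to the sum of squared deviations. It does, however, increase the sample size, which shrinks the sample variance. This Python sketch is not the SAS code from the article. It uses a hypothetical six-value variable with two missing values, represented by `None`.

```python
from statistics import mean, variance

def mean_impute(values):
    """Replace missing values (None) by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

x = [2.0, 4.0, None, 6.0, None, 8.0]          # hypothetical data with two missing values
observed = [v for v in x if v is not None]
imputed = mean_impute(x)
print(variance(observed))  # sum of squares 20 divided by 3
print(variance(imputed))   # same sum of squares divided by 5, so smaller
```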

Data Visualization | Programming Tips

Rick Wicklin, November 29, 2017

Visualize patterns of missing values

Missing values present challenges for the statistical analyst and data scientist. Many modeling techniques (such as regression) exclude observations that contain missing values, which can reduce the sample size and reduce the power of a statistical analysis. Before you try to deal with missing values in an analysis (for example,

Analytics

[Image: Principal component regression in SAS: Loadings plot]

Rick Wicklin, October 25, 2017

Should you use principal component regression?

This article describes the advantages and disadvantages of principal component regression (PCR). This article also presents alternative techniques to PCR. In a previous article, I showed how to compute a principal component regression in SAS. Recall that principal component regression is a technique for handling near collinearities among the regression

Analytics | Learn SAS

[Image: Diffogram for multiple comparisons of means in SAS]

Rick Wicklin, October 18, 2017

The diffogram and other graphs for multiple comparisons of means

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

Analytics | Data Visualization | Learn SAS

[Image: Lines plot for multiple comparison of means in SAS]

Rick Wicklin, October 16, 2017

Graphs for multiple comparisons of means: The lines plot

Last week Warren Kuhfeld wrote about a graph called the "lines plot" that is produced by SAS/STAT procedures in SAS 9.4M5. (Notice that the "lines plot" has an 's'; it is not a line plot!) The lines plot is produced as part of an analysis that performs multiple comparisons of

Programming Tips

[Image: Heat map of correlations between variables]

Rick Wicklin, October 9, 2017

Order correlations by magnitude

Correlations between variables are typically displayed in a matrix. Because the layout of the correlation matrix is determined by the order of the variables, it is difficult to find the largest and smallest correlations, which is why analysts sometimes use colors to visualize the correlation matrix. Another visualization option is the pairwise correlation

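One way to find the largest and smallest correlations is to list every pair of variables and sort the pairs by the absolute value of the correlation. The following Python sketch does that for a small hypothetical data set. The helper names and the dictionary-of-columns layout are assumptions for illustration, not the article's SAS program.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranked_correlations(data):
    """List (var1, var2, r) for every pair of variables, largest |r| first."""
    pairs = [(u, v, pearson(data[u], data[v])) for u, v in combinations(data, 2)]
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)

# Hypothetical data: three variables, five observations each.
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [3, 5, 4, 1, 2],
    "c": [2, 1, 4, 3, 5],
}
for u, v, r in ranked_correlations(data):
    print(f"{u}-{v}: {r:+.2f}")
```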

Analytics | Data Visualization | Learn SAS

[Image: Weighted histogram in SAS]

Rick Wicklin, October 4, 2017

Create and interpret a weighted histogram

If you perform a weighted statistical analysis, it can be useful to produce a statistical graph that also incorporates the weights. This article shows how to construct and interpret a weighted histogram in SAS.

How to construct a weighted histogram

Before constructing a weighted histogram, let's review the construction of

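In one common construction, each bar of a weighted histogram shows the sum of the weights of the observations in the bin, divided by the total weight, rather than the proportion of observations. The following Python sketch follows that convention and is not the SAS implementation. The function name and the bin rule (half-open bins, with the right edge of the last bin included) are assumptions. Values outside the edges are ignored.

```python
def weighted_histogram(values, weights, edges):
    """Proportion of the total weight that falls in each bin.

    Bins are [edges[i], edges[i+1]); the last bin also includes its right edge.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    nbins = len(edges) - 1
    totals = [0.0] * nbins
    for v, w in zip(values, weights):
        for i in range(nbins):
            in_last_edge = (i == nbins - 1 and v == edges[-1])
            if edges[i] <= v < edges[i + 1] or in_last_edge:
                totals[i] += w  # add the weight, not a count of 1
                break
    total_weight = sum(totals)
    return [t / total_weight for t in totals]

# Hypothetical data: the value 3 carries most of the weight.
print(weighted_histogram([1, 2, 2, 3], [1, 1, 1, 5], edges=[0, 2, 4]))
```

With equal weights the same function returns the ordinary histogram proportions, so the weights only change how much each observation contributes to its bar.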

Analytics | Learn SAS

[Image: Visualization of regression that uses a weight variable in SAS]

Rick Wicklin, October 2, 2017

How to understand weight variables in statistical analyses

How can you specify weights for a statistical analysis? Hmmm, that's a "weighty" question! Many people on discussion forums ask "What is a weight variable?" and "How do you choose a weight for each observation?" This article gives a brief overview of weight variables in statistics and includes examples of

Analytics | Learn SAS

Rick Wicklin, September 20, 2017

Fisher's transformation of the correlation coefficient

Pearson's correlation measures the linear association between two variables. Because the correlation is bounded in the interval [-1, 1], the sampling distribution for highly correlated variables is highly skewed. Even for bivariate normal data, the skewness makes it challenging to estimate confidence intervals for the correlation, to run one-sample hypothesis tests ("Is

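Fisher's transformation z = arctanh(r) maps the bounded correlation onto the whole real line. For bivariate normal data, z is approximately normal with standard error 1/sqrt(n - 3). A confidence interval computed on the z scale and mapped back with tanh is therefore asymmetric around r. This Python sketch computes that approximate interval. It is not the article's SAS code, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a Pearson correlation r from n pairs.

    Uses Fisher's z = arctanh(r), which is approximately normal with standard
    error 1/sqrt(n - 3) for bivariate normal data; requires |r| < 1 and n > 3.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return math.tanh(z - q * se), math.tanh(z + q * se)

lower, upper = fisher_ci(0.8, 28)
print(lower, upper)  # about 0.61 and 0.90: not symmetric around 0.8
```

For r = 0.8 and n = 28, the interval extends farther below 0.8 than above it, which reflects the skewness of the sampling distribution.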

Learn SAS | Programming Tips

Rick Wicklin, September 7, 2017

Construct polynomial effects in SAS regression models

If you use SAS regression procedures, you are probably familiar with the "stars and bars" notation, which enables you to construct interaction effects in regression models. Although you can construct many regression models by using that classical notation, a friend recently reminded me that the EFFECT statement in SAS provides

Analytics

Rick Wicklin, September 5, 2017

7 ways to view correlation

Correlation is a fundamental statistical concept that measures the linear association between two variables. There are multiple ways to think about correlation: geometrically, algebraically, with matrices, with vectors, with regression, and more. To paraphrase the great songwriter Paul Simon, there must be 50 ways to view your correlation! But don't

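A standard vector interpretation says that the Pearson correlation is the cosine of the angle between the two centered data vectors. The following Python sketch computes the correlation that way and reports the angle. It is an illustration in Python rather than SAS and is not necessarily one of the seven views in the article. The function name is hypothetical.

```python
import math

def correlation_as_cosine(x, y):
    """Pearson correlation computed as the cosine of the angle between centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    u = [a - mx for a in x]  # center x
    v = [b - my for b in y]  # center y
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))  # cos(angle) = u.v / (|u| |v|)

r = correlation_as_cosine([1, 2, 3, 4, 5], [3, 5, 4, 1, 2])
angle = math.degrees(math.acos(r))  # angle between the centered vectors, in degrees
print(r, angle)
```

A correlation near 0 corresponds to nearly perpendicular centered vectors, and a correlation of +1 or -1 corresponds to vectors that point in the same or opposite directions.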

Advanced Analytics

Rick Wicklin, August 30, 2017

The singular value decomposition and low-rank approximations

A previous article discussed the mathematical properties of the singular value decomposition (SVD) and showed how to use the SVD subroutine in SAS/IML software. This article uses the SVD to construct a low-rank approximation to an image. Applications include image compression and image denoising.

Construct a grayscale image

The

Analytics | Data Visualization

[Image: Bar chart of pairwise correlations between variables]

Rick Wicklin, August 16, 2017

Use a bar chart to visualize pairwise correlations

Visualizing the correlations between variables often provides insight into the relationships between variables. I've previously written about how to use a heat map to visualize a correlation matrix in SAS/IML, and Chris Hemedinger showed how to use Base SAS to visualize correlations between variables. Recently a SAS programmer asked how

Analytics | Learn SAS

Rick Wicklin, August 14, 2017

What is rank correlation?

When someone refers to the correlation between two variables, they are probably referring to the Pearson correlation, which is the standard statistic that is taught in elementary statistics courses. Elementary courses do not usually mention that there are other measures of correlation. Why would anyone want a different estimate of

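Spearman's rank correlation is the Pearson correlation of the ranks of the data, with tied values assigned their average rank. It therefore measures how well a monotonic function describes the relationship, not only a linear one. The following Python sketch computes it that way. It is not SAS code, and the function names are hypothetical.

```python
import math

def average_ranks(v):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(pearson(x, y), spearman(x, y))  # Pearson is below 1; Spearman is 1
```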

Advanced Analytics

[Image: Classical and robust principal component scores for crime data, computed in SAS]

Rick Wicklin, August 9, 2017

Robust principal component analysis in SAS

Recently, I was asked whether SAS can perform a principal component analysis (PCA) that is robust to the presence of outliers in the data. A PCA requires a data matrix, an estimate for the center of the data, and an estimate for the variance/covariance of the variables. Classically, these estimates
