## Tag: Data Analysis

0
Meaningful names for columns of a design matrix

A programmer recently asked a question on a SAS discussion forum about design matrices for categorical variables. He had generated a design matrix by using PROC GLMMOD and wanted to use the design columns in a subsequent procedure. However, the columns were named COL1, COL2, COL3,..., so he couldn't tell

Programming Tips
0
Color cells in a mosaic plot by deviation from independence

Back in SAS 9.3M2 (SAS/STAT 12.1), PROC FREQ introduced mosaic plots to visualize the joint frequencies in a contingency table. By default, the cells in a mosaic plot are colored according to levels of one of the categorical variables in the analysis. However, in 2013 I showed how you can

0
Compute derivatives for nonparametric regression models

SAS enables you to evaluate a regression model at any location within the range of the data. However, sometimes you might be interested in how the predicted response is increasing or decreasing at specified locations. You can use finite differences to compute the slope (first derivative) of a regression model.

Data Visualization
0
Ranking US presidents

Which president of the United States is ranked the greatest by presidential historians? This article visualizes the results of the 2018 Presidential Greatness Survey, which was created and administered by B. Rottinghaus and J. Vaughn. They analyzed 166 responses from experts in political science who ranked the 44 US presidents

0
Reduced models: A way to choose initial parameters for a mixed model

This article describes how to obtain an initial guess for nonlinear regression models, especially nonlinear mixed models. The technique is to first fit a simpler fixed-effects model by replacing the random effects with their expected values. The parameter estimates for the fixed-effects model are often good initial guesses for the

Analytics
0
Use a grid search to find initial parameter values for regression models in SAS

When you fit nonlinear fixed-effect or mixed models, it is difficult to guess the model parameters that fit the data. Yet, most nonlinear regression procedures (such as PROC NLIN and PROC NLMIXED in SAS) require that you provide a good guess! If your guess is not good, the fitting algorithm,

Programming Tips
0
An easy way to make a "Top 10" table and bar chart in SAS

Years ago, I wrote an article about how to create a Top 10 table and bar chart. The program can be trivially modified to create a "Top N" table and plot, such as Top 5, Top 20, or even Top 100. Not long after the article was written, the developer

0
Use a fringe plot to visualize binary data in logistic models

A previous article showed how to use a calibration plot to visualize the goodness-of-fit for a logistic regression model. It is common to overlay a scatter plot of the binary response on a predicted probability plot (below, left) and on a calibration plot (below, right): The SAS program that creates

0
Decile calibration plots in SAS

In my article about how to construct calibration plots for logistic regression models in SAS, I mentioned that there are several popular variations of the calibration plot. The previous article showed how to construct a loess-based calibration curve. Austin and Steyerberg (2013) recommend the loess-based curve on the basis of

0
Calibration plots in SAS

A logistic regression model is a way to predict the probability of a binary response based on values of explanatory variables. It is important to be able to assess the accuracy of a predictive model. This article shows how to construct a calibration plot in SAS. A calibration plot is

0
Order variables in a heat map or scatter plot matrix

Order matters. When you create a graph that has a categorical axis (such as a bar chart), it is important to consider the order in which the categories appear. Most software defaults to alphabetical order, which typically gives no insight into how the categories relate to each other. Alphabetical order

0
Assign colors in heat maps: A study of married couples and college majors

Some say that opposites attract. Others say that birds of a feather flock together. Which is it? Phillip N. Cohen, a professor of sociology at the University of Maryland, recently posted an interesting visualization that indicates that married couples who are college graduates tend to be birds of a feather.

0
An easier way to run thousands of regressions

SAS programmers on SAS discussion forums sometimes ask how to run thousands of regressions of the form Y = B0 + B1*X_i, where i=1,2,.... A similar question asks how to solve thousands of regressions of the form Y_i = B0 + B1*X for thousands of response variables. I have previously

Data Visualization
0
The 80-20 rule for blogs

You've probably heard about the "80-20 Rule," which describes many natural and manmade phenomena. This rule is sometimes called the "Pareto Principle" because it was discovered by Vilfredo Pareto (1848–1923) who used it to describe the unequal distribution of wealth. Specifically, in his study, 80% of the wealth was held

Programming Tips
0
Distance correlation

Correlation is a statistic that measures how closely two variables are related to each other. The most popular definition of correlation is the Pearson product-moment correlation, which is a measurement of the linear relationship between two variables. Many textbooks stress the linear nature of the Pearson correlation and emphasize that

Data Visualization
0
Visualize repetition in song lyrics

One of my favorite magazines, Significance, printed an intriguing image of a symmetric matrix that shows repetition in a song's lyrics. The image was created by Colin Morris, who has created many similar images. When I saw these images, I knew that I wanted to duplicate the analysis in SAS!

0
The difference between CLASS statements and BY statements in SAS

When I first learned to program in SAS, I remember being confused about the difference between CLASS statements and BY statements. A novice SAS programmer recently asked when to use one instead of the other, so this article explains the difference between the CLASS statement and BY variables in SAS

0
10 posts from 2017 that deserve a second look

Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not

0
Label multiple regression lines in SAS

A SAS programmer asked how to label multiple regression lines that are overlaid on a single scatter plot. Specifically, he asked to label the curves that are produced by using the REG statement with the GROUP= option in PROC SGPLOT. He wanted the labels to be the slope and intercept

0
The top 10 posts from The DO Loop in 2017

I wrote more than 100 posts for The DO Loop blog in 2017. The most popular articles were about SAS programming tips, statistical data analysis, and simulation and bootstrap methods. Here are the most popular articles from 2017 in each category. General SAS programming techniques INTCK and INTNX: Do you

0
How to create a sliced fit plot in SAS

I previously showed an easy way to visualize a regression model that has several continuous explanatory variables: use the SLICEFIT option in the EFFECTPLOT statement in SAS to create a sliced fit plot. The EFFECTPLOT statement is directly supported by the syntax of the GENMOD, LOGISTIC, and ORTHOREG procedures in

0
Visualize multivariate regression models by slicing continuous variables

Slice, slice, baby! You've got to slice, slice, baby! When you fit a regression model that has multiple explanatory variables, it is a challenge to effectively visualize the predicted values. This article describes how to visualize the regression model by slicing the explanatory variables. In SAS, you can use the

Analytics
0
3 problems with mean imputation

In a previous article, I showed how to use SAS to perform mean imputation. However, there are three problems with using mean-imputed variables in statistical analyses: Mean imputation reduces the variance of the imputed variables. Mean imputation shrinks standard errors, which invalidates most hypothesis tests and the calculation of confidence

0
Visualize patterns of missing values

Missing values present challenges for the statistical analyst and data scientist. Many modeling techniques (such as regression) exclude observations that contain missing values, which can reduce the sample size and reduce the power of a statistical analysis. Before you try to deal with missing values in an analysis (for example,

Analytics
0
Should you use principal component regression?

This article describes the advantages and disadvantages of principal component regression (PCR). This article also presents alternative techniques to PCR. In a previous article, I showed how to compute a principal component regression in SAS. Recall that principal component regression is a technique for handling near collinearities among the regression

0
The diffogram and other graphs for multiple comparisons of means

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

0
The diffogram and other graphs for multiple comparisons of means

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

0
Graphs for multiple comparisons of means: The lines plot

Last week Warren Kuhfeld wrote about a graph called the "lines plot" that is produced by SAS/STAT procedures in SAS 9.4M5. (Notice that the "lines plot" has an 's'; it is not a line plot!) The lines plot is produced as part of an analysis that performs multiple comparisons of

Programming Tips
0
Order correlations by magnitude

Correlations between variables are typically displayed in a matrix. Because the correlation matrix is determined by the order of the variables, it is difficult to find the largest and smallest correlations, which is why analysts sometimes use colors to visualize the correlation matrix. Another visualization option is the pairwise correlation

0
Create and interpret a weighted histogram

If you perform a weighted statistical analysis, it can be useful to produce a statistical graph that also incorporates the weights. This article shows how to construct and interpret a weighted histogram in SAS. How to construct a weighted histogram Before constructing a weighted histogram, let's review the construction of