I regularly see questions on a SAS discussion forum about how to visualize the predicted values for a mixed model that has at least one continuous variable, a categorical variable, and possibly an interaction term. SAS procedures such as GLM, GENMOD, and LOGISTIC can automatically produce plots of the predicted
Tag: Data Analysis
Many data analysts use a quantile-quantile plot (Q-Q plot) to graphically assess whether data can be modeled by a probability distribution such as the normal, lognormal, or gamma distribution. You can use the QQPLOT statement in PROC UNIVARIATE to create a Q-Q plot for about a dozen common distributions. However,
Recently a SAS programmer wanted to obtain a table of counts that was based on a histogram. I showed him how you can use the OUTHIST= option on the HISTOGRAM statement in PROC UNIVARIATE to obtain that information. For example, the following call to PROC UNIVARIATE creates a histogram for
I remember the first time I used PROC GLM in SAS to include a classification effect in a regression model. I thought I had done something wrong because the parameter estimates table was followed by a scary-looking note: Note: The X'X matrix has been found to be singular, and a
Last week my colleague, Robert Allison, visualized data regarding immunization rates for kindergarten classes in North Carolina. One of his graphs was a scatter plot that displayed the proportion of unimmunized students versus the size of the class for 1,885 kindergarten classes in NC. This scatter plot is the basis
A data analyst asked how to compute parameter estimates in a linear regression model when the underlying data matrix is rank deficient. This situation can occur if one of the variables in the regression is a linear combination of other variables. It also occurs when you use the GLM parameterization
An ROC curve graphically summarizes the tradeoff between true positives and true negatives for a rule or model that predicts a binary response variable. An ROC curve is a parametric curve that is constructed by varying the cutpoint value at which estimated probabilities are considered to predict the binary event.
Will the real Pareto distribution please stand up? SAS supports three different distributions that are named "Pareto." The Wikipedia page for the Pareto distribution lists five different "Pareto" distributions, including the three that SAS supports. This article shows how to fit the two-parameter Pareto distribution in SAS and discusses the
In a recent article about nonlinear least squares, I wrote, "you can often fit one model and use the ESTIMATE statement to estimate the parameters in a different parameterization." This article expands on that statement. It shows how to fit a model for one set of parameters and use the
There are several ways to use SAS to get the unique values for a data variable. In Base SAS, you can use the TABLES statement in PROC FREQ to generate a table of unique values (and the counts). You can also use the DISTINCT function in PROC SQL to get
This article shows how to use SAS to fit a growth curve to data. Growth curves model the evolution of a quantity over time. Examples include population growth, the height of a child, and the growth of a tumor cell. This article focuses on using PROC NLIN to estimate the
Programmers on a SAS discussion forum recently asked about the chi-square test for proportions as implemented in PROC FREQ in SAS. One person asked the basic question, "how do I test the null hypothesis that the observed proportions are equal to a set of known proportions?" Another person said that
Have you ever tried to type a movie title by using a TV remote control? Both Netflix and Amazon Video provide an interface (a virtual keyboard) that enables you to use the four arrow keys of a standard remote control to type letters. The letters are arranged in a regular
A frequent topic on SAS discussion forums is how to check the assumptions of an ordinary least squares linear regression model. Some posts indicate misconceptions about the assumptions of linear regression. In particular, I see incorrect statements such as the following: Help! A histogram of my variables shows that they
My colleague, Robert Allison, recently published an interesting visualization of the relationship between chess ratings and age. His post was inspired by the article "Age vs Elo — Your battle against time," which was published on the chess.com website. ("Elo" is one of the rating systems in chess.) Robert Allison's
This article shows how to score (evaluate) a quantile regression model on new data. SAS supports several procedures for quantile regression, including the QUANTREG, QUANTSELECT, and HPQUANTSELECT procedures. The first two procedures do not support any of the modern methods for scoring regression models, so you must use the "missing
When you use a regression procedure in SAS that supports variable selection (GLMSELECT or QUANTSELECT), did you know that the procedures automatically produce a macro variable that contains the names of the selected variables? This article provides examples and details. A previous article provides an overview of the 'SELECT' procedures
A programmer recently asked a question on a SAS discussion forum about design matrices for categorical variables. He had generated a design matrix by using PROC GLMMOD and wanted to use the design columns in a subsequent procedure. However, the columns were named COL1, COL2, COL3,..., so he couldn't tell
Back in SAS 9.3M2 (SAS/STAT 12.1), PROC FREQ introduced mosaic plots to visualize the joint frequencies in a contingency table. By default, the cells in a mosaic plot are colored according to levels of one of the categorical variables in the analysis. However, in 2013 I showed how you can
SAS enables you to evaluate a regression model at any location within the range of the data. However, sometimes you might be interested in how the predicted response is increasing or decreasing at specified locations. You can use finite differences to compute the slope (first derivative) of a regression model.
Which president of the United States is ranked the greatest by presidential historians? This article visualizes the results of the 2018 Presidential Greatness Survey, which was created and administered by B. Rottinghaus and J. Vaughn. They analyzed 166 responses from experts in political science who ranked the 44 US presidents
This article describes how to obtain an initial guess for nonlinear regression models, especially nonlinear mixed models. The technique is to first fit a simpler fixed-effects model by replacing the random effects with their expected values. The parameter estimates for the fixed-effects model are often good initial guesses for the
When you fit nonlinear fixed-effect or mixed models, it is difficult to guess the model parameters that fit the data. Yet, most nonlinear regression procedures (such as PROC NLIN and PROC NLMIXED in SAS) require that you provide a good guess! If your guess is not good, the fitting algorithm,
Years ago, I wrote an article about how to create a Top 10 table and bar chart. The program can be trivially modified to create a "Top N" table and plot, such as Top 5, Top 20, or even Top 100. Not long after the article was written, the developer
A previous article showed how to use a calibration plot to visualize the goodness-of-fit for a logistic regression model. It is common to overlay a scatter plot of the binary response on a predicted probability plot (below, left) and on a calibration plot (below, right): The SAS program that creates
In my article about how to construct calibration plots for logistic regression models in SAS, I mentioned that there are several popular variations of the calibration plot. The previous article showed how to construct a loess-based calibration curve. Austin and Steyerberg (2013) recommend the loess-based curve on the basis of
A logistic regression model is a way to predict the probability of a binary response based on values of explanatory variables. It is important to be able to assess the accuracy of a predictive model. This article shows how to construct a calibration plot in SAS. A calibration plot is
Order matters. When you create a graph that has a categorical axis (such as a bar chart), it is important to consider the order in which the categories appear. Most software defaults to alphabetical order, which typically gives no insight into how the categories relate to each other. Alphabetical order
Some say that opposites attract. Others say that birds of a feather flock together. Which is it? Phillip N. Cohen, a professor of sociology at the University of Maryland, recently posted an interesting visualization that indicates that married couples who are college graduates tend to be birds of a feather.
SAS programmers on SAS discussion forums sometimes ask how to run thousands of regressions of the form Y = B0 + B1*X_i, where i=1,2,.... A similar question asks how to solve thousands of regressions of the form Y_i = B0 + B1*X for thousands of response variables. I have previously