Given a rectangular grid with unit spacing, what is the expected distance between two random vertices, where distance is measured in the L1 metric? (Here "random" means "uniformly at random.") I recently needed this answer for some small grids, such as the one to the right, which is a 7 x 6

## Tag: **Statistical Graphics**

The SGPLOT procedure in SAS makes it easy to create graphs that overlay various groups in the data. Many statements support the GROUP= option, which specifies that the graph should overlay group information. For example, you can create side-by-side bar charts and box plots, and you can overlay multiple scatter

This article shows how to score (evaluate) a quantile regression model on new data. SAS supports several procedures for quantile regression, including the QUANTREG, QUANTSELECT, and HPQUANTSELECT procedures. The first two procedures do not support any of the modern methods for scoring regression models, so you must use the "missing

Which president of the United States is ranked the greatest by presidential historians? This article visualizes the results of the 2018 Presidential Greatness Survey, which was created and administered by B. Rottinghaus and J. Vaughn. They analyzed 166 responses from experts in political science who ranked the 44 US presidents

If you use PROC SGPLOT to create ODS graphics, "ATTRS" are everywhere. ATTRS is an abbreviation of "attributes." Most options that change the attributes of a graphical element end with the ATTRS suffix. For example, the MARKERATTRS option modifies attributes of markers, the LINEATTRS option modifies attributes of lines, and

A previous article showed how to use a calibration plot to visualize the goodness-of-fit for a logistic regression model. It is common to overlay a scatter plot of the binary response on a predicted probability plot (below, left) and on a calibration plot (below, right): The SAS program that creates

This article shows how to construct a butterfly plot in SAS. A butterfly plot (also called a butterfly chart) is a comparative bar chart or histogram that displays the distribution of a variable for two subpopulations. A butterfly plot for the cholesterol readings of 5,057 patients in a medical study

In my article about how to construct calibration plots for logistic regression models in SAS, I mentioned that there are several popular variations of the calibration plot. The previous article showed how to construct a loess-based calibration curve. Austin and Steyerberg (2013) recommend the loess-based curve on the basis of

A logistic regression model is a way to predict the probability of a binary response based on values of explanatory variables. It is important to be able to assess the accuracy of a predictive model. This article shows how to construct a calibration plot in SAS. A calibration plot is

Order matters. When you create a graph that has a categorical axis (such as a bar chart), it is important to consider the order in which the categories appear. Most software defaults to alphabetical order, which typically gives no insight into how the categories relate to each other. Alphabetical order

You've probably heard about the "80-20 Rule," which describes many natural and manmade phenomena. This rule is sometimes called the "Pareto Principle" because it was discovered by Vilfredo Pareto (1848–1923) who used it to describe the unequal distribution of wealth. Specifically, in his study, 80% of the wealth was held

Simulation studies are used for many purposes, one of which is to examine how distributional assumptions affect the coverage probability of a confidence interval. This article describes the "zipper plot," which enables you to compare the coverage probability of a confidence interval when the data do or do not follow

Did you know that SAS can combine or "merge" a symbol and a line pattern into a single legend item, as shown below? This kind of legend is useful when you are overlaying a group of curves onto a scatter plot. It enables the reader to quickly associate values of

This article shows how to construct a "stacked band plot" in SAS, as shown to the right. (Click to enlarge.) You are probably familiar with a stacked bar chart in which the cumulative amount of some quantity is displayed by stacking the contributions of several groups. A canonical example is

Money magazine (Jan/Feb 2018) contains an article about how much it costs to give birth in the US. The costs, which are based on insurance data, include prenatal care and hospital delivery but exclude infant care. The data are compiled for each state (including Washington, DC) and by type of

Last week I wrote about the 10 most popular articles from The DO Loop in 2017. My most popular articles tend to be about elementary statistics or SAS programming tips. Less popular are the articles about advanced statistical and programming techniques. However, these technical articles fill an important niche. Not

A SAS programmer asked how to label multiple regression lines that are overlaid on a single scatter plot. Specifically, he asked to label the curves that are produced by using the REG statement with the GROUP= option in PROC SGPLOT. He wanted the labels to be the slope and intercept

I previously showed an easy way to visualize a regression model that has several continuous explanatory variables: use the SLICEFIT option in the EFFECTPLOT statement in SAS to create a sliced fit plot. The EFFECTPLOT statement is directly supported by the syntax of the GENMOD, LOGISTIC, and ORTHOREG procedures in

Missing values present challenges for the statistical analyst and data scientist. Many modeling techniques (such as regression) exclude observations that contain missing values, which can reduce the sample size and reduce the power of a statistical analysis. Before you try to deal with missing values in an analysis (for example,

This article demonstrates a SAS programming technique that I call Kuhfeld's template modification technique. The technique enables you to dynamically modify an ODS template and immediately call the modified template to produce a new graph or table. By following the five steps in this article, you can implement the technique

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

In a previous article, I discussed the lines plot for multiple comparisons of means. Another graph that is frequently used for multiple comparisons is the diffogram, which indicates whether the pairwise differences between means of groups are statistically significant. This article discusses how to interpret a diffogram. Two related plots

Last week Warren Kuhfeld wrote about a graph called the "lines plot" that is produced by SAS/STAT procedures in SAS 9.4M5. (Notice that the "lines plot" has an 's'; it is not a line plot!) The lines plot is produced as part of an analysis that performs multiple comparisons of

If you perform a weighted statistical analysis, it can be useful to produce a statistical graph that also incorporates the weights. This article shows how to construct and interpret a weighted histogram in SAS. How to construct a weighted histogram Before constructing a weighted histogram, let's review the construction of

Toe bone connected to the foot bone, Foot bone connected to the leg bone, Leg bone connected to the knee bone,... — American Spiritual, "Dem Bones" Last week I read an interesting article on Robert Kosara's data visualization blog. Kosara connected the geographic centers of the US zip codes in

An important problem in machine learning is the "classification problem." In this supervised learning problem, you build a statistical model that predicts a set of categorical outcomes (responses) based on a set of input features (explanatory variables). You do this by training the model on data for which the outcomes

By default, when you use the SERIES statement in PROC SGPLOT to create a line plot, the observations are connected (in order) by straight line segments. However, SAS 9.4m1 introduced the SMOOTHCONNECT option which, as the name implies, uses a smooth curve to connect the observations. In Sanjay Matange's blog,

If a financial analyst says it is "likely" that a company will be profitable next year, what probability would you ascribe to that statement? If an intelligence report claims that there is "little chance" of a terrorist attack against an embassy, should the ambassador interpret this as a one-in-a-hundred chance,

Most SAS regression procedures support a CLASS statement which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how are they used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to

There are several ways to visualize data in a two-way ANOVA model. Most visualizations show a statistical summary of the response variable for each category. However, for small data sets, it can be useful to overlay the raw data. This article shows a simple trick that you can use to