Some colors have names, such as "Red," "Magenta," and "Dark Olive Green." But the most common way to specify a color is to use a hexadecimal value such as CX556B2F. It is not obvious that "Dark Olive Green" and CX556B2F represent the same color, but they do! I like to

## Tag: **Statistical Graphics**

*The DO Loop*in 2021

Last year, I wrote almost 100 posts for The DO Loop blog. My most popular articles were about data visualization, statistics and data analysis, and simulation and bootstrapping. If you missed any of these gems when they were first published, here are some of the most popular articles from 2021:

In a previous article, I visualized seven Christmas-themed palettes of colors, as shown to the right. You can see that the palettes include many red, green, and golden colors. Clearly, the colors in the Christmas palettes are not a random sample from the space of RGB colors. Rather, they represent

In data visualization, colors can represent the values of a variable in a choropleth map, a heatmap, or a scatter plot. But how do you visualize a palette of colors from the RGB or hexadecimal values of the colors? One way is to use the HEATMAPDISC subroutine in SAS/IML, which

Given a cloud of points in the plane, it can be useful to identify the convex hull of the points. The convex hull is the smallest convex set that contains the observations. For a finite set of points, it is a convex polygon that has some of the points as

I was recently asked how to create a frequency polygon in SAS. A frequency polygon is an alternative to a histogram that shows similar information about the distribution of univariate data. It is the piecewise linear curve formed by connecting the midpoints of the tops of the bins. The graph

A SAS programmer asked whether it is possible to add reference lines to the categorical axis of a bar chart. The answer is yes. You can use the VBAR statement, but I prefer to use the VBARBASIC (or VBARPARM) statement, which enables you to overlay a wide variety of graphs

Graphing data is almost always more informative than displaying a table of summary statistics. In a recent article about "dynamite plots," I briefly mentioned that graphs such as box plots and strip plots are better at showing data than graphs that merely show the mean and standard deviation. This article

A statistical programmer asked how to simulate event-trials data for groups. The subjects in each group have a different probability of experiencing the event. This article describes one way to simulate this scenario. The simulation is similar to simulating from a mixture distribution. This article also shows three different ways

A colleague spent a lot of time creating a panel of graphs to summarize some data. She did not use SAS software to create the graph, but I used SAS to create a simplified version of her graph, which is shown to the right. (The colors are from her graph.)

A colleague spent a lot of time creating a panel of graphs to summarize some data. She did not use SAS software to create the graph, but I used SAS to create a simplified version of her graph, which is shown to the right. (The colors are from her graph.)

This article shows how to create a "sliced survival plot" for proportional-hazards models that are created by using PROC PHREG in SAS. Graphing the result of a statistical regression model is a valuable way to communicate the predictions of the model. Many SAS procedures use ODS graphics to produce graphs

In a previous article, I discussed a beautiful painting called "Phantomâ€™s Shadow, 2018" by the Nigerian-born artist, Odili Donald Odita. I noted that if you overlay a 4 x 4 grid on the painting, then each cell contains a four-bladed pinwheel shape. The cells display rotations and reflections of the pinwheel. The

This article shows how to estimate and visualize a two-dimensional cumulative distribution function (CDF) in SAS. SAS has built-in support for this computation. Although the bivariate CDF is not used as much as the univariate CDF, the bivariate version is still a useful tool in understanding the probable values of

It can be frustrating to receive an error message from statistical software. In the early days of the SAS statistical graphics (SG) procedures, an error message that I dreaded was ERROR: Attempting to overlay incompatible plot or chart types. This error message appears when you attempt to use PROC SGPLOT

Most introductory statistics courses introduce the bar chart as a way to visualize the frequency (counts) for a categorical variable. A vertical bar chart places the categories along the horizontal (X) axis and shows the counts (or percentages) on the vertical (Y) axis. The vertical bar chart is a precursor

A previous article discusses how to interpret regression diagnostic plots that are produced by SAS regression procedures such as PROC REG. In that article, two of the plots indicate influential observations and outliers. Intuitively, an observation is influential if its presence changes the parameter estimates for the regression by "more

When you fit a regression model, it is useful to check diagnostic plots to assess the quality of the fit. SAS, like most statistical software, makes it easy to generate regression diagnostics plots. Most SAS regression procedures support the PLOTS= option, which you can use to generate a panel of

This article shows how to use PROC SGPLOT in SAS to create the scatter plot shown to the right. The scatter plot has the following features: The colors of markers are determined by the value of a third variable. The outline of each marker is the same color (such as

I recently learned about a new feature in PROC QUANTREG that was added in SAS/STAT 15.1 (part of SAS 9.4M6). Recall that PROC QUANTREG enables you to perform quantile regression in SAS. (If you are not familiar with quantile regression, see an earlier article that describes quantile regression and provides

I recently wrote about a simple statistical formula that approximates the wind chill temperature, which is the cumulative effect of air temperature and wind on the human body. The formula uses two independent variables (air temperature and wind speed) to predict the wind chill temperature. This article describes how to

In cold and blustery conditions, the weather forecast often includes two temperatures: the actual air temperature and the wind chill temperature. The wind chill temperature conveys the cumulative effect of air temperature and wind on the human body. The goal of the wind-chill scale is to communicate the effect of

A previous article describes how to use the SGPANEL procedure to visualize subgroups of data. It focuses on using headers to display information about each graph. In the example, the data are time series for the price of several stocks, and the headers include information about whether the stock price

Many characteristics of a graph are determined by the underlying data at run time. A familiar example is when you use colors to indicate different groups in the data. If the data have three groups, you see three colors. If the data have four groups, you see four colors. The

Have you ever heard of the DOLIST syntax? You might know the syntax even if you are not familiar with the name. The DOLIST syntax is a way to specify a list of numerical values to an option in a SAS procedure. Applications include: Specify the end points for bins

*The DO Loop*in 2020

Last year, I wrote more than 100 posts for The DO Loop blog. In previous years, the most popular articles were about SAS programming tips, statistical analysis, and data visualization. But not in 2020. In 2020, when the world was ravaged by the coronavirus pandemic, the most-read articles were related

I previously showed how to create a decile calibration plot for a logistic regression model in SAS. A decile calibration plot (or "decile plot," for short) is used in some fields to visualize agreement between the data and a regression model. It can be used to diagnose an incorrectly specified

I have previously written about how to plot a discontinuous function in SAS. That article shows how to use the GROUP= option on the SERIES statement to graph a discontinuous function. An alternative approach is to place a missing value for the Y variable at the locations at which the

The REFLINE statement in PROC SGPLOT is one of my favorite ways to augment statistical graphics such as scatter plots, series plots, and histograms. The REFLINE statement overlays a vertical or horizontal reference line on a graph. You can specify the location of the reference lines on the REFLINE statement.

The triangulation theorem for polygons says that every simple polygon can be triangulated. In fact, if the polygon has V vertices, you can decompose it into V-2 non-overlapping triangles. In this article, a "polygon" always means a simple polygon. Also, a "random point" means one that is drawn at random