The purpose of this article is to show how to use SAS to create a graph that illustrates a basic idea in a binary classification analysis, such as discriminant analysis and logistic regression. The graph, shown at right, shows two populations. Subjects in the "negative" population do not have some

## Tag: **Statistical Graphics**

A SAS programmer wanted to create a graph that illustrates how Deming regression differs from ordinary least squares regression. The main idea is shown in the panel of graphs below. The first graph shows the geometry of least squares regression when we regress Y onto X. ("Regress Y onto X"

In my book Simulating Data with SAS, I show how to use a graphical tool, called the moment-ratio diagram, to characterize and compare continuous probability distributions based on their skewness and kurtosis (Wicklin, 2013, Chapter 16). The idea behind the moment-ratio diagram is that skewness and kurtosis are essential for

Did you add "learn something new" to your list of New Year's resolutions? Last week, I wrote about the most popular articles from The DO Loop in 2019. The most popular articles are about elementary topics in SAS programming or univariate statistics because those topics have broad appeal. Advanced topics

Many SAS procedures can automatically create a graph that overlays multiple prediction curves and their prediction limits. This graph (sometimes called a "fit plot" or a "sliced fit plot") is useful when you want to visualize a model in which a continuous response variable depends on one continuous explanatory variable

*The DO Loop*in 2019

Last year, I wrote more than 100 posts for The DO Loop blog. The most popular articles were about SAS programming tips for data analysis, statistical analysis, and data visualization. Here are the most popular articles from 2019 in each category. SAS programming tips Create training, testing, and validation data

A 2-D "bin plot" counts the number of observations in each cell in a regular 2-D grid. The 2-D bin plot is essentially a 2-D version of a histogram: it provides an estimate for the density of a 2-D distribution. As I discuss in the article, "The essential guide to

Recently I showed how to visualize and analyze longitudinal data in which subjects are measured at multiple time points. A very common situation is that the data are collected at two time points. For example, in medicine it is very common to measure some quantity (blood pressure, cholesterol, white-blood cell

This is a second article about analyzing longitudinal data, which features measurements that are repeatedly taken on subjects at several points in time. The previous article discusses a response-profile analysis, which uses an ANOVA method to determine differences between the means of an experimental group and a placebo group. The

My colleague, Mike Drutar, recently showed how to create a "strip plot" that shows the distribution of temperatures for each calendar month at a particular location. Mike created the strip plot in SAS Visual Analytics by using a point-and-click interface. This article shows how to create a similar graph by

Understanding multivariate statistics requires mastery of high-dimensional geometry and concepts in linear algebra such as matrix factorizations, basis vectors, and linear subspaces. Graphs can help to summarize what a multivariate analysis is telling us about the data. This article looks at four graphs that are often part of a principal

Computing rates and proportions is a common task in data analysis. When you are computing several proportions, it is helpful to visualize how the rates vary among subgroups of the population. Examples of proportions that depend on subgroups include: Mortality rates for various types of cancers Incarceration rates by race

The EFFECT statement is supported by more than a dozen SAS/STAT regression procedures. Among other things, it enables you to generate spline effects that you can use to fit nonlinear relationships in data. Recently there was a discussion on the SAS Support Communities about how to interpret the parameter estimates

It is always great to read an old paper or blog post and think, "This task is so much easier in SAS 9.4!" I had that thought recently when I stumbled on a 2007 paper by Wei Cheng titled "Graphical Representation of Mean Measurement over Time." A substantial portion of

One of the strengths of the SAS/IML language is its flexibility. Recently, a SAS programmer asked how to generalize a program in a previous article. The original program solved one optimization problem. The reader said that she wants to solve this type of problem 300 times, each time using a

In a scatter plot that displays many points, it can be important to visualize the density of the points. Scatter plots (indeed, all plots that show individual markers) can suffer from overplotting, which means that the graph does not indicate how many observations are at a specific (x, y) location.

I often use axis tables in PROC SGPLOT in SAS to add a table of text to a graph so that the table values are aligned with the data. But axis tables are not the only way to display tabular data in a graph. You can also use the TEXT

The TEXT statement in PROC SGPLOT supports the ROTATE= option to rotate the specified text. It is worth knowing how the ROTATE= option interacts with the POSITION= option, which determines the anchor point at which the text is positioned. Briefly, the text is positioned FIRST, then the rotation occurs. The

A SAS programmer asked an intriguing question on the SAS Support Communities: Can you use SAS to create a graph that shows how the elements in a box-and-whiskers plot relate to the data? The SAS documentation has several examples that explain how to read a box plot. One of the

Heat maps have many uses. You can use a heat map to visualize correlation matrices, to visualize longitudinal data ("lasagna plots"), and to visualize counts in any two-dimensional table. As of SAS 9.4m3, you can create heat maps in SAS by using the HEATMAP and HEATMAPPARM statements in PROC SGPLOT.

I recently showed how to create an annotation data set that will overlay cell counts or percentages on a mosaic plot. A mosaic plot is a visual representation of a cross-tabulation of observed frequencies for two categorical variables. The mosaic plot with cell counts is shown to the right. The

The mosaic plot is a graphical visualization of a frequency table. In previous articles, I showed how to create a mosaic plot in SAS by using PROC FREQ and how to define a template in the Graph Template Language (GTL) by using the MOSAICPARM statement. This article shows how to

Math and statistics are everywhere, and I always rejoice when I spot a rather sophisticated statistical idea "in the wild." For example, I am always pleased when I see a graph that shows the distribution of race times in a typical race (such as a 5K), as shown to the

When fitting a least squares regression model to data, it is often useful to create diagnostic plots of the residuals versus the explanatory variables. If the model fits the data well, the plots of the residuals should not display any patterns. Systematic patterns can indicate that you need to include

A family of curves is generated by an equation that has one or more parameters. To visualize the family, you might want to display a graph that overlays four of five curves that have different parameter values, as shown to the right. The graph shows members of a family of

Statistical programmers and analysts often use two kinds of rectangular data sets, popularly known as wide data and long data. Some analytical procedures require that the data be in wide form; others require long form. (The "long format" is sometimes called "narrow" or "tall" data.) Fortunately, the statistical graphics procedures

Knowing how to visualize a regression model is a valuable skill. A good visualization can help you to interpret a model and understand how its predictions depend on explanatory factors in the model. Visualization is especially important in understanding interactions between factors. Recently I read about work by Jacob A.

An analyst was using SAS to analyze some data from an experiment. He noticed that the response variable is always positive (such as volume, size, or weight), but his statistical model predicts some negative responses. He posted the data and asked if it is possible to modify the graph so

A previous article shows how to use a scatter plot to visualize the average SAT scores for all high schools in North Carolina. The schools are grouped by school districts and ranked according to the median value of the schools in the district. For the school districts that have many

Standardized tests like the SAT and ACT can cause stress for both high school students and their parents, but according to a Wall Street Journal article, the SAT and ACT "provide an invaluable measure of how students are likely to perform in college and beyond." Naturally, students wonder how their