How to visualize a kernel density estimate

A kernel density estimate (KDE) is a nonparametric estimate for the density of a data sample. A KDE can help an analyst determine how to model the data: Does the KDE look like a normal curve? Like a mixture of normals? Is there evidence of outliers in the data? In […]

Color markers in a scatter plot by a third variable in SAS

One of my favorite new features in PROC SGPLOT in SAS 9.4m2 is the addition of the COLORRESPONSE= and COLORMODEL= options to the SCATTER statement. By using these options, it is easy to color markers in a scatter plot so that the colors indicate the values of a continuous third variable. […]

In praise of simple graphics

'Tis a gift to be simple. -- Shaker hymn In June 2015 I published a short article for Significance, a magazine that features statistical and data-related articles that are of general interest to a wide range of scientists. The title of my article is "In Praise of Simple Graphics." […]

Use the EFFECTPLOT statement to visualize regression models in SAS

Graphs enable you to visualize how the predicted values for a regression model depend on the model effects. You can gain an intuitive understanding of a model by using the EFFECTPLOT statement in SAS to create graphs like the one shown at the top of this article. Many SAS regression […]

Overlay plots on a box plot in SAS: Continuous X axis

I have previously shown how to overlay basic plots on box plots when all plots share a common discrete X axis. It is interesting to note that box plots can also be overlaid on a continuous (interval) axis. You often need to bin the data before you create the plot. […]

Overlay plots on a box plot in SAS: Discrete X axis

Box plots summarize the distribution of a continuous variable. You can display multiple box plots in a single graph by specifying a categorical variable. The resulting graph shows the distribution of subpopulations, such as different experimental groups. In the SGPLOT procedure, you can use the CATEGORY= option on the VBOX […]

Lasagna plots in SAS: When spaghetti plots don't suffice

Last week I discussed how to create spaghetti plots in SAS. A spaghetti plot is a type of line plot that contains many lines. Spaghetti plots are used in longitudinal studies to show trends among individual subjects, which can be patients, hospitals, companies, states, or countries. I showed ways to […]

Create spaghetti plots in SAS

What is a spaghetti plot? Spaghetti plots are line plots that involve many overlapping lines. Like spaghetti on your plate, they can be hard to unravel, yet for many analysts they are a delicious staple of data visualization. This article presents the good, the bad, and the messy about spaghetti […]

How much do New Yorkers tip taxi drivers?

When I read Robert Allison's article about the cost of a taxi ride in New York City, I was struck by the scatter plot (shown at right; click to enlarge) that plots the tip amount against the total bill for 12 million taxi rides. The graph clearly reveals diagonal and […]

Set attributes of markers in PROC SGPLOT by using ODS style elements

The SG procedures in SAS use aesthetically pleasing default colors, shapes, and styles, but sometimes it is necessary to override the default attributes. The MARKERATTRS= option enables you to override the default colors, symbols, and sizes of markers in scatter plots and other graphs. Similarly, the LINEATTRS= option enables you […]