Did you know that you can embed one graph inside another by using PROC SGPLOT in SAS? A typical example is shown to the right. The large graph shows kernel density estimates for the distribution of the Cholesterol variable among male and female patients in a heart study. The small

## Tag: **Statistical Graphics**

I don't often use the SG annotation facility in SAS for adding annotations to statistical graphics, but when I do, I enjoy the convenience of the SG annotation macros. I can never remember the details of the SG annotation commands, but I know that the SG annotation macros will create

A SAS programmer wanted to use PROC SGPLOT in SAS to visualize a regression model. The programmer wanted to visualize confidence limits for the predicted mean at certain values of the explanatory variable. This article shows two options for adding confidence limits to a scatter plot. You can use a

The acceptance-rejection method (sometimes called rejection sampling) is a method that enables you to generate a random sample from an arbitrary distribution by using only the probability density function (PDF). This is in contrast to the inverse CDF method, which uses the cumulative distribution function (CDF) to generate a random

Since the COVID-19 pandemic began, video presentations and webcasts have become a regular routine for many of us. On days that I will be using my webcam, I wear a solid-color shirt. If I don't plan to be on camera, I can wear a pinstripe Oxford shirt. Why the difference?

Real-world data often exhibits extreme skewness. It is not unusual to have data span many orders of magnitude. Classic examples are the distributions of incomes (impoverished and billionaires) and population sizes (small countries and populous nations). The readership of books and blog posts shows a similar distribution, which is sometimes

Labeling objects in graphs can be difficult. SAS has a long history of providing support for labeling markers in scatter plots and for labeling regions on a map. This article discusses how the SGPLOT procedure decides where to put a label for a polygon. It discusses the advantages and disadvantages

In a previous article, I showed how to overlay a density estimate on a histogram by using the Graph Template Language (GTL). However, a SAS programmer asked how to overlay a curve on a histogram when the curve is not a density estimate. In this case, the vertical axis for

When the SAS statistical graphics (SG) procedures were designed in the early 2000s, a goal was to create a comprehensive Graph Template Language (GTL) and leverage the GTL by using SG procedures that perform common tasks easily without having to write any GTL. This project was hugely successful, and "ODS

A previous article discusses how to compute the union, intersection, and other subsets of a pair of sets. In that article, I displayed a simple Venn diagram (reproduced to the right) that illustrates the intersection and difference between two sets. The diagram uses a red disk for one set, a

SAS supports the ColorBrewer system of color palettes from the ColorBrewer website (Brewer and Harrower, 2002). The ColorBrewer color ramps are available in SAS by using the PALETTE function in SAS IML software. The PALETTE function supports all ColorBrewer palettes, but some palettes are not interpretable by people with color

Did you know that about 8% of the world's men are colorblind? (More correctly, 8% of men are "color vision deficient," since they see colors, but not all colors.) Because of the "birthday paradox," in a room that contains eight men, the probability is 50% that at least one is

A previous article shows that you can use the Intercept parameter to control the ratio of events to nonevents in a simulation of data from a logistic regression model. If you decrease the intercept parameter, the probability of the event decreases; if you increase the intercept parameter, the probability of

For Christmas 2021, I wrote an article about palettes of Christmas colors, chiefly shades of red, green, silver, and gold. One of my readers joked that she would like to use my custom palette to design her own Christmas wrapping paper! I remembered her jest when I saw some artwork

A profile plot is a way to display multivariate values for many subjects. The optimal linear profile plot was introduced by John Hartigan in his book Clustering Algorithms (1975). In Michael Friendly's book (SAS System for Statistical Graphics, 1991), Friendly shows how to construct an optimal linear profile by using

A profile plot is a compact way to visualize many variables for a set of subjects. It enables you to investigate which subjects are similar to or different from other subjects. Visually, a profile plot can take many forms. This article shows several profile plots: a line plot of the

A SAS programmer asked how to create a graph that shows whether missing values in one variable are associated with certain values of another variable. For example, a patient who is supposed to monitor his blood glucose daily might have more missing measurements near holidays and in the summer months

I recently showed how to represent positive integers in any base and gave examples of base 2 (binary), base 8 (octal), and base 16 (hexadecimal). One fun application is that you can use base 26 to associate a positive integer to every string of English characters. This article shows how

The Graph Template Language (GTL) is a powerful tool for creating a wide range of graphic displays. One GTL feature is the ability to combine independent plots into one paneled display. The SG procedures have limited capabilities in this area, but in this post, I am going

A SAS programmer was trying to understand how PROC SGPLOT orders categories and segments in a stacked bar chart. As with many problems, it is useful to start with a simpler version of the problem. After you understand the simpler situation, you can apply that understanding to the more

A SAS programmer asked how to display long labels at irregular locations along the horizontal axis of a scatter plot. The labels indicate various phases of a clinical study. This article discusses the problem and shows how to use the FITPOLICY=STAGGER option on the XAXIS or X2AXIS statement to avoid collisions

For a linear regression model, a useful but underutilized diagnostic tool is the partial regression leverage plot. Also called the partial regression plot, this plot visualizes the parameter estimates table for the regression. For each effect in the model, you can visualize the following statistics: The estimate for each regression

The ODS GRAPHICS statement in SAS supports more than 30 options that enable you to configure the attributes of graphs that you create in SAS. Did you know that you can display the current set of graphical options? Furthermore, did you know that you can temporarily set certain options and

When creating bar charts, it is very common to display labels with the bars to make it easier to determine the bar values or to provide additional information in the chart. However, these labels can take away valuable data space, particularly if you generate a smaller-sized graph. As you see

Recently, I showed how to use a heat map to visualize measurements over time for a set of patients in a longitudinal study. The visualization is sometimes called a lasagna plot because it presents an alternative to the usual spaghetti plot. A reader asked whether a similar visualization can be

Oh, no! Your boss just told you to change the way that SAS displays certain features in graphs, such as missing values. But you have a library of hundreds of SAS programs! Do you need to modify all of your previous programs? Fortunately, the answer is no. SAS provides ODS

In an article about how to visualize missing data in a heat map, I noted that the SAS SG procedures (such as PROC SGPLOT) use the GraphMissing style element to color a bar or tile that represents a missing value. In the HTMLBlue ODS style, the color for missing values

Longitudinal data are measurements for a set of subjects at multiple points in time. Also called "panel data" or "repeated measures data," this kind of data is common in clinical trials in which patients are tracked over time. Recently, a SAS programmer asked how to visualize missing values in a

A SAS programmer asked an interesting question: If data in a time series has missing values, can you plot a dashed line to indicate that the response is missing at some times? A simple way to achieve this is by overlaying two lines. The first line (the "bottom" line in

Some colors have names, such as "Red," "Magenta," and "Dark Olive Green." But the most common way to specify a color is to use a hexadecimal value such as CX556B2F. It is not obvious that "Dark Olive Green" and CX556B2F represent the same color, but they do! I like to