Happy Pi Day! Every year on March 14th (written 3/14 in the US), people in the mathematical sciences celebrate all things pi-related because 3.14 is the three-decimal approximation to π ≈ 3.14159265358979.... Pi is a mathematical constant defined as the ratio of a circle's circumference (C) to its diameter (D).
Tag: Statistical Graphics
I sometimes see analysts overuse colors in statistical graphics. My rule of thumb is that you do not need to use color to represent a variable that is already represented in a graph. For example, it is redundant to use a continuous color ramp to represent the lengths of bars
In a previous article, I presented some of the most popular blog posts from 2023. The popular articles tend to discuss elementary topics that have broad appeal. However, I also wrote many technical articles about advanced topics. The following articles didn't make the Top 10 list, but they deserve a
In 2023, I wrote 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. In addition, several "general interest" articles were popular, including my article for Pi Day and an article about AI chatbots. If you missed any of these articles,
These are a few of my favorite things. —Maria in The Sound of Music For my annual Christmas-themed post, I decided to forgo fractal Christmas trees and animated greeting cards and instead present a compilation of some of my favorite data visualization tips for advanced SAS users. Hopefully, this
Plot rates, not counts. This maxim is often stated by data visualization experts, but often ignored by practitioners. You might also hear the related phrases "plot proportions" or "plot percentages," which mean the same thing but expresses the idea alliteratively. An example in a previous article about avoiding alphabetical ordering
Howard Wainer, who used to write the "Visual Revelations" column in Chance magazine, often reminded his readers that "we are almost never interested in seeing Alabama first" (2005, Graphic Discovery, p. 72). His comment is a reminder that when we plot data for a large number of categories (states, countries,
Sometimes it is helpful to display a table of statistics directly on a graph. A simple example is displaying the number of observations and the mean or median on a histogram. In SAS, the term inset is used to describe a table that is displayed on a graph. This article
Did you know that you can embed one graph inside another by using PROC SGPLOT in SAS? A typical example is shown to the right. The large graph shows kernel density estimates for the distribution of the Cholesterol variable among male and female patients in a heart study. The small
I don't often use the SG annotation facility in SAS for adding annotations to statistical graphics, but when I do, I enjoy the convenience of the SG annotation macros. I can never remember the details of the SG annotation commands, but I know that the SG annotation macros will create
A SAS programmer wanted to use PROC SGPLOT in SAS to visualize a regression model. The programmer wanted to visualize confidence limits for the predicted mean at certain values of the explanatory variable. This article shows two options for adding confidence limits to a scatter plot. You can use a
The acceptance-rejection method (sometimes called rejection sampling) is a method that enables you to generate a random sample from an arbitrary distribution by using only the probability density function (PDF). This is in contrast to the inverse CDF method, which uses the cumulative distribution function (CDF) to generate a random
Since the COVID-19 pandemic began, video presentations and webcasts have become a regular routine for many of us. On days that I will be using my webcam, I wear a solid-color shirt. If I don't plan to be on camera, I can wear a pinstripe Oxford shirt. Why the difference?
Real-world data often exhibits extreme skewness. It is not unusual to have data span many orders of magnitude. Classic examples are the distributions of incomes (impoverished and billionaires) and population sizes (small countries and populous nations). The readership of books and blog posts show a similar distribution, which is sometimes
Labeling objects in graphs can be difficult. SAS has a long history of providing support for labeling markers in scatter plots and for labeling regions on a map. This article discusses how the SGPLOT procedure decides where to put a label for a polygon. It discusses the advantages and disadvantages
In a previous article, I showed how to overlay a density estimate on a histogram by using the Graph Template Language (GTL). However, a SAS programmer asked how to overlay a curve on a histogram when the curve is not a density estimate. In this case, the vertical axis for
When the SAS statistical graphics (SG) procedures were designed in the early 2000s, a goal was to create a comprehensive Graph Template Language (GTL) and leverage the GTL by using SG procedures that perform common tasks easily without having to write any GTL. This project was hugely successful, and "ODS
A previous article discusses how to compute the union, intersection, and other subsets of a pair of sets. In that article, I displayed a simple Venn diagram (reproduced to the right) that illustrates the intersection and difference between two sets. The diagram uses a red disk for one set, a
SAS supports the ColorBrewer system of color palettes from the ColorBrewer website (Brewer and Harrower, 2002). The ColorBrewer color ramps are available in SAS by using the PALETTE function in SAS IML software. The PALETTE function supports all ColorBrewer palettes, but some palettes are not interpretable by people with color
Did you know that about 8% of the world's men are colorblind? (More correctly, 8% of men are "color vision deficient," since they see colors, but not all colors.) Because of the "birthday paradox," in a room that contains eight men, the probability is 50% that at least one is
A previous article shows that you can use the Intercept parameter to control the ratio of events to nonevents in a simulation of data from a logistic regression model. If you decrease the intercept parameter, the probability of the event decreases; if you increase the intercept parameter, the probability of
For Christmas 2021, I wrote an article about palettes of Christmas colors, chiefly shades of red, green, silver, and gold. One of my readers joked that she would like to use my custom palette to design her own Christmas wrapping paper! I remembered her jest when I saw some artwork
A profile plot is a way to display multivariate values for many subjects. The optimal linear profile plot was introduced by John Hartigan in his book Clustering Algorithms (1975). In Michael Friendly's book (SAS System for Statistical Graphics, 1991), Friendly shows how to construct an optimal linear profile by using
A profile plot is a compact way to visualize many variables for a set of subjects. It enables you to investigate which subjects are similar to or different from other subjects. Visually, a profile plot can take many forms. This article shows several profile plots: a line plot of the
A SAS programmer asked how to create a graph that shows whether missing values in one variable are associated with certain values of another variable. For example, a patient who is supposed to monitor his blood glucose daily might have more missing measurements near holidays and in the summer months
I recently showed how to represent positive integers in any base and gave examples of base 2 (binary), base 8 (octal), and base 16 (hexadecimal). One fun application is that you can use base 26 to associate a positive integer to every string of English characters. This article shows how
The Graph Template Language (GTL) is a powerful tool for creating a wide range of graphic displays. One feature GTL has is the ability to combine independent plots together into one paneled display. The SG procedures have some limited capabilities in this area; but in this post, I am going
A SAS programmer was trying to understand how PROC SGPLOT orders categories and segments in a stacked bar chart. As with all problems, it is often useful to start with a simpler version of the problem. After you understand the simpler situation, you can apply that understanding to the more
A SAS programmer asked how to display long labels at irregular locations along the horizontal axis of scatter plot. The labels indicate various phases of a clinical study. This article discusses the problem and shows how to use the FITPOLICY=STAGGER option on the XAXIS or X2AXIS statement to avoid collisions
For a linear regression model, a useful but underutilized diagnostic tool is the partial regression leverage plot. Also called the partial regression plot, this plot visualizes the parameter estimates table for the regression. For each effect in the model, you can visualize the following statistics: The estimate for each regression