Nested bar charts in SAS

After giving a talk about how to create effective statistical graphics in SAS, I was asked a question: "When do you suggest using the graph template language (GTL) to build graphs?" I replied that I turn to the GTL when I cannot create the graph I want by using PROC SGPLOT. A common situation is when I want to overlay two so-called "incompatible" chart types. (An example is overlaying a custom density curve on a histogram.)

The very next day, I stumbled across a blog post by Mike Drutar about how to create a "nested bar chart" by using the SAS Graph Builder in SAS Visual Analytics. An example of a nested bar chart is shown to the right. You cannot create the nested bar chart in PROC SGPLOT by using two VBAR statements because one bar chart uses a grouping variable and the other does not. The nested bar chart is therefore a good example of why SAS graphical programmers sometimes turn to the GTL to create graphs that are not available out-of-the-box in the SGPLOT procedure.

What is a nested bar chart?

A nested bar chart attempts to visually communicate a two-way analysis of counts (or sums). For example, you might have sales that are measured each quarter, and you want to display both the sales for each quarter and also the sales for the entire year. The variables are "nested" in the sense that each year contains four quarters. The nested bar chart is similar to a mosaic plot or a stacked bar chart. Both of those plots stack the sub-units (for example, quarters) so that the length of the stack represents the accumulated counts (for example, the year). In contrast, the nested bar chart does not stack the sub-units but instead places them side by side.

Let's look at an example. Although I often prefer horizontal bar charts, Drutar's blog post used vertical bar charts, so I will do the same. The following DATA step creates fictional data for quarterly sales for three years. You can use PROC SGPLOT to create a stacked bar chart of the data, as follows:

data Bars;
do Year = 2020 to 2022;
   do Quarter = 1 to 4;
      input Sales @;
      output;
   end;
end;
datalines;
100  87  92 125
118  97 108 153
128 109 105 142
;
 
proc sgplot data=Bars;
   vbar Year / response=Sales group=Quarter groupdisplay=stack seglabel;
   yaxis grid;
run;

In contrast, a nested bar chart will nest the four quarters (side by side) for each year inside a larger bar that represents the sum of the quarterly sales. In other words, a nested bar chart overlays the following two charts:
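Each of the two component charts can be drawn on its own with PROC SGPLOT. The following is a minimal sketch that reuses the BARS data set from above. The titles are added here only for illustration and are not part of the original example:

```sas
/* Chart 1: total sales for each year (VBAR sums the response by default) */
title "Sales by Year";
proc sgplot data=Bars;
   vbar Year / response=Sales;
   yaxis grid;
run;

/* Chart 2: quarterly sales, clustered side by side within each year */
title "Sales by Quarter";
proc sgplot data=Bars;
   vbar Year / response=Sales group=Quarter groupdisplay=cluster clusterwidth=0.8;
   yaxis grid;
run;
title;   /* clear the title */
```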


A nested bar chart puts the chart for sales by year in the background. It overlays the chart of sales by quarters in the foreground. You might think that you can simply put two VBAR statements in the same call to PROC SGPLOT, but, unfortunately, the two charts are not compatible. If you submit the following statements, you will get an error:

/* ERROR: You cannot overlay these bar charts by using PROC SGPLOT! */
proc sgplot data=Bars;
   vbar Year / response=Sales;
   vbar Year / response=Sales group=Quarter groupdisplay=cluster clusterwidth=0.8;
run;

ERROR: Once a GROUP variable is used in a categorical chart, that GROUP
       variable must be used in all overlaid charts.

The error message tells you that PROC SGPLOT cannot overlay these two VBAR statements because one chart uses GROUP=Quarter and the other does not. However, as shown in the next section, you can easily overlay the two charts by using the GTL. (An easier option is to switch to the VBARBASIC statement, which "creates a vertical bar chart that is compatible with other categorization charts." Thanks to KSharp for reminding me of the "BASIC" chart types!)

Create a nested bar chart with GTL: First attempt

When you first create a template for a graph by using the GTL, I strongly suggest that you hard-code the variables for one set of data. After you successfully create the graph, you can modify the template so that you can visualize similar data sets that have different variables. The following call to PROC TEMPLATE creates a template that is designed for the variables in the BARS data set. It overlays two bar charts and adds a legend for the GROUP= variable:

proc template;
define statgraph Nestedbar0;
 begingraph;
 entrytitle "Overlay Two Bar Charts";
 layout overlay / yaxisopts=(griddisplay=on);
   barchart x=YEAR y=SALES / primary=true legendlabel="y" name="annual";
   barchart x=YEAR y=SALES / group=QUARTER name="quarter" groupdisplay=cluster clusterwidth=0.8;
   discretelegend "quarter"/ title="Quarter";
 endlayout;
 endgraph;
end;
run;
 
/* call the template for specific variables */
proc sgrender data=Bars template=Nestedbar0;
run;

The graph is shown at the top of this article. The template overlays the clustered bars for the four quarters on top of the bars for the years. The smaller bars indicate a pattern: the highest sales occur in Q1 and Q4. The height of each larger bar equals the sum of the heights of the smaller bars inside it. The larger bars indicate that sales are rising year over year.
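If you want to confirm the heights of the larger bars, you can compute the annual totals directly. The following sketch assumes the BARS data set defined earlier. The output data set name (YearTotals) and the variable name (TotalSales) are hypothetical names chosen for illustration:

```sas
/* Sum the quarterly sales within each year */
proc means data=Bars noprint nway;
   class Year;
   var Sales;
   output out=YearTotals sum=TotalSales;   /* YearTotals and TotalSales are illustrative names */
run;

proc print data=YearTotals noobs;
   var Year TotalSales;
run;
```

For the data in this article, the totals are 404 for 2020, 476 for 2021, and 484 for 2022, which should match the heights of the background bars.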

Create a nested bar chart with GTL: Generalize the template

If you only need to create one nested bar chart, then you are done. However, it doesn't require much additional effort to generalize the template to add dynamic variables. When you add dynamic variables, you can reuse the template to create nested bar charts for variables in other data sets. The following call to PROC TEMPLATE creates a template that includes dynamic variables and a dynamic title for the graph. The values of these dynamic variables are passed in by using the DYNAMIC statement in PROC SGRENDER.

proc template;
define statgraph Nestedbar;
dynamic _X _Y _GROUP _Title;
 begingraph;
 entrytitle _Title;
 layout overlay / yaxisopts=(griddisplay=on);
   barchart x=_X y=_Y / primary=true legendlabel="y" datatransparency=0.1;
   barchart x=_X y=_Y / group=_GROUP name="groups" 
            groupdisplay=cluster clusterwidth=0.8 datatransparency=0.1;
   discretelegend "groups" / title=_GROUP;  /* or add another DYNAMIC variable for the legend title */
 endlayout;
 endgraph;
end;
run;
 
proc sgrender data=Bars template=nestedbar;
   dynamic _X="Year" _Y="Sales" _GROUP="Quarter"
          _TITLE="Sales by Quarter for Each Year";
run;
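
The comment on the DISCRETELEGEND statement mentions adding another dynamic variable for the legend title. The following is one possible sketch of that variation, not a definitive implementation. The template name (Nestedbar2), the dynamic variable (_LegendTitle), and the legend text are hypothetical names introduced here for illustration:

```sas
proc template;
define statgraph Nestedbar2;                 /* hypothetical template name */
dynamic _X _Y _GROUP _Title _LegendTitle;    /* _LegendTitle is an added, hypothetical dynamic */
 begingraph;
 entrytitle _Title;
 layout overlay / yaxisopts=(griddisplay=on);
   barchart x=_X y=_Y / primary=true legendlabel="y" datatransparency=0.1;
   barchart x=_X y=_Y / group=_GROUP name="groups"
            groupdisplay=cluster clusterwidth=0.8 datatransparency=0.1;
   discretelegend "groups" / title=_LegendTitle;   /* legend title is now independent of the variable name */
 endlayout;
 endgraph;
end;
run;

proc sgrender data=Bars template=Nestedbar2;
   dynamic _X="Year" _Y="Sales" _GROUP="Quarter"
           _Title="Sales by Quarter for Each Year"
           _LegendTitle="Fiscal Quarter";
run;
```

Separating the legend title from the group variable is useful when the variable name is terse or cryptic, at the cost of one more value to pass on the DYNAMIC statement.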

Using the nested bar chart template

The new template enables you to create nested bar charts of other variables in other data sets. To ensure that the colors for the clustered bar chart do not vary, you can add the GROUPORDER= option on the BARCHART statement, or simply sort the data before you render the graph. The following statements sort the sashelp.baseball data by the LEAGUE and DIVISION categories before creating a nested bar chart that uses those same variables:

/* You should sort the data by the nested variables */
proc sort data=sashelp.baseball out=baseball;
   by League Division;
run;
 
proc sgrender data=baseball template=nestedbar;
   dynamic _X="League" _Y="Salary" _GROUP="Division"
          _TITLE="Salaries of Players by Division for Each League";
run;

Summary

I like to tell people that the SG procedures in SAS (such as PROC SGPLOT) are designed by using the 95-95 rule. By that I mean that the SG procedures provide a syntax that can easily create 95% of the graphs for 95% of the users. Less common graphs may require additional effort and the use of the GTL. This design strategy is used throughout SAS. Simple syntax is provided for common tasks; more complex programming is required for more complex tasks.

Mike Drutar previously showed how to create a nested bar chart by using the SAS Graph Builder in SAS Visual Analytics. This article shows how to create a nested bar chart by using the GTL. You cannot create this type of plot by overlaying two VBAR statements in PROC SGPLOT. However, there are some clever programming tricks that you can use to emulate the graph. If you would like to share a way to create a similar graph by using only PROC SGPLOT, feel free to share your ideas in a comment.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

2 Comments

  1. Rick,
    Actually PROC SGPLOT can overlay these two charts by this:

    proc sgplot data=Bars;
    vbarbasic Year / response=Sales transparency=0.6;
    vbarbasic Year / response=Sales group=Quarter groupdisplay=cluster clusterwidth=0.8;
    run;
