Boxplot with connect


This blog post is motivated by a question from a user on the communities page about creating a box plot with boxes colored by category and multiple connect lines.

Normally, a box plot can be drawn by category, with a single connect line for one of the statistical values of the box plot, say mean or median, as shown in the graph on the right.  This is very straightforward, and is supported by the SGPLOT procedure and GTL.  The SGPLOT code for this use case is shown below.  We have used the VBOX statement, with CONNECT=mean.  The connect line joins the specified statistic across all the categories for a group.

title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart noautolegend;
  vbox cholesterol / category=deathcause connect=mean;
  xaxis display=(nolabel);
run;

If there is more than one group, the values are connected by group as shown on the right.  Here we have used GROUP=sex, resulting in a box plot with Male and Female boxes by Death Cause.  The mean values of the boxes are connected by group.  The boxes and connect lines are colored by sex, as shown in the legend.
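The code for the grouped graph is not listed in this post.  A minimal sketch, assuming it differs from the first program only by the added GROUP=sex option and by dropping NOAUTOLEGEND so that the legend is displayed, could look like this:

```sas
/* Sketch only: reconstructed from the description above, not the original program. */
title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart;
  /* GROUP=sex draws one box per Sex value within each category;   */
  /* CONNECT=mean joins the mean values separately for each group. */
  vbox cholesterol / category=deathcause group=sex connect=mean;
  xaxis display=(nolabel);
run;
```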

The user's use case was different: the boxes should be displayed by category without groups, but each box should be colored by the category variable.

This can be achieved by setting the GROUP role to the category variable.  The boxes are then colored by category, but only because the group role is used.  In the graph on the right, we have set GROUP=DeathCause, the same variable as the CATEGORY role.  We have used CONNECT=mean, but no connect line is shown.  This is because each box belongs to a different group, and each group has only one box.  So, no connect line is possible.

title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart noautolegend noborder;
  vbox cholesterol / category=deathcause group=deathcause connect=mean;
  xaxis display=(nolabel);
run;

The user not only wants a connect line in this case, but actually wants to see connect lines for multiple statistics, say "Mean" and "Median".  This is now getting beyond what the SGPLOT procedure can do with simple options.  Now, we need to draw the connect lines ourselves, using either an overlay of SERIES plots (SAS 9.4M1) or SGAnnotate (SAS 9.3+).

Using SAS 9.4M1, we have processed the data using the MEANS procedure to compute the Mean and Median statistics by DeathCause.  Then, we have merged this summary data into the original data, and created additional columns.  The last few observations are shown on the right.  The merged data contains all the original observations for Death Cause and Cholesterol, with missing "Mean" and "Median".  Then, we have 5 additional observations for the Mean and Median columns with missing Cholesterol.
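The data step is not listed in this post, so here is a minimal sketch of one way to build such a data set.  It is an assumption, not the original program: the intermediate data set name stats is hypothetical, and the summary rows are simply appended with a SET statement, which produces the layout described above (original rows with missing Mean and Median, followed by one summary row per death cause with missing Cholesterol).  The output name heart2 matches the data set used in the plot code.

```sas
/* Step 1 (sketch): compute Mean and Median of Cholesterol for each DeathCause. */
/* NWAY keeps only the per-category rows and omits the overall summary row.     */
/* "stats" is a hypothetical name for the intermediate data set.                */
proc means data=sashelp.heart noprint nway;
  class deathcause;
  var cholesterol;
  output out=stats(drop=_type_ _freq_) mean=Mean median=Median;
run;

/* Step 2 (sketch): append the summary rows to the original observations.       */
/* Original rows get missing Mean and Median; summary rows get missing          */
/* Cholesterol, because that variable is not present in "stats".                */
data heart2;
  set sashelp.heart(keep=deathcause cholesterol) stats;
run;
```

Keeping the raw observations and the summary rows in separate columns lets the VBOX statement and the SERIES statements each read only the values meant for them.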

We can use this data set to plot the box plot of Cholesterol by DeathCause, and overlay that with two SERIES plots, one for Mean and one for Median, with different line attributes.

Note how the boxes are displayed colored by death cause.  Additionally, we have a "connect" of the "Mean" and the "Median" values using the SERIES plot.  The legend for the connect lines is displayed inside the plot area.

title 'Cholesterol by Cause of Death';
proc sgplot data=heart2 noautolegend noborder;
  vbox cholesterol / category=deathcause group=deathcause;
  series x=deathcause y=mean / name='mean' legendlabel='Mean';
  series x=deathcause y=median / lineattrs=(pattern=dash) name='median' legendlabel='Median';
  keylegend "mean" "median" / linelength=32 location=inside across=1 position=topright;
  xaxis display=(nolabel);
run;

SAS 9.40M1 allows the overlaying of "Basic" plots with a VBOX statement.  Prior to SAS 9.40M1, overlay of basic plots on a VBOX was disallowed.  This restriction was removed for SAS 9.40M1 expressly because many users want to overlay detailed data on a box plot, such as the actual observations themselves, as in the Margin Plot example.

If you have an earlier version (SAS 9.3 or later), you can do the same by using SGANNOTATE to draw the connect lines.  See Dan's paper on SGAnnotate, or Warren's Advanced ODS Graphics Examples.

Full SAS 9.40M1 code: Box_Connect


About Author

Sanjay Matange

Director, R&D

Sanjay Matange is R&D Director in the Data Visualization Division responsible for the development and support of the ODS Graphics system, including the Graph Template Language (GTL), Statistical Graphics (SG) procedures, ODS Graphics Designer and related software. Sanjay has co-authored a book on SG Procedures with SAS/PRESS.
