Clinical graphs: Waterfall plot ++


Waterfall plots have gained popularity as a means to visualize the change in tumor size for subjects in a study. The graph displays the reduction in tumor size in ascending order, with the subjects showing the most reduction on the right. Each subject is represented by a bar classified by treatment. The type of response is often shown at the end of the bar, such as CR (Complete Response) or PD (Progressive Disease). See this PharmaSUG paper for complete information. Ways to create such plots using the SGPLOT procedure are presented in the referenced paper, in a previous article in this blog, and in my book "Clinical Graphs using SAS". The graph is shown below.

Recently, a SAS user sent me an example of a 3D waterfall plot. The user indicated that such graphs (shown below) are being requested because they can display more information in the same graph.

On reviewing the 3D graph provided by the user, certain aspects became evident. While the 3D visual is certainly eye-catching, this graph may have some significant drawbacks:

  • The data depicted is really not 3D in nature. There is only one independent variable in the data, namely the subject ID. The subjects are placed along the bottom, sorted by the change in tumor size. I will call this the x-axis.
  • Multiple measures are displayed for each subject ID. The tumor size is displayed on the vertical axis (z-axis). The duration of treatment is displayed along the axis going into the page (y-axis).
  • Additional indicators plotted on the duration bars mark subjects who discontinued. Some other indicators are not very clear.
  • Some bars can be (and are) occluded behind other bars.
  • The x-axis does not show the subject id, but instead some other classification (maybe type).  It is better to move these indicators closer to the bar ends.
  • It is hard to line up the x-axis values with the tumor size bars and also the duration bars.
  • There is a lot of wasted "blue" space in this visual.
  • This visual uses perspective projection, so it is harder to visually compare the bar lengths.
  • I don't know what the red dot in the middle is for, or what it is aligned with.

Since there is only one independent variable (SubjectId), it is possible to display all the necessary information in a 2D visual, as shown below. The visual is clean, easy to understand, and shows all the information in the 3D graph. Keep in mind that my data are purely simulated using random number functions.

The subjects are again displayed along the horizontal x-axis in increasing order of reduction in tumor size.  In this case, we have extended the original graph as follows:

  • The duration of treatment for each subject is displayed in the upper part of the graph. Each duration bar is correctly aligned with its subject and is easy to see.
  • The actual duration in days can easily be displayed on top of the duration bar.
  • A red star is displayed in the upper bar to indicate subjects who discontinued.
  • Subdued blue alternating bands are displayed to help the eye line up the bars. Depending on your screen settings, these may or may not be visible; they can be adjusted or removed.
  • Additional response data can easily be incorporated as additional plot elements above or below this visual.
  • Additional measures can easily be overlaid on either the tumor reduction bars or the duration bars.

In general, when there is only one independent variable in the data, displaying the multiple responses in a 2D graph is very effective. The magnitudes of the measures can be compared correctly, and additional indicators can be placed near the relevant item for easier decoding of the data.

In this graph, the tumor response bars are colored by treatment, but could also be colored by the dosage or other measures.
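The SGPLOT program reads a data set named TumorSizeSort. My data are simulated with random number functions, and that simulation step is not listed in this article. The sketch below is one hypothetical way to build a compatible data set. The variable names (cid, change, group, label, duration, drop) are taken from the program; the sample size, seed, treatment names, distributions, and the response cutoffs (chosen to match the reference lines at +20% and -30%) are assumptions, not my original code. Because LABEL and DROP are also DATA step statement keywords, the step computes RESP and DROPY and renames them on output.

```sas
/* Hypothetical simulation of a TumorSizeSort-style data set.          */
/* Sample size, seed, groups, distributions and cutoffs are assumed.   */
data TumorSize(rename=(resp=label dropy=drop));
  length group $8 resp $1;
  call streaminit(12345);
  do sid = 1 to 50;
    /* Randomly assign one of two hypothetical treatment groups */
    group = ifc(rand('uniform') < 0.5, 'Drug A', 'Drug B');
    /* Percent change in tumor size, rounded and clipped to -100..60 */
    change = min(60, max(-100, round(rand('normal', -15, 35))));
    /* Response code from the change, using the -30% and +20% cutoffs */
    if change <= -100 then resp = 'C';
    else if change <= -30 then resp = 'R';
    else if change < 20 then resp = 'S';
    else resp = 'P';
    /* Treatment duration in days, between 30 and 365 */
    duration = round(30 + 335 * rand('uniform'));
    /* About 20% discontinue; the star is placed at mid-bar on the Y2 axis */
    if rand('uniform') < 0.2 then dropy = round(duration / 2);
    else dropy = .;
    output;
  end;
run;

/* Waterfall order: largest increase first, largest reduction last */
proc sort data=TumorSize out=TumorSizeSort;
  by descending change;
run;

/* Number the subjects in sorted order for the CATEGORY= variable */
data TumorSizeSort;
  set TumorSizeSort;
  cid = _n_;
run;
```

Sorting by descending change and then numbering the subjects with _N_ makes the numeric category order match the waterfall order, so the bars run from the largest increase on the left to the largest reduction on the right.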

SGPLOT Program:

title 'Change in Tumor Size';
title2 'ITT Population';
proc sgplot data=TumorSizeSort nowall noborder nocycleattrs;
  /* Bar fill colors for the two treatment groups, black outlines */
  styleattrs datacolors=(cxbf0000 cx4f4f4f) datacontrastcolors=(black) axisextent=data;
  /* Custom marker (asterisk, Unicode 002A) for discontinued subjects */
  symbolchar name=mystar char='002a'x / voffset=-0.5 scale=3;
  /* Lower data area: change in tumor size, labeled with the response code */
  vbarparm category=cid response=change / group=group datalabel=label dataskin=pressed
                 datalabelattrs=(size=5 weight=bold) groupdisplay=cluster clusterwidth=1;
  /* Upper data area (Y2 axis): treatment duration, labeled with the days */
  vbarparm category=cid response=duration / datalabel=duration y2axis dataskin=pressed
                 datalabelattrs=(size=5 weight=bold) groupdisplay=cluster clusterwidth=1;
  scatter x=cid y=drop / y2axis markerattrs=(symbol=mystar color=red size=10);
  /* Reference lines at +20% and -30% change */
  refline 20 -30 / lineattrs=(pattern=shortdash);
  /* Hide the subject axis; alternating bands help line up the bars */
  xaxis display=none colorbands=odd colorbandsattrs=(transparency=0.6);
  /* OFFSETMAX on Y and OFFSETMIN on Y2 split the graph into two data areas */
  yaxis values=(60 to -100 by -20) offsetmax=0.45 labelpos=datacenter offsetmin=0;
  y2axis offsetmin=0.6 offsetmax=0.02 labelpos=datacenter;
  /* Key for the response codes shown at the bar ends */
  inset ("C="="CR" "R="="PR" "S="="SD" "P="="PD" "N="="NE") / title='BCR'
            position=bottomleft border textattrs=(size=6 weight=bold) titleattrs=(size=7);
  keylegend / title='' border;
run;


With SGPLOT, it is possible to create this graph with two data areas, as shown here, by using the Y and Y2 axes. It would be better to create this graph using GTL, which provides extended functionality to create multiple data areas with aligned axes. This would allow us to place both Y axes on the left (or right) and to add more data displays that include additional relevant data.
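The GTL version is not shown in this article. As a hedged illustration of the idea, here is a minimal sketch using a two-row LAYOUT LATTICE: each row is its own data area with its Y axis on the left, and COLUMNDATARANGE=UNION keeps the subject categories aligned between the rows. The template name WaterfallLattice, the small WaterfallDemo data set (hypothetical values, not study data), the axis labels, and the row weights are assumptions; the variable names follow the SGPLOT program above. For brevity it omits the custom star symbol, the color bands, and the response inset.

```sas
/* Tiny hypothetical sample so the sketch runs on its own;        */
/* substitute your own data set with the same variables.          */
data WaterfallDemo;
  input cid change group $ label $ duration drop;
  datalines;
1   45 A P  90  .
2   22 B P 150 75
3   10 A S 210  .
4   -5 B S 300  .
5  -18 A S 180 90
6  -35 B R 330  .
7  -62 A R 365  .
8 -100 B C 365  .
;
run;

/* Two-row lattice: each row is a separate data area, Y axes on the left */
proc template;
  define statgraph WaterfallLattice;
    begingraph;
      entrytitle 'Change in Tumor Size';
      entrytitle 'ITT Population';
      layout lattice / rows=2 columns=1 rowweights=(0.35 0.65)
                       columndatarange=union rowgutter=10px;
        /* Upper cell: treatment duration, star for discontinued subjects */
        layout overlay / xaxisopts=(display=none)
                         yaxisopts=(label='Duration (Days)' offsetmin=0);
          barchartparm category=cid response=duration / datalabel=duration;
          scatterplot x=cid y=drop / markerattrs=(symbol=starfilled color=red size=10);
        endlayout;
        /* Lower cell: percent change in tumor size, grouped by treatment */
        layout overlay / xaxisopts=(display=none)
                         yaxisopts=(label='Change from Baseline (%)');
          barchartparm category=cid response=change / group=group
                       datalabel=label name='change';
          referenceline y=20 / lineattrs=(pattern=shortdash);
          referenceline y=-30 / lineattrs=(pattern=shortdash);
          discretelegend 'change' / location=inside halign=right valign=top;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=WaterfallDemo template=WaterfallLattice;
run;
```

Because each measure gets its own cell, there is no need to tune OFFSETMIN and OFFSETMAX by hand to keep the two data areas from overlapping, and further rows can be added for other per-subject measures.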

Full code: WaterFall


About Author

Sanjay Matange

Director, R&D

Sanjay Matange is R&D Director in the Data Visualization Division responsible for the development and support of the ODS Graphics system, including the Graph Template Language (GTL), Statistical Graphics (SG) procedures, ODS Graphics Designer and related software. Sanjay has co-authored a book on SG Procedures with SAS/PRESS.
