Getting started with SGPLOT - Part 3 - VBOX

This is the 3rd installment of the Getting Started series, and the audience is the user who is new to the SG Procedures.  Experienced users may also find some useful nuggets here.

The Tukey box plot is popular among statisticians for viewing the distribution of an analysis variable, with or without classifiers.  The figure on the right, from the SGPLOT Box Plot documentation, shows all the features of the box.

The code shown below creates the simplest box plot, which displays the distribution of the analysis variable Cholesterol.

title 'Distribution of Cholesterol';
proc sgplot data=sashelp.heart;
  vbox cholesterol;
run;

The graph on the right shows the result of the procedure step above: a box for the variable Cholesterol.  The display includes a box spanning the Q1-Q3 interquartile range, with a line drawn at the median value.  A marker is used to display the mean value.  Whiskers extend to the most extreme observations that lie within the "fences" defined in the documentation mentioned above, and observations beyond the fences are displayed as outliers.  See the online documentation for the GTL Box Plot for all the details of the various statistics that are displayed.
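To check the values behind the box, you can compute the statistics directly. The sketch below assumes the conventional Tukey fences at 1.5 times the interquartile range beyond Q1 and Q3. The data set name BOX_STATS is hypothetical. The quartiles may differ slightly from the plot if a different percentile definition is in effect.

```sas
/* Sketch: compute the statistics the box displays for Cholesterol. */
/* Assumes fences at 1.5*IQR beyond Q1 and Q3 (Tukey convention).   */
proc means data=sashelp.heart noprint;
  var cholesterol;
  output out=box_stats mean=mean q1=q1 median=median q3=q3 qrange=iqr;
run;

data box_stats;
  set box_stats;
  lower_fence = q1 - 1.5*iqr;   /* observations below are outliers */
  upper_fence = q3 + 1.5*iqr;   /* observations above are outliers */
run;

proc print data=box_stats noobs;
  var mean q1 median q3 iqr lower_fence upper_fence;
run;

/* Whisker ends: most extreme observations that lie inside the fences. */
proc sql;
  select min(h.cholesterol) as lower_whisker,
         max(h.cholesterol) as upper_whisker
  from sashelp.heart as h, box_stats as b
  where h.cholesterol between b.lower_fence and b.upper_fence;
quit;
```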

Box Plot by Category:  The code below creates a box plot by a category variable, DeathCause.  Note that we have used the XAXIS statement to suppress the axis label.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart;
  vbox cholesterol / category=deathcause;
  xaxis display=(nolabel);
run;

The graph on the right displays the distribution of the cholesterol values by death cause.  Note that, by default, the graph tries to split long axis tick values at the white space in the value.

Connect:  The CONNECT=MEAN option draws a line that joins the mean statistic across the categories.  The connect line can join other statistics as well, such as the median, Q1, or Q3.

For this graph, we have also simplified the layout by dropping the wall border and the axis lines, and by adding y-axis grid lines.  This presents the data in an alternative visual manner that reduces clutter and is pleasing to the eye.  A DATASKIN is set for visual effect.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause
            connect=mean fillattrs=graphdata3
            dataskin=gloss;
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;

Grouped Box Plot:  One additional classifier can be added with the GROUP= option.  The graph on the right displays the distribution of Cholesterol by death cause and sex.  This is a common graph type in the clinical research domain, where we want to view the results by category and treatment.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause
          group=sex clusterwidth=0.5
         boxwidth=0.8 meanattrs=(size=5)
         outlierattrs=(size=5);
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;

Cluster width can be set to make the cluster of boxes for each category tighter.  Here we have set CLUSTERWIDTH=0.5, so the boxes for each category are more tightly packed.  BOXWIDTH can also be used to make the individual boxes narrower or wider.  BOXWIDTH=1 makes the boxes within each cluster touch.  Attributes for the mean and outlier markers can be set using the MEANATTRS= and OUTLIERATTRS= options.

Notches:  Notches can be displayed by using the NOTCHES option.  The graph on the right shows the result of the program shown below.  Notches are displayed, and the box width is reduced to 20% of the available space.  The whisker cap is removed by setting CAPSHAPE=NONE.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause
            boxwidth=0.2 meanattrs=(size=6)
            notches capshape=none ;
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;

Whisker Percentile:  The graph on the right shows how to control the whisker percentile.  Many users have requested this option.  WHISKERPCT=value (0 to 25) sets the length of the whiskers as a percentile.  For example, WHISKERPCT=1 draws the whiskers to the 1st and 99th percentiles.
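The article does not list the program for this graph. The following is a minimal sketch. It assumes the same category layout as the earlier examples and a SAS release whose VBOX statement supports WHISKERPCT=.

```sas
/* Whiskers extend to the 1st and 99th percentiles instead of */
/* to the 1.5*IQR fences used by default.                      */
title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause whiskerpct=1;
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;
```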

By default, the box plot treats the category axis as discrete.  This happens even if the category variable is numeric or a date/time variable.  There are many cases where we want to see the distribution of some variable by a numeric x variable, such as week number or time.  In such cases, we want the boxes to be positioned on the x-axis with the correct scale.  This is supported, and you can get it by setting TYPE=LINEAR in the XAXIS statement.  We will discuss this in more detail in a subsequent article.
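The following is a minimal sketch of the linear-axis case. It uses the numeric Cylinders variable in SASHELP.CARS as a stand-in category. This data choice is for illustration only and does not come from the original article. It assumes a release in which VBOX accepts a linear category axis.

```sas
/* Position each box at its numeric value on a linear x-axis, */
/* instead of at equally spaced discrete slots.               */
title 'City Mileage by Number of Cylinders';
proc sgplot data=sashelp.cars noborder;
  vbox mpg_city / category=cylinders;
  xaxis type=linear display=(noline);
  yaxis display=(noline noticks) grid;
run;
```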

Full SAS Code: getting_started_3_vbox_3

Getting started with SGPLOT - Index

Index of articles on "Getting Started with SGPLOT Procedure".

  1.   Getting Started with SGPLOT - Part 1 - Scatter Plot.
  2.   Getting Started with SGPLOT - Part 2 - VBAR.
  3.   Getting Started with SGPLOT - Part 3 - VBOX.

Mixing plots with different classification

One of the key benefits of creating graphs using GTL or SG Procedures is their support of plot layering to create complex graphs and layouts.  Most simple graphs, such as a bar chart, can be created with a single plot statement.  Complex graphs, such as a swimmer plot, can be created by layering the appropriate plot statements to add the complexity needed.

When creating graphs with multiple VBAR statements, we sometimes run into a limitation on how VBAR statements can be layered.  In general, VBAR and VLINE statements can be layered only when all the layered statements have the same category variable.  If a group classification is in effect, all statements must have the same group variable.  So, it is not possible to layer VBAR and VLINE statements that have different category or group classification.

Consider the example on the right.  This graph has a bar chart of Total Sales by Subsidiary for Canada.  Each subsidiary has multiple observations for the type of shoes sold, as seen by the table under the bar chart.

One would expect that we could simply layer an XAXISTABLE with the VBAR statement to create such a graph.  The code is shown below.  Some options have been removed to save space.  See the linked code at the end for all the details.
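The steps in this section use a macro variable REGION in the title, and the VBARBASIC step reads a data set named SHOES. Neither is defined in the excerpts. The following is a minimal setup. It assumes the region is Canada, as in the graph, and the same subset can also be used for the first step. The exact setup in the full code may differ.

```sas
/* Hypothetical setup: choose the region and subset SASHELP.SHOES. */
%let region=Canada;

data shoes;
  set sashelp.shoes;
  where region = "&region";
run;
```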

title "Total Sales for &region by Subsidiary";
proc sgplot data=sashelp.shoes;
  vbar subsidiary /response=sales;
  xaxistable sales / x=subsidiary class=product;
  xaxis display=(nolabel noline);
  yaxis display=(noline) grid;
run;

Unfortunately, this will not produce the desired results.  The reason is that while the VBAR statement has only one classifier (subsidiary), the XAXISTABLE has two classifiers, x=subsidiary and class=product. Each bar shows the summarized value of sales as one bar per subsidiary.  When you submit the code above, you will get the following warning in the log, and the graph on the right is produced.

WARNING: The CLASS option is ignored when the axis table is used with bar charts, line charts, or dot plots. The GROUP option from these charts is used as the CLASS variable for the axis table.

The procedure ignores the CLASS option on the XAXISTABLE, so the table has only one row of data, which is the summarized value for each bar.  Also, "x=subsidiary" is not required for the axis table, as it is the default.  I use it here for clarity.  I also used a different color for the graph just for variety.

So, how do we get around this to create the graph shown at the top?

The VBARBASIC statement was released with SAS 9.40M3 to address such use cases.  The VBAR statement does its own data processing to support additional features, and this requires that all layers used with VBAR have the same set of classifiers.  The underlying GTL BarChart statement does not have such restrictions, so we exposed it more directly through the VBARBASIC statement.  VBARBASIC can still summarize the data by subsidiary, and it has no restrictions on layering with other statements that use different classifications.  The code is shown below.  Some options have been removed to save space.

title "Total Sales for &region by Subsidiary";
proc sgplot data=shoes;
  vbarbasic subsidiary /response=sales;
  xaxistable sales / x=subsidiary class=product;
  xaxis display=(nolabel noline);
  yaxis display=(noline) grid;
run;

Now, the VBarBasic statement displays the bars summarized by subsidiary, while the XAxisTable can display the detailed information by subsidiary and product.

Another example of layering plot statements with different classifiers is shown on the right.  Here, we have displayed the summarized Actual Sales by Product shown by the blue bars.  On that, we have overlaid a graph of the Actual Sales by Product and Quarter.  This is possible using the VBARBASIC statement.  All the values by quarter add up to the total shown by the blue bar.  This allows us to compare across products, and by quarter within each product.

title "Actual Sales";
proc sgplot data=sashelp.prdsale noborder nocycleattrs;
  vbarbasic product / response=actual;
  vbarbasic product /response=actual group=quarter
          groupdisplay=cluster dataskin=matte;
  xaxis display=(nolabel noline);
  yaxis display=(noline) grid;
run;

Full SAS 9.4M3 Code:  vbarbasic

Getting started with SGPLOT - Part 2 - VBAR

This is the 2nd installment of the "Getting Started" series, and the audience is the user who is new to the SG Procedures. Experienced users may also find some useful nuggets here.

One of the most popular and useful graph types is the Bar Chart.  The SGPLOT procedure supports many types of bar charts, each suitable for some specific use case.  Today, we will discuss the most common type, the venerable VBAR statement.  In this article I will show you many small examples of bar charts with increasing information.

Let us start with the most basic case, as shown on the right.  This graph shows the frequency, or counts, by category with default settings.  The SGPLOT code needed to create it is very simple, as shown below.

title 'Counts by Type';
proc sgplot data=sashelp.cars;
  vbar type;
run;

The graph above is rendered to the LISTING destination with the default style and the default axis settings.

The graph on the right shows the mean of city mileage by type.  The title already mentions "Mileage by Type", so there is no need to repeat that information as the label of the x-axis.  The DISPLAY=(NOLABEL) option in the XAXIS statement suppresses the label.

title 'Mileage by Type';
proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean
           barwidth=0.6 fillattrs=graphdata2;
  xaxis display=(nolabel);
run;

Note that we have specified RESPONSE=mpg_city with STAT=MEAN.  This is necessary because the default STAT is SUM, and there is no point in viewing the sum of the mileage of all cars of one type.  Also, we have set BARWIDTH=0.6 and set the bar fill attributes to GRAPHDATA2 for a change of pace.

Next, we create a bar chart of mean mileage by type, with display of the 95% confidence limits.  A legend is automatically created by the procedure to display the two items in the graph.  Also note, I have used GRAPHDATA4 for the bar attributes, and removed the display of the baseline to clean up the display.

title 'Mileage by Type';
proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean
            barwidth=0.6
            fillattrs=graphdata4 limits=both
            baselineattrs=(thickness=0);
  xaxis display=(nolabel);
run;

The graph on the right shows the mean mileage by type, using options to create a different look and feel.  The DATALABEL option displays the response value at the top of each bar, and the FORMAT statement limits it to one decimal place.  DATASKIN=MATTE applies a decorative skin to the bars.

In this graph I have suppressed the border around the data area.  The axis lines and ticks are removed and y-axis grids are added.  This results in a clean graph as shown on the right.
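The next step, and the SGPANEL step later in this article, reference a macro variable SOFTGREEN. That variable is defined in the full code but not shown here. To run the excerpts on their own, define it first. The color value below is a hypothetical placeholder and is not necessarily the one used for the graph.

```sas
/* Hypothetical value: any SAS color name or CXrrggbb code works here. */
%let softgreen=cx90C090;
```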

title 'Mileage by Type';
proc sgplot data=sashelp.cars noborder;
  format mpg_city 4.1;
  vbar type / response=mpg_city stat=mean
           datalabel dataskin=matte
           baselineattrs=(thickness=0)
           fillattrs=(color=&softgreen);
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline noticks) grid;
run;

Now, let us add a group classifier using the GROUP=variable option.  The SGPLOT procedure summarizes the response data by category and group.  Values for each group are stacked for each category, creating a stacked bar chart as shown on the right.

title 'Sales by Type and Quarter for 1994';
proc sgplot data=sashelp.prdsale(where=(year=1994)) noborder;
  format actual dollar8.0;
  vbar product / response=actual stat=sum
           group=quarter seglabel datalabel
          baselineattrs=(thickness=0)
          outlineattrs=(color=cx3f3f3f);
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline noticks) grid;
run;

A stacked bar chart makes sense with STAT=SUM (the default).  Now the bar height is the sum of all the observations for the category.  By default, SGPLOT stacks the segments for each group in a category.  Note that, with SAS 9.4, each segment can be labeled with its value (SEGLABEL), and the bar itself can be labeled with its total (DATALABEL).  A legend showing the color used for each unique value of the group variable is also displayed.

Another useful graph is shown on the right.  Here, we have used GROUPDISPLAY=CLUSTER which places the groups side-by-side within each category.  A group legend is displayed by default.

title 'Sales by Type and Year';
proc sgplot data=sashelp.prdsale noborder;
  vbar product / response=actual
          group=year groupdisplay=cluster
         dataskin=pressed
         baselineattrs=(thickness=0);
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;

Bar values can be shown for each group in a category, as shown on the right.  Note that the values are automatically rotated to a vertical orientation when they do not fit in the available space.

Note the use of the STYLEATTRS statement to set the fill colors for the two group values to gold and olive.  This statement lets you control the group attributes for fill colors, contrast colors, marker symbols, and line patterns.  Also, note the use of FILLTYPE=GRADIENT to color the bars with an alpha gradient, from fully saturated at the top to transparent at the bottom.

title 'Sales by Type and Year';
proc sgplot data=sashelp.prdsale noborder;
  styleattrs datacolors=(gold olive);
  vbar product / response=actual  
           group=year groupdisplay=cluster
          dataskin=pressed baselineattrs=(thickness=0)
          filltype=gradient datalabel;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;

You may have noted that the VBAR statement supports only one GROUP role, which can then be displayed as STACKED or CLUSTERED.  SGPLOT does not support a bar chart that has both a CLUSTER and a STACK group like the SAS/GRAPH GCHART statement.  Creating such a graph requires some complex layout of the category axis, and a decision was made to avoid such complex axis layouts as this combination is relatively rare.

But, what to do if you do need a stacked + clustered bar chart?  The solution is to use the SGPANEL procedure as shown below.  The resulting graph is shown on the right.  Here we have a bar chart of actual sales by type, year and quarter.  The year values are side-by-side and the quarter values are stacked.

The SGPANEL procedure below uses the panel variable of product.  So, each "cluster" is really a cell in the panel.  Each cell contains a stacked bar chart with category of year and group=quarter.  Normally, the cell header is at the top of each cell, with a header border.  Here, we have moved the header to the bottom of the graph, and suppressed the cell borders, thus making the graph appear like a stacked+clustered bar chart.  Note the use of COLAXIS instead of XAXIS and ROWAXIS instead of YAXIS.

title 'Sales by Type, Year and Quarter';
proc sgpanel data=sashelp.prdsale;
  styleattrs datacolors=(gold olive &softgreen silver);
  panelby product / onepanel rows=1 noborder layout=columnlattice
                 noheaderborder novarname colheaderpos=bottom;
  vbar year / response=actual stat=sum group=quarter barwidth=1
           dataskin=pressed baselineattrs=(thickness=0) filltype=gradient;
  colaxis display=(nolabel noline noticks) valueattrs=(size=7);
  rowaxis display=(noline nolabel noticks) grid;
run;

For all the examples above, the data contains one or more classifier variables with one response variable.  This is what is sometimes referred to as a "Tall" structure.  But often, the data structure is "Wide", like in an Excel table, with multiple response columns by category.
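For comparison, you can also reshape the wide data into a tall structure. The GROUP= option with GROUPDISPLAY=CLUSTER then produces a clustered chart directly. The following is a minimal sketch. The data set MILEAGE_TALL and the variables MEASURE and MPG are hypothetical names.

```sas
/* Reshape the wide mileage columns into a tall structure:     */
/* one row per car and measure, with a single response column. */
data mileage_tall;
  set sashelp.cars(keep=type mpg_city mpg_highway);
  length measure $8;
  measure = 'City';    mpg = mpg_city;    output;
  measure = 'Highway'; mpg = mpg_highway; output;
  keep type measure mpg;
run;

/* With tall data, GROUP= places City and Highway side by side. */
title 'Mileage by Type';
proc sgplot data=mileage_tall noborder;
  vbar type / response=mpg stat=mean
              group=measure groupdisplay=cluster;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;
```

The layered approach shown next avoids the reshaping step, at the cost of managing bar widths or offsets by hand.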

In such a case, it is possible to create a clustered bar chart without transforming the data, by layering the data for each column as shown on the right.  Here, we have layered two VBAR statements, one for mpg_city and one for mpg_highway, both with the same category variable.  Normally, the second layer would cover the first, but we have made the second layer's bars narrower (BARWIDTH=0.5), so we can see both.

title 'Mileage by Type';
proc sgplot data=sashelp.cars noborder;
  styleattrs datacolors=(olive gold);
  vbar type / response=mpg_city stat=mean
           dataskin=pressed baselineattrs=(thickness=0) ;
  vbar type / response=mpg_highway stat=mean
          dataskin=pressed baselineattrs=(thickness=0)
         barwidth=0.5;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;

Finally, the bars need not be overlaid on the category centers.  They can be "offset" to be side-by-side, or even slightly overlapped, as shown on the right.  Here the bar widths are 0.6, and DISCRETEOFFSET= shifts each VBAR to the left or right by 0.1, creating overlapping bars.

title 'Mileage by Type';
proc sgplot data=sashelp.cars noborder;
  styleattrs datacolors=(brown olive);
  vbar type / response=mpg_highway stat=mean
           dataskin=pressed barwidth=0.6 
           baselineattrs=(thickness=0)
           discreteoffset=-0.1;
  vbar type / response=mpg_city stat=mean
          dataskin=pressed barwidth=0.6 
          baselineattrs=(thickness=0)
          discreteoffset= 0.1;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;

There is one restriction when layering multiple VBAR statements: the category variables for all VBAR statements must be the same.  If a group is specified, it must be specified the same way for all the VBAR statements.  Otherwise, the program stops with an error message in the log.  There are other ways to handle such cases that will be discussed later.

These examples give you an idea of the versatility of the SGPLOT VBAR statement.  You can create bar charts from the simplest to complex and with different aesthetic appearance.  I would encourage you to see other examples in this blog on creating bar charts with SGPLOT procedure.

Full code:  getting_started_2_vbar

Advanced ODS Graphics: Remove ODS Subtitles

In my last blog, I showed you how to change the titles in graphs produced by analytical procedures; today I will show you how to remove subtitles that procedures display on some output pages. The following step creates output that contains a SAS title ('Illustrate the CIF Plot'), a PROCTITLE ('The LIFETEST Procedure'), and a subtitle ('Failed Event: Event Indictor: 1=Event 0=Censored=1') that is set by the LIFETEST procedure.

title 'Illustrate the CIF Plot';
ods graphics on;
proc lifetest data=sashelp.Bmt plots=all;
   strata group;
   ods select FailureSummary cifPlot;
   time T*Status(0) / eventcode=1;
run;

You can remove the first and all subsequent titles that are set by TITLEn statements by specifying:

title;

You can remove the PROCTITLE by specifying:

ods noproctitle;

There is not a one-line specification that will remove subtitles, but you can use the ODS Document and PROC DOCUMENT. The following step captures the procedure output into an ODS Document named MYDOC:

title;
ods noproctitle;
ods document name=mydoc (write);
proc lifetest data=sashelp.Bmt plots=all;
   strata group;
   ods select FailureSummary cifPlot;
   time T*Status(0) / eventcode=1;
run;
ods document close;

The WRITE option creates a new document rather than appending to an old document. The following step lists the contents of the document:

proc document name=mydoc;
   list / levels=all;
run;

This document contains a table and a graph. The listing provides you with the paths that you need to copy and paste to write the next step, which replays the table and graph while suppressing all subtitles:

proc document name=mydoc;
   obstitle \Lifetest#1\Failcode#1\FailureSummary#1;
   replay   \Lifetest#1\Failcode#1\FailureSummary#1;
   obstitle \Lifetest#1\Failcode#1\cifPlot#1;
   replay   \Lifetest#1\Failcode#1\cifPlot#1;
run;

The OBSTITLE (subtitle object) statements, when specified with only an object path, suppress all subtitles. The REPLAY statements display the table and the graph.

Since the code depends on a previous step, and the results from that step (the contents of the ODS Document) can be stored in a SAS data set, you can easily use a macro to avoid copying and pasting paths into the PROC DOCUMENT step:

ods document name=mydoc (write);
proc lifetest data=sashelp.Bmt plots=all;
   strata group;
   ods select FailureSummary cifPlot;
   time T*Status(0) / eventcode=1;
run;
ods document close;
 
%macro nosubs;
proc document name=mydoc;
   ods exclude properties;
   ods output properties=p;
   list / levels=all;
run;
 
data _null_;
   set p end=eof;
   if _n_ = 1 then call execute('proc document name=mydoc;');
   if type = 'Table' or type = 'Graph' then do;
      call execute(catx(' ', 'obstitle', path, ';'));
      call execute(catx(' ', 'replay'  , path, ';'));
   end;
   if eof then call execute('quit;');
run;  
%mend;
 
%nosubs

CALL EXECUTE writes SAS code to a buffer, and that code runs after the DATA step terminates. The macro creates the same PROC DOCUMENT step and results as the first PROC DOCUMENT step: a table and a graph with no titles or subtitles. See the documentation for PROC DOCUMENT for more information about rearranging, subsetting, and generally controlling SAS output.
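The following small, self-contained sketch shows this timing. It is an illustration only and is unrelated to PROC DOCUMENT. The queued PROC PRINT step does not run until the DATA step that generated it has finished.

```sas
/* Code passed to CALL EXECUTE is queued, not run immediately.  */
/* The PUT message is written while the DATA step is executing; */
/* the queued PROC PRINT runs only after this step ends.        */
data _null_;
  call execute('proc print data=sashelp.class(obs=3); run;');
  put 'PROC PRINT has been queued but has not run yet.';
run;
```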

Members of the Advanced Analytics division at SAS (who create products including SAS/STAT, SAS/QC, SAS/ETS, SAS/IML, SAS/OR, and many others) create documentation that contains text interleaved with output. They use macros that run PROC DOCUMENT to display subsets of procedure output. You can use a very similar system called StatRep to create documents that contain SAS code and reproducible results.

Update (November 28, 2016).
You can use the following step to replay the table and the graph with no page break in between.

proc document name=mydoc;
   obstitle \Lifetest#1\Failcode#1\FailureSummary#1;
   obpage   \Lifetest#1\Failcode#1\cifPlot#1 / delete;
   obstitle \Lifetest#1\Failcode#1\cifPlot#1;
   replay   \Lifetest#1\Failcode#1\FailureSummary#1, 
            \Lifetest#1\Failcode#1\cifPlot#1;
quit;

Layers vs annotation

Last week a user asked about BY-group processing for SG annotation with the SGPLOT procedure.  The user provided a simple use case for the question (always a good idea) using the sashelp.class data set.  The graph used annotation to display reference lines for the mean value of height.  The problem was that every annotated line was rendered in each graph instead of being filtered by the BY group, because SG annotation does not support BY variable processing.  See the graph in the linked question above.

This is a good example of a user who is familiar with SAS/GRAPH programming and is now using SGPLOT.  In that situation, it is useful to remember that SGPLOT supports many plot statements that can be "layered" together to create a graph.  With SGPLOT, graphs should preferably be built by adding plot layers wherever possible.  This works for a large number of graphs, and annotation may be needed only in a few cases.

This use case is actually better handled in SGPLOT by using plot layers.  Instead of building a separate data set of the mean values by sex for the annotated reference lines, we can merge that data into the single data set.  We compute the mean values by sex using the MEANS procedure and then we can merge the computed data into the original data set (by Sex).  The last few rows of the final data set are shown on the right.  The data set is a merge of the relevant columns of sashelp.class and the computed mean values by sex.  I have also added a column called LBL for the label that I want to display on the mean reference line at x=zero.
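The article does not list the data preparation. The following sketch shows one way to build CLASS2, under these assumptions:
- The variable names MEANHEIGHT, ZERO, and LBL follow the SGPLOT step below.
- The intermediate data set names are hypothetical.
- The label is set on only the first row of each BY group, so the text is drawn once per graph. The original data may carry LBL on every row.

```sas
/* Sort the relevant columns by the BY variable. */
proc sort data=sashelp.class(keep=name sex height weight) out=class_sorted;
  by sex;
run;

/* Mean height for each sex. */
proc means data=class_sorted noprint;
  by sex;
  var height;
  output out=mean_height(keep=sex meanHeight) mean=meanHeight;
run;

/* One-to-many merge: every row receives its group's mean.          */
/* ZERO and LBL are set on the first row of each group only; rows   */
/* with a missing ZERO are not drawn by the TEXT plot.              */
data class2;
  merge class_sorted mean_height;
  by sex;
  length lbl $12;
  if first.sex then do;
    zero = 0;
    lbl  = cats('Mean=', put(meanHeight, 4.1));
  end;
run;
```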

Now, in addition to the SCATTER statement, we can use a REFLINE statement layer to display the reference lines at the mean value, and also label the line as necessary using a TEXT plot.  The active BY variable for the procedure will automatically work with the reference line and text plot to render the desired results.  The data set is sorted by sex for BY variable processing.

SGPLOT code:

title 'Height by Weight';
proc sgplot data=class2 uniform=yscale noautolegend;
  by sex;
  scatter x=weight y=height;
  refline meanHeight;
  text x=zero y=meanHeight text=lbl / position=topright ;
run;

The graph on the right is one of the two graphs generated, for the BY value Sex=F.  The reference line is displayed at the appropriate value, along with the label "Mean=60.8" positioned at the left end of the reference line.  If your SAS release does not support the TEXT plot, you can leave it out, or use a SCATTER plot with the MARKERCHAR= option.

Note the use of the procedure option UNIFORM=yscale, which makes the y-axis data range uniform across all the graphs.

There are multiple benefits of using plot layers instead of annotation.

  • Each plot contributes its data range and offset requirements to the axes.
  • The axes union the data ranges, and communicate the information back to the plots for display.
  • The plots can be interspersed between other plots in the order we want.
  • The plots contribute to legends (and work with attribute maps).
  • The data also works correctly with the BY variable.

While not significant here, plot ordering and group attribute assignment and contribution to legends can be a big benefit in other graphs such as the Swimmer Plot.  So, when coming to SGPLOT from SAS/GRAPH, it will be to your benefit if you don't just duplicate the process you use with GPLOT and annotate.  There may be other ways to achieve the right results.

Full SAS9.40 code: layers 


Outside-the-box: Directed circle link graphs

A reader of the previous article on the circle link graph asked for arrow heads to indicate the direction of the flow.  Since I am using a SERIES plot to render the links, adding arrow heads is relatively easy, because the SERIES plot statement itself supports options for displaying them.

The arrow head size depends on the shape of the arrow head and on the thickness of the line.  Arrow heads generally work out well for smaller line thicknesses.  So, for the graph on the right, I have removed the thickness response option and made all the arrows 5px thick.

series x=x y=y / group=link
           lineattrs=(pattern=solid thickness=5)
           grouplc=colorTo transparency=0.05
           nomissinggroup
           arrowheadpos=end arrowheadshape=barbed;

One important change in the code is related to the length of each link.  Previously, without arrow heads, the links were drawn from and to the middle of the circle rim, and then overdrawn with the circular nodes.  That no longer works, since a portion of each arrow head would be hidden.  Also, drawing the links over the circular nodes looks less attractive.  So, I stop each link short on the "To" side of the node.

For this graph, I am computing the spline curve myself.  In the %makelink() macro, the vertex values for the curved links are computed for t=0 to 1.0 by 0.05.  So, I can stop the computation short of 1.0, say at 0.97, to get shortened links so the arrow heads are fully visible.  Also, when the SERIES plot draws the arrow head, the thick line needs to be stopped a bit early so the arrow head point is not overdrawn by the thick line.  To allow this, the last segment of each link needs to be long enough to be shortened.  To do this, I used:  do t=0.0 to 0.85 by 0.05, 0.97;  This provides a longer last line segment, from t=0.85 to t=0.97, for each link.
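The article does not list the %makelink() macro. As a stand-in, the sketch below computes one curved link as a quadratic Bezier curve bent toward the circle center. This is a swapped-in technique and not necessarily the author's spline. It uses the shortened parameter list described above, and the node coordinates are hypothetical.

```sas
/* Hypothetical link from a node at 0 degrees to a node at 90 degrees */
/* on the unit circle, bent toward the center (0,0).                  */
data link;
  x0 = 1; y0 = 0;    /* start ("From") point      */
  x1 = 0; y1 = 1;    /* end ("To") point          */
  cx = 0; cy = 0;    /* Bezier control point      */
  /* Stop at t=0.97 so the arrow head is not hidden by the node.   */
  /* The bound 0.86 guards against round-off dropping t=0.85; the  */
  /* last segment then runs from t=0.85 to t=0.97.                 */
  do t = 0 to 0.86 by 0.05, 0.97;
    x = (1-t)**2*x0 + 2*(1-t)*t*cx + t**2*x1;
    y = (1-t)**2*y0 + 2*(1-t)*t*cy + t**2*y1;
    output;
  end;
  keep t x y;
run;

proc sgplot data=link aspect=1 noautolegend;
  series x=x y=y / lineattrs=(pattern=solid thickness=5)
                   arrowheadpos=end arrowheadshape=barbed;
  xaxis min=-1 max=1;
  yaxis min=-1 max=1;
run;
```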

Variable line thickness can still be supported, but I reduced the maximum line thickness, as seen in the graph on the right.  If the links are made translucent, the overlap between the arrow head and the line segment becomes visible, so it is better to use a high level of opacity.  Just for aesthetics, I moved one of the links, but it would still work.

Full SAS 9.40M3 SGPLOT code: directed_circle_graph

Outside-the-box: Circle link graph

There has been some interest in "Circle Link Graph" diagrams where the nodes are laid out in a circle, with links going from one node in the circle to another.

I recall seeing one such diagram during the 2014 World Cup soccer tournament, showing the number of players from one country who play in a league in another country.  I thought it would be an interesting exercise to see how to build this graph using the SGPLOT procedure.

To create such a graph, I just made up some data of the number of players in each team who play in their own country and in some other country.  Then, I used some DATA step code with hash objects to build a data set that contains the coordinates of the nodes as segments of arcs around the circle, and of the links that run from the center of one node to another.

The graph is shown on the right.  I first did this exercise back in 2014.  At that time, the SERIES plot did not support a thickness response to make the line thickness proportional to some variable.  It also did not support coloring lines by a separate group variable, or smooth spline connections.  Nor was there any expressed interest in the user community for such a graph, so I did not post my findings.

Now, there appears to be some interest in the SAS community for such a graph.  Also, with SAS 9.40M3, we have all the features in place to make a decent circle link graph, as shown on the right.  The code linked below shows that the task is not trivial.  Here are the steps to create this graph.

  • The data set as shown above has three columns, From, To and LinkCount.  These represent the number of players that are "From" one country playing in the "To" country.
  • Run a data step to get the total number of players.
  • Run a data step to add all the node names and links into two separate Hash Objects.
  • Iterate over all the nodes in the nodes hash object to compute the start and end angles for all nodes.
  • Iterate over all the links in the links hash object and compute the coordinates for each link.
  • A macro is used to compute the spline shape.
  • Note, each link starts and ends at the center of the node.
  • Use two TEXT plots to display the country names.  Two TEXT statements are needed to change the Position option between "Right" and "Left" as we go around the circle.
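The link step in the list above can be sketched as follows. This is a minimal sketch under stated assumptions: each link is drawn as a quadratic Bezier curve whose control point is the circle center, standing in for the spline macro used in the full program, and each link starts and ends at its node's mid-angle on a radius of 1.0. The inline `nodes` and `players` rows are hypothetical, so the block runs on its own.

```sas
/* Hypothetical node mid-angles (degrees) and link counts */
data nodes;
  length From $12;
  input From $ MidAngle;
  datalines;
Brazil 60
Spain 180
England 300
;
run;

data players;
  length From To $12;
  input From $ To $ LinkCount;
  datalines;
Brazil Spain 10
Spain England 3
England Spain 4
;
run;

/* Look up both end points of each link in a hash of node mid-angles, */
/* then emit vertices of a quadratic Bezier bent through (0,0).       */
data links;
  if 0 then set nodes;                 /* defines host variables From, MidAngle */
  if _n_ = 1 then do;
    declare hash nodeH(dataset: 'nodes');
    nodeH.defineKey('From');
    nodeH.defineData('MidAngle');
    nodeH.defineDone();
  end;
  set players;
  length Link $25 colorTo $12;
  Link    = catx('-', From, To);
  colorTo = To;                        /* links colored by destination country */
  if nodeH.find(key: From) ne 0 then delete;   /* skip links to unknown nodes */
  a0 = MidAngle * constant('pi') / 180;
  if nodeH.find(key: To) ne 0 then delete;
  a1 = MidAngle * constant('pi') / 180;
  x0 = cos(a0);  y0 = sin(a0);
  x1 = cos(a1);  y1 = sin(a1);
  do i = 0 to 20;
    t = i / 20;
    /* B(t) = (1-t)^2 P0 + 2t(1-t) C + t^2 P1, with control point C = (0,0) */
    x = (1 - t)**2 * x0 + t**2 * x1;
    y = (1 - t)**2 * y0 + t**2 * y1;
    output;
  end;
  keep Link colorTo LinkCount x y;
run;
```

Each link becomes its own group of 21 vertices, which is the shape expected by a SERIES statement with GROUP=link, GROUPLC=colorTo and THICKRESP=linkcount, as in the SGPLOT code below.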

Node names can also be displayed on the node arcs inside the circle, instead of radially outside it, as shown on the right.  The rotation angle "rotate2" is already computed, and the "Backlight" option is used to ensure the text is visible.
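One way such a label angle could be computed is sketched below. It assumes each label sits at radius 1.1 on its node's mid-angle and is turned to follow the arc (the tangent direction), flipped by 180 degrees on the lower half of the circle so that it never reads upside down. The names `xlbl`, `ylbl` and `rotate2` match the TEXT statement in the code below. The sample mid-angles and the radius are hypothetical.

```sas
/* Hypothetical node mid-angles in degrees */
data nodes;
  length Country $12;
  input Country $ MidAngle;
  datalines;
Brazil 45
Spain 135
England 225
France 315
;
run;

data labels;
  set nodes;
  r = 1.1;                                        /* label radius (assumption) */
  xlbl = r * cos(MidAngle * constant('pi') / 180);
  ylbl = r * sin(MidAngle * constant('pi') / 180);
  /* Tangent direction, normalized to [0, 360) */
  rotate2 = mod(MidAngle - 90 + 360, 360);
  /* Keep the result in (-90, 90] so no label is upside down */
  if 90 < rotate2 <= 270 then rotate2 = rotate2 - 180;
  else if rotate2 > 270 then rotate2 = rotate2 - 360;
  drop r;
run;
```

With these angles, labels at the upper right and lower left are rotated by -45 degrees, and those at the upper left and lower right by 45 degrees.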

SGPLOT code:

proc sgplot data=links aspect=1 nowall noborder subpixel;
  /* Links: one curve per link, colored by colorTo, thickness by linkcount */
  series x=x y=y / group=link lineattrs=(pattern=solid)
         grouplc=colorTo transparency=0.2 nomissinggroup
         smoothconnect thickresp=linkcount
         thickmax=36 thickmaxresp=3;
  /* Nodes: opaque white arcs drawn under the colored arcs */
  series x=x y=y / group=country lineattrs=(thickness=20
         pattern=solid color=white) nomissinggroup;
  /* Nodes: semi-transparent colored arcs */
  series x=x y=y / group=country nomissinggroup
         lineattrs=(thickness=20 pattern=solid)
         grouplc=colorTo transparency=0.2 name='a';
  /* Country names on the node arcs, rotated by rotate2 */
  text x=xlbl y=ylbl text=country / rotate=rotate2
       position=center textattrs=(size=9 color=White) backlight;
  xaxis display=none;
  yaxis display=none;
run;

Next Steps:  With SAS 9.40M3, a SPLINE statement is also available, which does all the work of computing a smooth spline for each link.  The program could be updated to use it and remove the need for the macro.

It would also be nice if the start and end points of the links could be distributed along the arc of each node.  In that case, the width of the node and of the link would both have to be proportional to the number of players.

I believe such graphs can be useful to view all kinds of "Consumer" and "Provider" relationships.  For "Patients" and "Providers" it could be that some providers are also patients.

Full SAS 9.40M3 SGPLOT code:  circle_graph


Clinical Graphs: Spider plot

A Spider Plot is another way of presenting the Change from Baseline of tumors for each subject in a study, by week.  The plot can be classified by response and stage.  Another way of displaying Tumor Response data was discussed earlier in the article on the Swimmer Plot.

This article was prompted by a question on the SAS Communities page on how to create a Spider plot.  The user provided an illustration of what the plot might look like.  I followed the example and generated some data to create the graph shown on the right.

The data is arranged in six columns.  Four columns are needed to draw the progression of the disease for each subject over time:  Subject, Week, Change, RGroup.

Two additional columns are used to display the status at the end of each subject's curve: WeekS and TGroup.  The first three observations only ensure that the groups are assigned colors in the order we want.  A Discrete Attributes Map could be used instead, but I ran into some minor difficulties, so I skipped that step.  This exercise helped reveal a minor problem, but I was able to work around it.

Here, WeekS=Week+2, which positions the marker to the right of the end of the curve.  Only the last point of each curve has a nonmissing value.
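A minimal sketch of how these end-of-curve columns could be added, assuming the data is sorted by Subject and Week and that a hypothetical Status column holds each subject's final status. The sample rows and status values are made up. Only the last record of each subject gets WeekS and TGroup, so the SCATTER plot draws a single marker per curve.

```sas
/* Hypothetical tumor data: Change from baseline by week, with a final status */
data tumor;
  length Subject $4 RGroup $10 Status $10;
  input Subject $ Week Change RGroup $ Status $;
  datalines;
S01 0 0 Partial Ongoing
S01 6 -20 Partial Ongoing
S01 12 -35 Partial Ongoing
S02 0 0 Progress Growth
S02 6 25 Progress Growth
;
run;

proc sort data=tumor;
  by Subject Week;
run;

data spider;
  set tumor;
  by Subject;
  length TGroup $10;
  /* WeekS and TGroup are not retained, so they stay missing on other rows */
  if last.Subject then do;
    WeekS  = Week + 2;     /* offset the marker to the right of the curve end */
    TGroup = Status;
  end;
  drop Status;
run;
```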

The SAS 9.4M2 options for coloring the series lines and markers by another classifier (GroupLC and GroupMC) are used to color each Subject's curve by the Response Group (RGroup).  Note that the connectivity of each curve is determined by setting Group=Subject.  Then, within this, the color of each curve is set with GroupLC=RGroup.  This allows multiple curves to be classified in the same category.

The SAS 9.40M1 SymbolChar statement is used to define a few symbols to represent the status of each Subject, such as "Treatment Ongoing".  These are inserted into the group symbols list using the DataSymbols option.  Custom group data colors are set using the DataContrastColors option.

A Scatter plot is used to display these markers.

SAS 9.4M2 SGPLOT Code:

title "Tumor Response by Week";
ods graphics / reset width=5in height=3in imagename='Spider';
proc sgplot data=spider noborder tmplout='c:\spider.sas';
  format tgroup $growth.;   /* user-defined format for the status groups (not shown here) */
  /* Unicode symbols for the end-of-curve status markers */
  symbolchar name=ongoing char='2192'x / scale=1;    /* right arrow */
  symbolchar name=growtht char='2020'x / scale=1;    /* dagger */
  symbolchar name=growthnt char='2021'x / scale=1;   /* double dagger */
  styleattrs datacontrastcolors=(green gold red)
             datasymbols=(ongoing growtht growthnt);
  refline 0 / lineattrs=(pattern=shortdash);
  /* One curve per Subject, colored by Response Group */
  series x=week y=change / group=subject grouplc=rgroup groupmc=rgroup
         markers markerattrs=(symbol=circlefilled)
         lineattrs=(thickness=2 pattern=solid) name='a';
  /* Status marker at the end of each curve (WeekS is nonmissing only there) */
  scatter x=weekS y=change / group=TGroup markerattrs=(size=16 color=black)
          nomissinggroup name='b';
  keylegend 'a' / title='Response' type=linecolor valueattrs=(size=7)
          location=inside position=topright across=1 opaque;
  keylegend 'b' / valueattrs=(size=7) noborder;
  xaxis label='Week';
run;

There are also examples of Spider plots with negative time values on the x-axis.  These appear to track the disease state before and after the start of treatment.  The above code will likely work if the data itself includes values from before the start of treatment and the baseline value is 1.0.

Full SAS 9.4M2 SGPLOT code:  spider


Outside-the-box: CONSORT diagram

Over the past few weeks I have heard about the "Consort Diagram".  This was mentioned in a Communities article, and also by a couple of users separately.

This topic was also covered by Anusha Mallavarapu and Dean Shults from Cytel in a poster at PhUSE 2016, shown on the right.  The authors discuss an automated way to create the diagram using an RTF template.

Speaking with the author, it appears that the diagram structure is relatively fixed for a given number of study arms.  The authors showed a sample diagram for a 4-arm study, shown on the right.

I thought it would be an interesting exercise to see if I could create this diagram fully in SAS, thereby reducing the complexity of using multiple tools.  The diagram I created is meant to essentially mimic the diagram presented by the authors above.  I have not filled in every box, but did one of each to see how we can create such a graph using the SGPLOT procedure.  Other diagram structures could be defined in a similar way.

I used the following statements with SAS 9.4M3 SGPLOT to create the diagram shown on the right.

  1. A Series plot to draw the links.
  2. A Polygon plot to draw the empty boxes.
  3. A Polygon plot to draw the filled boxes.
  4. A Text plot to draw the center aligned horizontal text.
  5. A Text plot to draw the left aligned horizontal text.
  6. A Text plot to draw the rotated text in the filled boxes.

The full program is linked below.  The diagram is drawn in a space that spans 0-100 horizontally and 0-200 vertically.  The vertices for the links are defined in a nodes data set, each with a Node Id and (x, y) coordinates based on the shape of the diagram.  A Hash Object is created to hold the node ids and their x and y coordinates.

Links are defined as multi-segment lines with the Node Ids as vertices.  Up to 4 nodes can be used, to allow for angled links.  The Node Hash Object is then used to get the coordinates of each link's vertices, which are written out as a series plot with multiple legs.  Separate data sets are defined for the empty and the filled rectangles, so that two Polygon statements can draw them, one for empty and one for filled.
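A minimal sketch of this link lookup, assuming a hypothetical `linkDefs` data set in which each link lists up to four node ids (N1-N4, blank when unused). The node ids, coordinates and data set names are made up for illustration. Each non-blank node id becomes one vertex of that link's series, in order, so an angled link simply uses three or four ids.

```sas
/* Hypothetical node ids with (x, y) in the 0-100 by 0-200 diagram space */
data nodes;
  length Id $4;
  input Id $ xn yn;
  datalines;
A 50 190
B 50 170
C 75 170
D 25 170
E 25 150
;
run;

/* Each link lists up to 4 node ids; unused slots are read as blank */
data linkDefs;
  length LinkId $4 N1-N4 $4;
  input LinkId $ N1 $ N2 $ N3 $ N4 $;
  datalines;
L1 A B . .
L2 B C . .
L3 B D E .
;
run;

data consortLinks;
  if 0 then set nodes;                 /* defines host variables Id, xn, yn */
  if _n_ = 1 then do;
    declare hash nodeH(dataset: 'nodes');
    nodeH.defineKey('Id');
    nodeH.defineData('xn', 'yn');
    nodeH.defineDone();
  end;
  set linkDefs;
  array nodeIds{4} N1-N4;
  /* One series vertex per non-blank node id, in order */
  do k = 1 to 4;
    if not missing(nodeIds{k}) then do;
      Id = nodeIds{k};
      if nodeH.find() = 0 then do;
        x = xn;
        y = yn;
        output;
      end;
    end;
  end;
  keep LinkId x y;
run;
```

These rows could then be drawn with a statement such as `series x=x y=y / group=LinkId lineattrs=(color=black);`.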

While I defined the polygon vertices directly for ease of use, this could also be based on the Node Ids from the Hash Object.

Similarly, the text is defined in three data sets: one for the rotated text, one for the center-aligned text and one for the left-aligned text.  These are used with the three Text statements to draw the information.  SplitPolicy=SplitAlways is used with SplitChar="." to break the text into lines.  Finally, all the data sets are merged into one data set for use with the procedure.
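A minimal sketch of the text splitting with the SGPLOT TEXT statement. The box text and coordinates are hypothetical. Every "." in the string starts a new line because of SPLITPOLICY=SPLITALWAYS, so this approach assumes the box text itself contains no other periods.

```sas
/* Hypothetical box text; '.' marks each line break */
data boxText;
  length txt $60;
  input xt yt txt & $60.;
  datalines;
50 180 Assessed for eligibility.(n=100)
50 160 Randomized.(n=80)
;
run;

proc sgplot data=boxText noautolegend;
  text x=xt y=yt text=txt / position=center
       splitchar='.' splitpolicy=splitalways
       textattrs=(size=8);
  xaxis min=0 max=100 display=none;
  yaxis min=0 max=200 display=none;
run;
```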

I defined the text locations directly for ease of use, but they could also be associated with the Node Ids and extracted from the Hash Object.

My goal is to show how the data should be arranged and which statements to use to draw the graph.  The authors indicated that often the Consort Diagram structure and textual information (except the numbers) is static, with changing numbers.  In that case, the diagram can be defined once and reused.  The "N" values can likely be held in macro variables and inserted into the right places.
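A minimal sketch of the macro-variable idea, with hypothetical variable names and counts. In practice the counts might be computed from the study data, for example with PROC SQL SELECT ... INTO, while the surrounding text stays fixed. Double quotes are needed so that the macro references resolve.

```sas
/* Hypothetical counts; these would normally come from the study data */
%let nAssessed   = 120;
%let nRandomized = 96;

/* Static box text with the "N" values inserted from macro variables */
data boxText;
  length txt $60;
  xt = 50; yt = 180; txt = "Assessed for eligibility.(n=&nAssessed)"; output;
  xt = 50; yt = 160; txt = "Randomized.(n=&nRandomized)";           output;
run;
```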

SAS 9.40M3 is necessary, as I have used the TEXT plot statement to draw the text in the nodes.  It may be possible to do this using Annotate with earlier versions, though likely with more effort.  I would be interested to hear whether this helps with the task, and what other details may need to be addressed, so please feel free to chime in.

SAS 9.4 SGPLOT Code:  consort_diagram

 
