Dashboard graphs

In this blog we have been discussing graphs useful for analysis of data for many domains such as clinical research, forecasting and more.  SG Procedures and GTL are particularly suited for these use cases.  So, when I came upon a dashboard image from Steven Few's Visual Business Intelligence blog, showing the use of bullet graphs with targets, I was intrigued to see how much mileage I could get out of GTL to create this graph:

As we all know by now, GTL uses a building-block approach to create a graph.  Often you can build unique visuals by combining plot statements in creative ways.  Let us analyze this graph and see if we can break it down into component parts that can be handled by GTL.  We will start with the first bullet graph for Revenue.  Here is what I see, and the equivalent GTL feature:

  • A needle showing the current performance on a linear axis - GTL horizontal bar chart.
  • A banded background behind the needle represents qualitative data ranges -  GTL horizontal stacked bar chart or band plot.
  • A symbol to show a comparative measure or a target - GTL Scatter plot.
  • A label on the left to describe the measure - GTL Entry.

I put together a simple data set for just one indicator by eyeballing the data in the graph.  Here is the data for Revenues:

Step 1:  Draw the background range bands using a stacked horizontal bar chart.  This could likely be done using a band plot.  With bar chart we can use the skin feature to make it look better (business users prefer a little glitz).  We will use a two-column Lattice, with the text in the left cell, and the graph in the right cell.  The cell sizes are set to provide more space for the plot.

  • The stacked horizontal grouped bar chart has only one bar (category).
  • I used index to pick specific style elements for each group value.
  • I have hidden the Y axis and set offsets to zero.
  • I set BarWidth=1.0 so the bar fills the whole cell.
  • I added the legend to show the range levels.
  • I used a right aligned Entry in the left cell for the description.

Code snippet:

proc template;
  define statgraph KPI_Revenue_1;
    begingraph;
      entrytitle '2005 YTD' / textattrs=(weight=bold);
      layout lattice / rows=1 columns=2 columnweights=(0.25 0.75);
        layout gridded;
          entry halign=right "Revenue" / textattrs=graphtitletext;
          entry halign=right "US $(1000)";
        endlayout;
 
        layout overlay / yaxisopts=(display=none offsetmin=0.0 offsetmax=0.0)
                         xaxisopts=(display=(ticks tickvalues) offsetmin=0.0 offsetmax=0.0
                                    tickvalueattrs=(size=8));
          barchart x=cat y=level / group=group name='a' orient=horizontal
                   barwidth=1.0 outlineattrs=(color=black) skin=satin index=index;
        endlayout;
 
        columnheaders;
          entry ' ';
          discretelegend 'a' / border=false valueattrs=(size=8);
        endcolumnheaders;
      endlayout;
    endgraph;
  end;
run;
 
ods graphics / reset width=4in height=1in imagename="KPI_Revenue_92_1";
proc sgrender data=YTD_2005_Revenue template=KPI_Revenue_1;
run;

Step 2:  Overlay a second bar chart showing just the revenue:

Step 3:  Overlay a ScatterPlot with filled triangle marker at a small discrete offset.

  • Markers are not outlined, so I used two scatter plots, one with black marker and one with smaller yellow marker.
  • Markers in the legend are not overlaid, so I put a grey background color.
  • I used DiscreteOffset to move the marker down a bit.
  • Now, we essentially have the bullet KPI for one measure.
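
Steps 1 through 3 can be sketched in one small program.  This is a minimal sketch under stated assumptions, not the code from the full program: the data set kpi_sketch, its values, the columns revenue and target, the template name kpi_bullet, the 0.3 bar width and the size and sign of the DiscreteOffset are all placeholders chosen for illustration.  Only cat, level, group and index come from the snippet above.

```sas
/* Placeholder data: values are illustrative, not read from the original dashboard. */
/* revenue and target are hypothetical column names, filled in on one row only.     */
data kpi_sketch;
  length cat $8 group $8;
  input cat $ group $ level index revenue target;
  datalines;
Revenue Poor 150 1 275 250
Revenue Fair 75 2 . .
Revenue Good 75 3 . .
;
run;

proc template;
  define statgraph kpi_bullet;
    begingraph;
      layout overlay / yaxisopts=(display=none offsetmin=0 offsetmax=0)
                       xaxisopts=(display=(ticks tickvalues) offsetmin=0 offsetmax=0);
        /* Step 1: stacked range bands that fill the cell */
        barchart x=cat y=level / group=group index=index orient=horizontal
                 barwidth=1.0 outlineattrs=(color=black);
        /* Step 2: narrow bar for the current value, drawn over the bands */
        barchart x=cat y=revenue / orient=horizontal barwidth=0.3
                 fillattrs=(color=black);
        /* Step 3: target marker, a black triangle with a smaller yellow one on top */
        scatterplot x=target y=cat / discreteoffset=0.3
                    markerattrs=(symbol=trianglefilled color=black size=13);
        scatterplot x=target y=cat / discreteoffset=0.3
                    markerattrs=(symbol=trianglefilled color=yellow size=9);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=kpi_sketch template=kpi_bullet;
run;
```

Because the second bar chart sums its response by category, keeping revenue missing on all but one row yields a single bar of the intended length.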

Step 4:  Extend this concept to multiple cells, each cell using a separate set of columns for the data.   Here is what the data looks like for 2 cells (Revenues and Expenses): 

Here is the graph for all three measures:

  • Note the 2nd KPI has a reversed axis.
  • I used index to color the group values.  The indexes are reversed for the 2nd KPI.
  • I added a gutter to separate the rows a bit.

This works well with style=journal too, as long as we are careful with the custom colors we used for the indicator bar and the target symbol.

Looks like we may have covered almost every detail from the original graph.  Everything is data driven, with no custom hard-coded features except the labels.  If one of the axes had not been reversed, we could have used a DataLattice with data organized by class variable for even more flexibility.

Not bad for a framework designed to create analytical graphs!  With SAS 9.3, you have more options with built-in Target option on the Bar Chart.
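
As a rough illustration of that SAS 9.3 route, the sketch below draws the value bar and its target from a single BARCHART statement.  It assumes the option is spelled TARGET= on the GTL BARCHART statement, and the data set kpi_target, its values and the columns revenue and target are placeholders, so check the option against the SAS 9.3 GTL reference before relying on it.

```sas
/* Placeholder data: one row, illustrative values only */
data kpi_target;
  input cat $ revenue target;
  datalines;
Revenue 275 250
;
run;

proc template;
  define statgraph kpi_target_sketch;
    begingraph;
      layout overlay;
        /* TARGET= (assumed SAS 9.3 spelling) adds a target indicator to each bar */
        barchart x=cat y=revenue / target=target orient=horizontal barwidth=0.3;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=kpi_target template=kpi_target_sketch;
run;
```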

A suggestion was made by Prashant that if the data was classified by the type of metric (Revenue, Expense, etc.), one could use the ifc() function to extract just the values for one metric for each cell.   This seemed to work, until I ran into a glitch.  I will post a follow up once I figure that out.   Stay tuned.

Full SAS 9.2 program :  Full SAS 92 Code


Distribution of Maximum LFT by Treatment

The graph showing the distribution of the maximum liver function test values by treatment for all participants in a study is commonly used for the analysis of safety data in clinical trials.   The data is often structured in multiple columns (one per treatment) as below on the left, or grouped by the treatment as shown on the right:

When data is structured in multiple columns, we can create the graph showing the distribution of Max LFT values using overlaid box plots as follows:

Here we have used an overlay of two box plots, one for each treatment group (column) with DISCRETEOFFSET to place the two treatments side by side.  With SAS 9.2, we have to use GTL to create this graph.  Here we have used filled boxes.

SAS 9.2 GTL Code Snippet:

proc template;
  define statgraph Max_LFT_By_TRT;
    begingraph;
      entrytitle 'Distribution of Maximum Liver Function Test Values by Treatment';
      entryfootnote halign=left "For ALAT, ASAT and ALKPH, the Clinical Concern Level is 2 ULN;" / textattrs=(size=8);
      entryfootnote halign=left "For BILTOT, the CCL is 1.5 ULN: where ULN is the Upper Level of Normal Range" / textattrs=(size=8);
      layout overlay / cycleattrs=true yaxisopts=(label='Maximum / ULN')
                       xaxisopts=(display=(ticks tickvalues line));
        boxplot x=test y=a / discreteoffset=-0.2 boxwidth=0.2 display=(median mean outliers caps fill)
                outlineattrs=graphdata1(pattern=solid) meanattrs=graphdata1
                medianattrs=graphdata1(pattern=solid) whiskerattrs=graphdata1(pattern=solid)
                outlierattrs=graphdata1 name='a' legendlabel='Drug A (N=209)';
        boxplot x=test y=b / discreteoffset= 0.2 boxwidth=0.2 display=(median mean outliers caps fill)
                outlineattrs=graphdata2(pattern=solid) meanattrs=graphdata2
                medianattrs=graphdata2(pattern=solid) whiskerattrs=graphdata2(pattern=solid)
                outlierattrs=graphdata2 name='b' legendlabel='Drug B (N=405)';
        discretelegend 'a' 'b' / location=inside halign=right valign=top across=1;
        referenceline y=1 / lineattrs=(pattern=dot);
        referenceline y=1.5 / lineattrs=(pattern=dot);
        referenceline y=2 / lineattrs=(pattern=dot);
      endlayout;
    endgraph;
  end;
run;

Full SAS 9.2 GTL Program:  Full SAS 92 GTL Code

SAS 9.3 supports box plots with cluster grouping.  Provided the data is in the "Grouped by treatment" form shown on the right side above, we can use a single box plot statement, with GROUP=treatment and GROUPDISPLAY=CLUSTER.  With SAS 9.3, you can use either GTL or the SGPLOT procedure to create this graph.  Here we have used unfilled boxes.
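
If the data arrives in the multi-column form, a short DATA step can restack it into the grouped form.  The following is a minimal sketch, not the code from the full program.  The input data set name LFT_Multi and its placeholder rows are assumptions.  The columns test, a and b come from the GTL snippet, while LFT_Group, drug, value and the $drug. format match the names used in the SGPLOT snippet.

```sas
/* Placeholder input in the multi-column form; values are illustrative only */
data LFT_Multi;
  input test $ a b;
  datalines;
ALAT 0.8 1.1
ASAT 0.6 .
BILTOT 1.2 0.9
;
run;

/* Legend labels, copied from the legend labels of the GTL version */
proc format;
  value $drug 'A'='Drug A (N=209)'
              'B'='Drug B (N=405)';
run;

/* Stack columns a and b into one value column plus a treatment group column */
data LFT_Group;
  set LFT_Multi;
  length drug $1;
  if not missing(a) then do; drug='A'; value=a; output; end;
  if not missing(b) then do; drug='B'; value=b; output; end;
  keep test drug value;
run;
```

Skipping missing values matters here because the two treatment columns generally hold different numbers of participants.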

SAS 9.3 SGPLOT code snippet:

title h=10pt 'Distribution of Maximum Liver Function Test Values by Treatment';
footnote1 h=8pt j=left "For ALAT, ASAT and ALKPH, the Clinical Concern Level is 2 ULN;";
footnote2 h=8pt j=left "For BILTOT, the CCL is 1.5 ULN: where ULN is the Upper Level of Normal Range";
proc sgplot data=LFT_Group;
  format drug $drug.;
 
  /*--Use grouped box plot - default GroupDisplay is cluster--*/
  vbox value / category=test group=drug nofill lineattrs=(pattern=solid)
               medianattrs=(pattern=solid);
 
  keylegend / location=inside position=topright across=1;
  refline 1 1.5 2 / lineattrs=(pattern=dot);
  xaxis discreteorder=data display=(nolabel);
  yaxis label='Maximum (/ULN)';
  run;

Full SAS 9.3 SGPLOT Program:  Full SAS 93 SG Code


Beer, diapers and heat map

The parable of beer and diapers is often related when teaching data mining techniques.  Whether fact or fiction, a Heat Map is useful to view the claimed associations.  A co-worker recently enquired about possible ways to display associations or dependency between variables.  One option is to show the dependency as a node link diagram.  But he soon settled on the Heat Map as the preferred means; one reason may be its compact display.

In the examples below, we show a few different ways we can display such data.  The data shown below is totally made up just for the purposes of illustration.  You may actually have a response value for each crossing, and we will look at that use case later.

Showing both positive and negative associations, the resulting map is like this:

Code snippet:

/*--Group Heat Map--*/
proc template;
  define statgraph dep_grp;
    dynamic _showvalues _gap _offset;
    begingraph;
      entrytitle 'Associations Matrix';
      layout overlay / yaxisopts=(reverse=true display=(tickvalues) offsetmin=_offset offsetmax=_offset)
                       xaxisopts=(display=(tickvalues) offsetmin=_offset offsetmax=_offset);
        heatmapparm x=prod_x y=prod_y colorgroup=value / name='a' display=(fill outline)
                    xgap=_gap ygap=_gap datatransparency=0.4;
        if(_showvalues eq 'yes')
          scatterplot x=prod_x y=prod_y / markercharacter=value markercharattrs=(size=9);
        endif;
        discretelegend 'a';
      endlayout;
    endgraph;
  end;
run;
 
/*--Heat Dependency Map with Groups, Labels and Gaps--*/
ods graphics / reset width=3.5in height=3in imagename='Dependency_Group_Labels_Gap';
proc sgrender data=dep_grp_2 template=dep_grp;
  dynamic _showvalues='yes' _gap='3';
run;

Significant features of the graph are as follows:

  • A HeatMapParm with ColorGroup role is used.
  • Dynamics are used for _offset, _gap and _showvalues.
  • These dynamics are set in the SGRENDER step.  The call above sets _showvalues and _gap; _offset is not set there.
  • An overlaid ScatterPlot is used to draw the value labels on the cells.

The group value that is seen first gets the first GraphData style element.  In this case, GraphData1 has the blue color, and GraphData2 has the red color.  We have used some transparency to fade the color intensity.

DiscreteAttrMap:  Often it is necessary to have reliable color assignment for the cells based on the value of the group variable.  We do that using the DiscreteAttrMap feature in SAS 9.3.  A DiscreteAttrMap works pretty much like a User Defined Format.  Each formatted value for the variable can be assigned specific visual attributes, which are then used regardless of the order or presence of the values in the data:

Code Snippet:

/*--Group Heat Map with Discrete Attr Map--*/
proc template;
  define statgraph dep_grp_map;
    dynamic _showvalues _gap _offset;
    begingraph;
      entrytitle 'Associations Matrix';
 
      /*--Define the discrete attributes map--*/
      discreteattrmap name='map';
        value "Yes" / fillattrs=(color=darkgreen);
        value "No" / fillattrs=(color=darkred);
      enddiscreteattrmap;
 
      /*--Associate the attributes map with the variable--*/
      discreteattrvar attrvar=value var=value attrmap="map";
 
      layout overlay / yaxisopts=(reverse=true display=(tickvalues) offsetmin=_offset offsetmax=_offset)
                       xaxisopts=(display=(tickvalues) offsetmin=_offset offsetmax=_offset);
        heatmapparm x=prod_x y=prod_y colorgroup=value / name='a' display=(fill outline)
                    xgap=_gap ygap=_gap datatransparency=0.6;
        if(_showvalues eq 'yes')
          scatterplot x=prod_x y=prod_y / markercharacter=value markercharattrs=(size=9);
        endif;
        discretelegend 'a';
      endlayout;
    endgraph;
  end;
run;
 
/*--Heat Dependency Map with Groups, Labels, Gaps and Custom colors--*/
ods graphics / reset width=3.5in height=3in imagename='Dependency_Group_Map';
proc sgrender data=dep_grp_2 template=dep_grp_map;
  dynamic _showvalues='yes' _gap='3';
run;

Full SAS 9.3 code:  Full SAS 93 Code

If the association has an interval response value, we can display the heat map with a ColorResponse rather than ColorGroup.  An example of this is the Calendar Heatmap posted earlier in this blog by Pratik Phadke.
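
A minimal sketch of the ColorResponse variant follows.  The data set dep_resp, the numeric column resp, its values and the template name dep_resp_sketch are hypothetical; only prod_x and prod_y are taken from the templates above.  A continuous legend replaces the discrete one because the color now encodes a numeric value.

```sas
/* Placeholder data: resp is a hypothetical interval response for each crossing */
data dep_resp;
  input prod_x $ prod_y $ resp;
  datalines;
Beer Beer 1.0
Beer Diapers 0.6
Diapers Beer 0.6
Diapers Diapers 1.0
;
run;

proc template;
  define statgraph dep_resp_sketch;
    begingraph;
      entrytitle 'Associations Matrix';
      layout overlay / yaxisopts=(reverse=true display=(tickvalues))
                       xaxisopts=(display=(tickvalues));
        /* Color each cell by the numeric response instead of a group value */
        heatmapparm x=prod_x y=prod_y colorresponse=resp / name='h';
        continuouslegend 'h';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=dep_resp template=dep_resp_sketch;
run;
```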

 

 


SG Procedures Book Samples: Adverse Event Timeline

Here is the second installment of sample graphs from the SG Procedures book - The Adverse Event Timeline.  This is a graph commonly used in patient profiles for clinical trials where we track the progress of a patient through a hospitalization event, tracking the dates and severity of the adverse events.

The data for this graph was originally in CDISC format, and was extracted and written to a data set for convenience.  The data looks like this:

The first three observations have been added to ensure that all three severity levels are included in the graph and the legend.  This data is then shaped into the format needed to create the graph, shown below.  Incremental Y values are assigned to each event so we can use the Vector Plot to plot the events.

SAS 9.2 Version:

The important features of the graph are as follows:

  • The data is shaped to start the Study Days from the earliest start date minus 10 days.
  • The adverse event is plotted by day on the X (bottom) axis using a Vector plot.
  • Markers for start and end event are plotted using Scatter plot.
  • An equivalent  date axis is plotted along the X2 (top) axis.
  • Macro variables computed in the data step are used to correctly line up the X and X2 axes.
  • The name of the adverse event is plotted on the left of the event, using a scatter plot with markerchar feature.
  • The position of this label has to be computed based on graph width, length of string and font metrics (approx).
  • Three dummy events are assigned Y=-9, so they fall below the axis range.
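
One way the axis macro variables might be computed is sketched here.  This is an illustration under assumptions, not the code from the full program: the data set ae_events and its two rows are placeholders, and the sketch assumes aestdy is the study day that corresponds to the calendar date aestdate.  The macro variable names match those used in the snippet that follows.  Deriving the last date from the day span keeps both axes the same number of days wide, which is what lines them up.

```sas
/* Placeholder events: study day 3 falls on 04JAN2013, so day 15 is 16JAN2013 */
data ae_events;
  input aestdy aeendy aestdate :date9.;
  format aestdate date9.;
  datalines;
3 10 04JAN2013
15 22 16JAN2013
;
run;

data _null_;
  set ae_events end=last;
  retain minday maxday mindate;
  /* MIN and MAX ignore missing arguments, so the first row initializes them */
  minday  = min(minday,  aestdy);
  maxday  = max(maxday,  aeendy);
  mindate = min(mindate, aestdate);
  if last then do;
    call symputx('minday10',  minday  - 10);
    call symputx('maxday',    maxday);
    call symputx('mindate10', mindate - 10);
    /* Same span in days on both axes: (maxdate - mindate10) = (maxday - minday10) */
    call symputx('maxdate',   mindate + (maxday - minday));
  end;
run;

%put NOTE: days &minday10 to &maxday, dates &mindate10 to &maxdate;
```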

SAS 9.2 SG Plot Code snippet:

title "Adverse Events for Patient Id = xx-xxx-xxxx";
proc sgplot data=ae1 noautolegend nocycleattrs;
   refline 0 / axis=x lineattrs=(thickness=1 color=black);
 
   /*--Draw the events--*/
   vector x=endday y=y / xorigin=startday yorigin=y noarrowheads lineattrs=(thickness=9px);
   vector x=endday y=y / xorigin=startday yorigin=y noarrowheads lineattrs=(thickness=7px pattern=solid)
          transparency=0 group=aesev  name='sev';
 
   /*--Draw start and end events--*/
   scatter x=aestdy y=y / markerattrs=(size=13px symbol=trianglefilled);
   scatter x=aestdy y=y / markerattrs=(size=9px symbol=trianglefilled) group=aesev;
   scatter x=aeendy y=y / markerattrs=(size=13px symbol=trianglefilled);
   scatter x=aeendy y=y / markerattrs=(size=9px symbol=trianglefilled) group=aesev;
 
   /*--Draw the event names--*/
   scatter x=xc y=y / markerchar=aedecod;
 
   /*--Assign dummy plot to create independent X2 axis--*/
   scatter x=aestdate y=y /  markerattrs=(size=0) x2axis;
 
   /*--Assign axis properties data extents and offsets--*/
   yaxis display=(nolabel noticks novalues) min=0;
   xaxis grid label='Study Days' offsetmin=0.02 offsetmax=0.02
         values=(&minday10 to &maxday by 2);
   x2axis notimesplit display=(nolabel) offsetmin=0.02 offsetmax=0.02
         values=(&mindate10 to &maxdate);
 
   /*--Draw the legend--*/
   keylegend 'sev'/ title='Severity :';
run;

Full SAS 9.2 Code:  Full SAS Code_92

In the above graph, we have made use of the Vector Plot to draw the events.  Also, we have to make an approximate computation for the location of the event name.

SAS 9.3 Version:

Some of the above machinations can be avoided when using SAS 9.3, where we can use the new HighLow plot statement to display the event, along with end caps to indicate if the event continues past the duration.  Also, the HighLow plot automatically draws the event label in the right location, saving us the kludged code.  The coding is easier, and the resulting graph is nicer to boot.

The important features of the graph are as follows:

  • The data is shaped to start the Study Days from the earliest start date minus 10 days.
  • The adverse event is plotted by day on the X (bottom) axis using a  HighLow plot with end caps.
  • An equivalent  date axis is plotted along the X2 (top) axis.
  • Macro variables computed in the data step are used to correctly line up the X and X2 axes.
  • The name of the adverse event is plotted on the left of the event, using the data label feature of HighLow plot.
  • Three dummy events are plotted outside the Y range, and attached to the legend.

SAS 9.3 SG Plot Code snippet:

title "Adverse Events for Patient Id = xx-xxx-xxxx";
proc sgplot data=ae2 noautolegend nocycleattrs;
  /*--Draw the events--*/
  highlow y=aeseq low=stday high=enday / group=aesev lowlabel=aedecod type=bar
          barwidth=0.8 lineattrs=(color=black) lowcap=lcap highcap=hcap name='sev';
 
  /*--Assign dummy plot to create independent X2 axis--*/
  scatter x=aestdate y=aeseq /  markerattrs=(size=0) x2axis;
 
  refline 0 / axis=x lineattrs=(thickness=1 color=black);
 
  /*--Assign axis properties data extents and offsets--*/
  yaxis display=(nolabel noticks novalues) type=discrete;
  xaxis grid label='Study Days' offsetmin=0.02 offsetmax=0.02
        values=(&minday10 to &maxday by 2);
  x2axis notimesplit display=(nolabel) offsetmin=0.02 offsetmax=0.02
        values=(&mindate10 to &maxdate);
 
  /*--Draw the legend--*/
  keylegend 'sev'/ title='Severity :';
  run;

Full SAS 9.3 Code:  Full SAS Code_93

Adverse Event graphs are often used with graphs showing concomitant medications and vital signs of the patient on a uniform X axis.

 


Comparative density plots

Recently a user posted a question on the SAS/GRAPH and ODS Graphics Communities page on how to plot the normal density curves for two classification levels in the same graph.

We have often seen examples of a  distribution plot of one variable using a histogram with normal and kernel density curves.  Here is a simple example:

Code Snippet:

title 'Mileage Distribution';
proc sgplot data=sashelp.cars;
  histogram mpg_city;
  density mpg_city  / type=normal legendlabel='Normal' lineattrs=(pattern=solid);
  density mpg_city  / type=kernel legendlabel='Kernel' lineattrs=(pattern=solid);
  keylegend / location=inside position=topright across=1;
  xaxis display=(nolabel);
  run;

To compare the distribution by a classifier in the same graph, you can do something similar as long as the classified data is transformed into a multi-column format.  Now, you can overlay two (or more) density curves of different variables in the same way.

In the example below, we have transformed the data for sashelp.cars into a multi-column format using the code suggested by Rick Wicklin in his article Reshape data so that each category becomes a variable.  The values of MPG_CITY for the three levels of the Origin variable are transformed into three independent columns.  Then, we have used three density statements to plot the data in one graph.  Here is the graph and the code snippet.  Full program is included at the bottom.
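
For readers who just want a quick stand-in for the reshaping step, here is a minimal DATA step sketch.  It is not the method from the referenced article: it simply copies MPG_CITY into one of three columns depending on Origin and leaves the other two missing.  The output names multiVar, mpg_usa, mpg_asia and mpg_eur match the snippet that follows.

```sas
/* Copy MPG_CITY into one column per Origin level; the other two stay missing */
data multiVar;
  set sashelp.cars(keep=origin mpg_city);
  if      origin='USA'    then mpg_usa  = mpg_city;
  else if origin='Asia'   then mpg_asia = mpg_city;
  else if origin='Europe' then mpg_eur  = mpg_city;
  keep mpg_usa mpg_asia mpg_eur;
run;
```

The resulting columns are sparse, which is fine here because each DENSITY statement ignores missing values for its own variable.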

Code snippet:

title 'Mileage Distribution by Origin';
proc sgplot data=multiVar;
  density mpg_usa / legendlabel='USA' lineattrs=(pattern=solid);
  density mpg_asia  / legendlabel='Asia' lineattrs=(pattern=solid);
  density mpg_eur  / legendlabel='Europe' lineattrs=(pattern=solid);
  keylegend / location=inside position=topright across=1;
  xaxis display=(nolabel);
  run;

We can take this idea further, and create a plot to see the distribution of multiple variables on the same graph using histograms and / or density plots.  Here is an example of systolic and diastolic blood pressure from sashelp.heart.  We have set a transparency level for each plot to be able to see the data:

Code snippet:

title 'Distribution of Blood Pressure';
proc sgplot data=sashelp.heart;
  histogram systolic / fillattrs=graphdata1 name='s' legendlabel='Systolic' transparency=0.5;
  histogram diastolic / fillattrs=graphdata2 name='d' legendlabel='Diastolic' transparency=0.5;
  keylegend 's' 'd' / location=inside position=topright across=1;
  xaxis display=(nolabel);
  run;

Full SAS 9.2 Program:  Full SAS Code

SAS 9.3:  With SAS 9.3, you can set the binwidth for the histograms to get a better comparative graph:

SGPlot code:

title 'Distribution of Blood Pressure';
proc sgplot data=sashelp.heart;
  histogram systolic / fillattrs=graphdata1 name='s' legendlabel='Systolic' 
                       transparency=0.5 binwidth=5; 
  histogram diastolic / fillattrs=graphdata2 name='d' legendlabel='Diastolic' 
                       transparency=0.5  binwidth=5; 
  keylegend 's' 'd' / location=inside position=topright across=1;
  xaxis display=(nolabel);
  run;

Full SAS 9.3 code:  Full SAS Code 93


SG Procedures Book Samples: Forest Plot

In December of last year, the book "Statistical Graphics Procedures by Example" co-authored by Dan Heath and me was published.  On the back cover, it proclaims "Free Code on the Web".  Now, who can resist such an offer?  Since most of the examples in the book have very short syntax, we put all the data sets used in the examples in the downloadable file, but no sample code.

Well, this did not fly, and we got multiple requests from readers for sample code.  Chapters 12 and 13 of the book include many industry specific graphs for the Clinical and Business use cases.  The examples in these chapters show the SGPLOT procedure code needed, but not the rest of the code needed to prepare the data for the graph.  Also, some graphs use macro variables that have to be set up in such code.

So, we decided to add these samples to the downloadable file and I thought it would be a good idea to share them in this blog for wider distribution, starting with the Forest Plots from Figures 12.2 and 12.3 in the book.

Note:  These samples show how to create such graphs using the SG procedures (the topic of the book).  Some graphs may be better done using GTL.  Where appropriate, I will also post the GTL version.

Here I have included SAS 9.2 and SAS 9.3 versions.  When I first did the graphs using SAS 9.2 (eating one's own dog food), some gaps in the features came to light that were addressed in SAS 9.3.  So, the SAS 9.3 version is generally easier to code up.

Forest Plot Data:

The data is as shown in the table above.

  • Study names are included as individual observations.
  • Individual study has Grp=1 and Overall has Grp=2.
  • The Overall Observations are separated into a separate set of columns to the right.
  • Additional columns on the right are created to display the table of values.

Figure 12.2:  SAS 9.2 Forest Plot:

SGPLOT Code:

title "Impact of Treatment on Mortality by Study";
title2 h=8pt 'Odds Ratio and 95% CL';
 
/*--Create the plot--*/
proc sgplot data=forests noautolegend;
  /*--Display overall values (Study2)--*/
  scatter y=study2 x=oddsratio / markerattrs=(symbol=diamondfilled size=10);
 
  /*--Display individual values (Study)--*/
  scatter y=study x=oddsratio / xerrorupper=ucl2 xerrorlower=lcl2
          markerattrs=(symbol=squarefilled);
 
  /*--Display statistics columns on X2 axis--*/
  scatter y=study x=or / markerchar=oddsratio x2axis;
  scatter y=study x=lcl / markerchar=lowercl x2axis;
  scatter y=study x=ucl / markerchar=uppercl x2axis;
  scatter y=study x=wt / markerchar=weight x2axis;
 
  /*--Draw other details in the graph--*/
  refline 1 100  / axis=x;
  refline 0.01 0.1 10 / axis=x lineattrs=(pattern=shortdash) transparency=0.5;
  inset '                Favors Treatment'  / position=bottomleft;
  inset 'Favors Placebo'  / position=bottom;
 
  /*--Set X, X2 axis properties with fixed offsets--*/
  xaxis type=log offsetmin=0 offsetmax=0.35 min=0.01 max=100 minor
        display=(nolabel) ;
  x2axis offsetmin=0.7 display=(noticks nolabel);
 
  /*--Set Y axis properties using offsets computed earlier--*/
  yaxis display=(noticks nolabel) offsetmin=&pct2 offsetmax=&pct2;
run;

The key feature is the splitting of the width of the graph into the graph area on the left (on X axis) and the table on the right (X2 axis).  Other steps in this graph are as follows:

  • The  Overall study values are plotted using the first scatter plot with the diamond marker.
  • The individual study values are plotted using the second scatter plot.
  • The statistics are plotted using 4 scatter plot statements with markerchar on X2 axis.
  • The X and X2 axis extents are set using OffsetMin and OffsetMax.
  • Macro variables are used to set Y-axis offsets based on number of observations.
  • The data is sorted by descending obsid to draw the Overall observation at the bottom.
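
To illustrate the idea of deriving a Y-axis offset from the number of observations, here is a hedged sketch.  The macro variable name rowoffset and the half-row formula are assumptions made for illustration; the &pct and &pct2 values used in the snippets are computed in the full program and may follow a different formula.  The sketch expects a data set named forest, as in the SAS 9.3 snippet, with one observation per row of the plot.

```sas
/* Assumes data set FOREST exists with one observation per plotted row */
data _null_;
  /* NOBS= is filled in at compile time, so no rows are actually read */
  if 0 then set forest nobs=n;
  /* Half a row at each end: N discrete rows then each get 1/N of the axis */
  call symputx('rowoffset', 0.5 / n);   /* hypothetical macro variable name */
  stop;
run;

%put NOTE: rowoffset=&rowoffset;
```

With an offset of 0.5/N at both ends, the N tick positions are spaced 1/N of the axis apart, so every study row occupies an equal slice of the plot height.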

Full Code: Full SAS 92 Code

Figure 12.3:  SAS 9.3 Forest Plot:

In the SAS 9.3 version, we have used the new HIGHLOW plot to draw the Odds Ratio of the individual study observations.  The weight of the study is represented by the horizontal length of the box.  In the previous example, we did not display the weight of the study.  Computation of the weight is up to the user, and is usually based on the sample size of the study.

I also used a reference line to draw faint alternate bands to aid the eye across the width of the graph.  A Macro variable is used to set the thickness of this line.  This part can also be done the same way in the SAS 9.2 version.

SGPLOT code:

title "Impact of Treatment on Mortality by Study";
title2 h=8pt 'Odds Ratio and 95% CL';
 
/*--Create the plot--*/
proc sgplot data=forest noautolegend nocycleattrs;
  /*--Draw alternate reference line--*/
  refline studyref / lineattrs=(thickness=&thickness) transparency=0.85;
 
  /*--Display overall values (Study2) using scatter plot--*/
  scatter y=study2 x=oddsratio / markerattrs=(symbol=diamondfilled size=10);
 
  /*--Display individual values (Study) using highLow plot--*/
  highlow y=study low=lcl2 high=ucl2 / type=line;
  highlow y=study low=q1 high=q3 / type=bar;
 
  /*--Display statistics columns on X2 axis--*/
  scatter y=study x=or / markerchar=oddsratio x2axis;
  scatter y=study x=lcl / markerchar=lowercl x2axis;
  scatter y=study x=ucl / markerchar=uppercl x2axis;
  scatter y=study x=wt / markerchar=weight x2axis;
 
  /*--Draw other details in the graph--*/
  refline 1 100 / axis=x;
  refline 0.01 0.1 10 / axis=x lineattrs=(pattern=shortdash) transparency=0.5;
  inset '                   Favors Treatment'  / position=bottomleft;
  inset 'Favors Placebo'  / position=bottom;
 
  /*--Set X, X2 axis properties with fixed offsets--*/
  xaxis type=log offsetmin=0 offsetmax=0.35 min=0.01 max=100 minor display=(nolabel) ;
  x2axis offsetmin=0.7 display=(noticks nolabel);
 
  /*--Set Y axis properties (including reverse) using offsets computed earlier--*/
  yaxis display=(noticks nolabel) offsetmin=&pct offsetmax=&pct2 reverse;
run;

Full Code:  Full SAS 93 Code

The SGPLOT procedure is ideal for single-cell graphs.  To create the effect of a multi-cell graph,  I have used the axis splitting technique to create the appearance of a graph cell on the left and table cell on the right.  An actual multi-cell graph can be created using the LAYOUT LATTICE statement in GTL, and may really be a better way to do this.  I will post an example of that in a subsequent article.

 


A (tool)tip for band plots

Recently, I had a discussion with a user concerning the volume of imagemap data generated for an interactive, web-based visual containing a large number of graphs. The large amount of imagemap data was causing problems with the current version of their web browser. The graphs consisted of either bar charts or series plots with bandplots in the background representing different levels. The band levels were computed values, but they were constant across the X variable for each graph. The user noticed that the output contained a large amount of imagemap data for the bandplot, and it appeared to be duplicated information.

The key to understanding this result (and how to prevent it) is to understand the difference between specifying variables for upper and lower limits versus specifying constants. First, let’s create a simple dataset that simulates this situation:

proc summary data=sashelp.class nway;
  class age sex;
  var height;
  output out=meandata mean=;
run;
 
/* Add the "computed" band levels */
data meanheight;
  set meandata;
  short = 55;
  average = 65;
  tall = 72;
run;

The resulting dataset contains 11 observations and appears like the following table:

Given that the limit variables are contained in the dataset, you would naturally use those variables in the BAND statement to draw the levels:

ods html file="average_height.html";
ods graphics / imagemap;
Title "Average Class Height by Age";
proc sgplot data=meanheight;
band x=age upper=tall lower=average / name="tall" legendlabel="Tall"
     fillattrs=(color="light orange") transparency=0.5;
band x=age upper=average lower=short / name="average" legendlabel="Average"
     fillattrs=(color="light yellow") transparency=0.5;
band x=age upper=short lower=50 / name="short" legendlabel="Short"
     fillattrs=(color="light green") transparency=0.5;
series x=age y=height / group=sex markers
     lineattrs=(pattern=solid) name="series";
keylegend "short" "average" "tall" / position=bottomleft;
keylegend "series" / position=bottomright title="Gender";
run;
ods html close;

The visual looks okay; but, if you look at the imagemap data, there were 10 records generated for each bandplot. The reason is that, when you use variables for the limits, the limit values can change across the X values. Therefore, records must be created for each value of X to guarantee that the correct value appears in the tip. The more X values in your data, the more imagemap records are generated for the bandplot.

However, when the limit values are really constant, you can avoid this situation by extracting the constant values from the dataset and setting them into macro variables:

data _null_;
  set meanheight;
  if (_n_ = 1) then do;
    /* CALL SYMPUTX converts the numeric value and trims blanks without a log note */
    call symputx("TALL", tall);
    call symputx("AVERAGE", average);
    call symputx("SHORT", short);
  end;
run;

Now, the procedure can be modified to use the constants from the macro variables:

 
Title "Average Class Height by Age";
proc sgplot data=meanheight;
band x=age upper=&TALL lower=&AVERAGE / name="tall" legendlabel="Tall"
     fillattrs=(color="light orange") transparency=0.5;
band x=age upper=&AVERAGE lower=&SHORT / name="average" legendlabel="Average"
     fillattrs=(color="light yellow") transparency=0.5;
band x=age upper=&SHORT lower=50 / name="short" legendlabel="Short"
     fillattrs=(color="light green") transparency=0.5;
series x=age y=height / group=sex markers 
     lineattrs=(pattern=solid) name="series";
keylegend "short" "average" "tall" / position=bottomleft;
keylegend "series" / position=bottomright title="Gender";
run;

The visual looks the same; however, the imagemap for each band contains only 2 records. This is true regardless of the number of X values. For a web output containing a large number of these graphs, the savings in imagemap size can improve the performance of your visual in a web browser.


Nested graphs

Here are a couple of bar charts showing the city mileage of cars by Type and Origin using the SGPLOT procedure from the sashelp.cars dataset.

title 'Vehicle Mileage by Type';
proc sgplot data=cars;
  format mpg_city 4.1;
  vbar type / response=mpg_city stat=mean datalabel;
  xaxis display=(nolabel);
  run;

title 'Counts by Country';
proc sgplot data=cars;
  vbar origin / datalabel;
  xaxis display=(nolabel);
  run;
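
The two procedures above read a data set named cars rather than sashelp.cars directly. The full program is not shown here, so as an assumption, a plain copy into the WORK library would be enough to run them:

data cars;
  set sashelp.cars;  /* assumption: no filtering or derived columns are needed */
run;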

Pretty simple and straightforward so far.  But now we want to do something more interesting.  In the first graph of Mileage by Type, there is a whole lot of empty space at the upper right.  How can I insert the second graph of the frequency counts into this space?  With GTL, it is relatively easy.  Here is the graph and the code.

proc template;
  define statgraph cars_inset_bar_bar;
    begingraph;
      entrytitle 'Vehicle Mileage by Type';
      layout overlay / yaxisopts=(griddisplay=on) xaxisopts=(display=(ticks tickvalues));
        barchart x=type y=mpg_city / stat=mean skin=modern barlabel=true;
        layout overlay / width=25pct height=30pct halign=right valign=top
                         walldisplay=(outline) opaque=false
                         yaxisopts=(display=none) xaxisopts=(display=(ticks tickvalues));
          barchart x=origin / fillattrs=graphdata1 barlabel=true skin=modern;
          entry halign=center "Counts by Country" / valign=top location=outside;
        endlayout;
      endlayout;
    endgraph;
  end;
run;
 
/*--Nested graph--*/
proc sgrender data=cars template=cars_inset_bar_bar;
format mpg_city 4.1;
run;

Here is the basic idea:

  • I created a template using a LAYOUT OVERLAY, and placed the first BARCHART in it with the appropriate options.
  • Within this layout, I inserted another LAYOUT OVERLAY.  This layout is given a small width and height, and aligned to the top right.    I also set options to turn off the background and wall fill, and set the axis options to reduce clutter.
  • Within the second layout, I added the 2nd BARCHART with the appropriate options.
  • I added an entry with LOCATION=outside to simulate a title for the inset.

That is pretty much it.  We have a compact graph that utilizes the available space and provides more information about the data.  The visuals stay together when shared.

I did encounter one problem.  If the axes of the two plots are compatible, as we have here using two bar charts, all is well.  Same would be true if using other compatible plots such as scatter, series, histogram, etc.

However, if I try to insert a HISTOGRAM as an inset into the BARCHART, even though these are two different graphs,  an axis conflict is reported, and one of the graphs is not drawn.  This looks like a bug and we will address it in the next release.  But, what to do in the meanwhile?

Well, there is a "workaround".  Each layout has two sets of X and Y axes that are independent of each other.  So, if I want to inset a histogram into the bar chart, I can use the X2 axis for the histogram, then turn off the main X2 axis display and turn on the X2 axis DisplaySecondary.  Here is the graph and the code.

proc template;
  define statgraph cars_inset_bar_hist;
    begingraph;
      entrytitle 'Vehicle Mileage by Type';
      layout overlay / yaxisopts=(griddisplay=on) xaxisopts=(display=(ticks tickvalues));
        barchart x=type y=mpg_city / stat=mean skin=modern barlabel=true;
        layout overlay / width=25pct height=30pct halign=right valign=top
                         walldisplay=(outline) opaque=false
                         yaxisopts=(display=none)
                         x2axisopts=(display=none displaysecondary=(ticks tickvalues));
          histogram mpg_city / xaxis=x2 fillattrs=graphdata1;
          entry halign=center "Distribution of Mileage" / valign=top location=outside;
        endlayout;
      endlayout;
    endgraph;
  end;
run;
 
/*--Nested graph of different types--*/
proc sgrender data=cars template=cars_inset_bar_hist;
format mpg_city 4.1;
run;

You are not restricted to one nested inset, and can add more if it makes sense.  Here I have used SAS 9.2 code, but with SAS 9.3, you have even more options to use Pie Chart and other plot types as insets.

Full Program: Full SAS 92 Code


Timeseries plots with regimes

Recently we discussed the features of the Shiller Graph, showing long term housing values in the USA.  To understand the features necessary in the SGPLOT procedure to create such a graph easily, it was useful to see how far we can go using GTL as released with SAS 9.2(M3).

I got the Shiller Housing index data over the web as a spreadsheet, read it into SAS, and added some "forecast" observations at the end.  I created a data set of the historical events, and merged this with the housing data.  You can see all this in the full program code attached.

GTL supports the BLOCKPLOT statement that is designed for such a use case.  The graph and GTL code are shown below.

proc template;
  define statgraph housing1;
    begingraph;
      entrytitle "Housing Price Trends in USA";
      layout overlay / walldisplay=(fill) pad=(top=20)
                       xaxisopts=(display=(ticks tickvalues line) griddisplay=off offsetmax=0.02
                          linearopts=(tickvaluesequence=(start=1890 end=2010 increment=10)))
                       y2axisopts=(display=(ticks tickvalues) displaysecondary=(ticks tickvalues)
                          griddisplay=on offsetmin=0.05 offsetmax=0.05
                          linearopts=(tickvaluesequence=(start=60 end=200 increment=10) thresholdmax=1));
        blockplot x=date2 block=event / datatransparency=0.5
                  fillattrs=(color=white) altfillattrs=(color=lightgray) filltype=alternate
                  display=(fill values) valuehalign=center valuevalign=top;
        seriesplot x=date y=index / yaxis=y2 group=group lineattrs=(thickness=3) name='p'
                   includemissinggroup=false;
        discretelegend 'p' / location=inside halign=left valign=bottom across=1;
      endlayout;
    endgraph;
  end;
run;
 
ods listing;
ods graphics / reset width=10in height=6in imagename='Housing_1' antialiasmax=1000;
proc sgrender data=merged template=housing1;
run;

The BLOCKPLOT statement does most of what we want.  The statement has the following syntax:

BLOCKPLOT X=var BLOCK=var / <options>;

In our usage, "X" should be assigned the date variable (sorted), and "BLOCK" should be assigned the Event variable.   A "block" is formed from consecutive values of the event variable.  Each block is drawn with the applicable display attributes, including values.  This statement used with the SERIESPLOT statement is essentially what is needed to create this basic graph.  Note, the Date variable may be the same for both timeseries and event data, or separate variables, with similar data.
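
To make the block-forming rule concrete, here is a minimal sketch on made-up data. The data set regimes, its columns year, event and value, and the template name block_sketch are all hypothetical and are not part of the housing program. The event column changes value twice, so the consecutive runs Growth, Slump, Growth should form three blocks behind the series.

data regimes;
  input year event $ value;  /* hypothetical columns for illustration only */
  datalines;
2001 Growth 100
2002 Growth 110
2003 Slump 105
2004 Slump 95
2005 Growth 102
2006 Growth 115
;
run;

proc template;
  define statgraph block_sketch;  /* hypothetical template name */
    begingraph;
      layout overlay;
        /* consecutive equal values of EVENT are drawn as one block */
        blockplot x=year block=event / display=(fill values outline)
                  filltype=alternate valuehalign=center valuevalign=top;
        seriesplot x=year y=value;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=regimes template=block_sketch;
run;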

Now for the details.  The block value is displayed in each block, as requested above.  If the value text fits, all is well.  If not, it is truncated, as can be seen for some values in the graph.  There is no option available to control this behavior.  We will get back to this issue later.

To overcome this shortcoming, we have to resort to other means to draw the block labels.  We use the SCATTERPLOT statement with the MarkerCharacter option, using a separate column added to the Historical Data called Label.  Now we turn off the BLOCKPLOT labels and add the SCATTERPLOT.  Graph and code are shown below.

proc template;
  define statgraph housing2;
    begingraph;
      entrytitle "Housing Price Trends in USA";
      layout overlay / walldisplay=(fill) pad=(top=20)
                       xaxisopts=(display=(ticks tickvalues line) griddisplay=off offsetmax=0.02
                           linearopts=(tickvaluesequence=(start=1890 end=2010 increment=10)))
                       y2axisopts=(display=(ticks tickvalues) displaysecondary=(ticks tickvalues)
                          griddisplay=on offsetmin=0.05 offsetmax=0.05
                          linearopts=(tickvaluesequence=(start=60 end=200 increment=10) thresholdmax=1));
        blockplot x=date2 block=event / datatransparency=0.5
                  fillattrs=(color=white) altfillattrs=(color=lightgray) filltype=alternate
                  display=(fill) valuehalign=center valuevalign=top;
        scatterplot x=date2 y=ylabel / markercharacter=label yaxis=y2;
        seriesplot x=date y=index / yaxis=y2 group=group lineattrs=(thickness=3) name='p'
                   includemissinggroup=false;
        discretelegend 'p' / location=inside halign=left valign=bottom across=1;
      endlayout;
    endgraph;
  end;
run;
 
ods listing;
ods graphics / reset width=10in height=6in imagename='Housing_2' antialiasmax=1000;
proc sgrender data=merged template=housing2;
run;

Now the label for each regime is shown in its entirety, and the label text flows beyond the block if necessary.

To make the event labels more readable, we use the MarkerCharacterAttrs option, setting size=10 and weight=bold.  Now the labels are bigger, and easier to read, but they run into each other, especially the 70's Boom and 80's Boom.  To fix this, we can split the labels, and use two scatter plots to create this graph.  Graph and code included below.

proc template;
  define statgraph housing3;
    begingraph;
      entrytitle "Housing Price Trends in USA";
      layout overlay / walldisplay=(fill) pad=(top=20)
                       xaxisopts=(display=(ticks tickvalues line) griddisplay=off offsetmax=0.02
                          linearopts=(tickvaluesequence=(start=1890 end=2010 increment=10)))
                       y2axisopts=(display=(ticks tickvalues) displaysecondary=(ticks tickvalues)
                          griddisplay=on offsetmin=0.05 offsetmax=0.05
                          linearopts=(tickvaluesequence=(start=60 end=200 increment=10) thresholdmax=1));
        blockplot x=date2 block=event / datatransparency=0.5
                  fillattrs=(color=white) altfillattrs=(color=lightgray) filltype=alternate
                  display=(fill) valuehalign=center valuevalign=top;
        scatterplot x=date2 y=ylabel / markercharacter=label1 markercharacterattrs=(size=10 weight=bold) yaxis=y2;
        scatterplot x=date2 y=eval(ylabel-5) / markercharacter=label2 markercharacterattrs=(size=10 weight=bold) yaxis=y2;
        seriesplot x=date y=index / yaxis=y2 group=group lineattrs=(thickness=3) name='p'
                   includemissinggroup=false;
        discretelegend 'p' / location=inside halign=left valign=bottom across=1;
      endlayout;
    endgraph;
  end;
run;
 
ods listing;
ods graphics / reset width=10in height=6in imagename='Housing_3' antialiasmax=1000;
proc sgrender data=merged2 template=housing3;
run;
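
The step that builds merged2 with the label1 and label2 columns is in the full program, not shown here. As a rough sketch only, assuming each label is simply broken after its first word (the actual program may split differently, or split only the labels that collide), it could look like this:

data merged2;
  set merged;
  length label1 label2 $ 40;   /* assumed width; longer labels would be truncated */
  drop text pos;               /* helper variables, hypothetical names */
  text = strip(label);
  label1 = scan(text, 1, ' '); /* first word goes on the top line */
  pos = index(text, ' ');
  /* the rest of the label, if any, goes on the second line */
  if pos > 0 then label2 = strip(substr(text, pos));
run;

With this rule, a label such as "70's Boom" would put "70's" in label1 and "Boom" in label2, and rows with no label leave both columns blank.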

This exercise showed us that while a graph like this can be created using some custom coding, the program does not scale well to all situations.  To make this easy to use, we need some enhancements to the BLOCKPLOT statement.

One feature we need is a BlockValueFitPolicy=(None | Split | ...).  The "None" option will allow the block value text to flow across boundaries, and the "Split" option will split the label into multiple lines within the block width on split characters.  These features are planned for the V9.4 release.  Also, a BLOCK statement is planned for the SGPLOT procedure, so we can look forward to being able to create this graph with code like this (future):

proc sgplot data=merged;
  block x=date2 block=event / display=(fill value) valuefitpolicy=none;
  series x=date y=index / group=group;
  keylegend / location=inside position=bottomleft across=1;
run;

If this seems of interest to you, please feel free to provide your comments.

Full program code:  Full SAS 92 Code

Housing Data: housing_csv


They go where you put them

An issue that SAS/GRAPH users have wrestled with in the past has been how to put tick marks at irregular intervals on their axes. In PROC GPLOT, if you specify irregular intervals using the ORDER option on the AXIS statement, the procedure’s axis kicks into a “discrete” mode, where the tick values are placed at equal distances regardless of their numeric values. I have seen SUGI posters describe workarounds for this situation. Most techniques involve turning off the axis tick values, moving the axis origins to create space, and using annotate to draw the tick values in their correct numerical location.

With the ODS Graphics system, correct placement of tick marks based on value is an automatic behavior, even with irregular intervals. The following examples using PROC SGPLOT show you how you can control tick value placement with a combination of simple options.

First, it is important to understand that the data range of an axis is independent from the tick mark placement. In the example below, notice the two plot points extending beyond the end tick marks. The true X-axis data range goes from 50.5 to 150, but “nice” tick values are automatically chosen within that range to display on the axis.

proc sgplot data=sashelp.class;
scatter x=weight y=height / group=sex datalabel=weight;
run;

Because the axis range and the axis ticks are independent, you can specify a list of tick values and have them placed correctly along the axis range.
There are four axis options in PROC SGPLOT and PROC SGPANEL that control axis range and tick placement: MIN, MAX, VALUES, and VALUESHINT. The MIN and MAX options set the minimum and maximum for the axis range while leaving the tick mark choice up to the system.

proc sgplot data=sashelp.class;
xaxis min=80 max=120;
scatter x=weight y=height / group=sex datalabel=weight;
run;

The VALUES option gives you the ability to set both the range min/max and the tick values that are displayed. The tick values can be specified as a combination of individual values and “M to N by INCREMENT” specifications.

proc sgplot data=sashelp.class;
xaxis values=(20 30 80 to 120 by 10 160);
scatter x=weight y=height / group=sex datalabel=weight;
run;

The VALUESHINT option changes the behavior of the VALUES option. When VALUESHINT is specified in conjunction with the VALUES option, the minimum and maximum values in the list no longer set the range for axis, and any of the specified tick values that fall within the axis range are drawn – the rest are ignored. This means that you can specify a large list of custom tick values while not impacting the true data range of the plot. If the data changes in subsequent runs, different parts of your tick list will be displayed.

proc sgplot data=sashelp.class;
xaxis values=(20 30 80 to 120 by 10 160) valueshint;
scatter x=weight y=height / group=sex datalabel=weight;
run;

Finally, it is important to note that the VALUES and VALUESHINT options affect only LINEAR axes, while the MIN/MAX options may be used on all axis types except DISCRETE. If you need to use the VALUES/VALUESHINT options in a logarithmic case, you can compute a separate column of log values and plot them on a LINEAR axis using these options.
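
As a small sketch of that last suggestion, the step below computes a base-10 log of weight and plots it on a linear axis with a custom tick list. The data set name class_log, the column log_weight, and the particular tick list are my own choices for illustration.

data class_log;
  set sashelp.class;
  log_weight = log10(weight);  /* hypothetical column holding the log values */
run;

proc sgplot data=class_log;
  /* ticks are in log10 units; VALUESHINT keeps the axis range tied to the data */
  xaxis values=(1.7 to 2.2 by 0.1) valueshint label="log10(Weight)";
  scatter x=log_weight y=height / group=sex;
run;

The tick values here read in log units. If you want them to show the original scale, that would need an additional step, such as a format on the log column.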
