Ways to include textual data columns in graphs

Most simple graphs represent data graphically using various plot types such as bar charts, scatter plots, histograms, box plots, step plots and more.  Both SG procedures and GTL provide many easy ways to create such graphs.

However, for many real world use cases, we need to display related textual data in the graph, usually aligned with one of the axes.  Over the past few years with SAS 9.2 we have done this using the SCATTER plot with the MARKERCHARACTER option.  This option displays the textual value from the associated column in the (x, y) location in place of a marker.  This text string is center-justified at the marker location.  While this works for many cases, sometimes we need finer control over the placement of the text.

Recently a SAS user posted just such a question on the SAS Communities page.  The user was using SAS 9.3, wanted the text to be left-justified, and was looking for some help.  With SAS 9.2, one has to use the MARKERCHARACTER option and then use some coding tricks to position the strings just right.  I have discussed some ways of doing this earlier using non-breaking spaces.

With SAS 9.3, there are some more options available to the user, and I thought this would make for a good blog article.  Let us use the data in the article Forest Plot with SAS 9.3 to illustrate the possibilities.  We will use only the columns on the left and the hazard plot.  Note that in this data set, the observations with ID=1 are subgroup headings and the observations with ID=2 are the values.  The intention is to display the subgroup headings in a bolder font and the values with an indentation.

We can use a two cell lattice and populate the first cell with the first two columns from this data set.  The second cell will contain a scatter plot of the mean with low and high limits by study.  Here is the graph using the MarkerCharacter option.

Here is the code fragment for the first cell in the graph.  Please see the full code in the attached program file.

  layout overlay / walldisplay=none
               x2axisopts=(display=(tickvalues) offsetmin=0.3 offsetmax=0.3)
               yaxisopts=(reverse=true display=none offsetmin=0);
    scatterplot y=obsid x=subgroup_lbl / markercharacter=subgroup xaxis=x2;
    scatterplot y=obsid x=count_lbl / markercharacter=countpct xaxis=x2;
  endlayout;

In the graph above, each text string is center-justified at its position.  This is not what we want.  To address this, some suggestions were made in the blog posts referred to above.  Those solutions are not optimal, and require the use of non-proportional fonts.

The SCATTER plot also supports another way to display text data using the DATALABEL option.  With SAS 9.2, this data label is always displayed at the top right of the marker, and can be moved around by the system to avoid collisions with other labels or markers.  So, with SAS 9.2, it was not possible to use this feature to draw the text strings with deterministic results.  However, with SAS 9.3, a new DATALABELPOSITION option is added, allowing explicit positioning of the labels.  While the default position is AUTO, meaning the old collision avoidance behavior, you can also specify any compass position such as TOP, LEFT, etc.

In the graph below, we have used the DATALABEL option to display the text, using DATALABELPOSITION of RIGHT and CENTER.  RIGHT places the text to the right of the invisible marker, effectively making the text left justified.

Here is the SAS 9.3 code fragment for the first cell.

  layout overlay / walldisplay=none
                   x2axisopts=(display=(tickvalues) offsetmin=0.15 offsetmax=0.3)
                   yaxisopts=(reverse=true display=none offsetmin=0);
    scatterplot y=obsid x=subgroup_lbl / datalabel=subgroup markerattrs=(size=0)
                datalabelposition=right xaxis=x2 discreteoffset=-0.25;
    scatterplot y=obsid x=count_lbl / datalabel=countpct markerattrs=(size=0)
                datalabelposition=center xaxis=x2;
  endlayout;

This is an improvement over the first graph, but the observations like "Overall", "Age", "Sex" are subgroups, and we want to display them in bold, and the observations like "<= 65 yr" etc. are values which we want to display indented over a bit.  How can we do that?

One way with SAS 9.3 GTL is to use the EVAL function to have a scatter plot display only the values with ID=1 with specified attributes, and another scatter plot display only the values with ID=2 using different attributes and offset.  Here is the result:

Note, in this case all the subgroup headings (ID=1) are displayed with a blue bold font of size 8 while the values are displayed with normal font with an offset using the DISCRETEOFFSET option.  Here is the SAS 9.3 GTL code fragment.

  layout overlay / walldisplay=none
               x2axisopts=(display=(tickvalues) offsetmin=0.15 offsetmax=0.3)
               yaxisopts=(reverse=true display=none offsetmin=0);
    scatterplot y=eval(ifn(id=1, obsid, .)) x=subgroup_lbl / datalabel=subgroup
                markerattrs=(size=0) datalabelposition=right xaxis=x2
                discreteoffset=-0.25 datalabelattrs=(weight=bold size=8  color=blue);
    scatterplot y=eval(ifn(id=2, obsid, .)) x=subgroup_lbl / datalabel=subgroup
                markerattrs=(size=0) datalabelposition=right
                xaxis=x2 discreteoffset=-0.15 datalabelattrs=(weight=normal size=7);
    scatterplot y=obsid x=count_lbl / datalabel=countpct markerattrs=(size=0)
                datalabelposition=center xaxis=x2 datalabelattrs=(size=7);
  endlayout;

Note, now we are using two scatter plots to display the first column.  The first one displays only the observations with ID=1 (the subgroup headings) with a blue bold font.  The second one displays only the observations with ID=2 (the values) with a normal font with an offset.  Note the use of the Y=EVAL(IFN ( ) ) expressions.

With SAS 9.2 and SAS 9.3 we found ourselves doing this so often that we decided we needed a special statement to make it easy to display such columns (or rows) of textual data aligned with the Y or X axis.  So, SAS 9.4 includes a new plot statement, AXISTABLE.  This statement makes it very easy to create such textual entries, and supports text attributes and indentation.  Here is the resulting graph using SAS 9.4 AXISTABLE:

Here is the SAS 9.4 code fragment to create the text data columns:

  layout overlay / yaxisopts=(reverse=true display=none offsetmin=0) walldisplay=none;
    innermargin / align=left opaque=false;
      axistable y=obsid value=subgroup / indentweight=indent textgroup=textid;
      axistable y=obsid value=countpct / labelattrs=(size=7);
    endinnermargin;
  endlayout;

Note the use of INDENTWEIGHT and TEXTGROUP for the first column.  These axis table statements are placed in the INNERMARGIN, which automatically computes the amount of space needed for each column.  With SAS 9.4, you can now have inner margins on left and right of the overlay container.

Full SAS code (SAS 9.3 and SAS 9.4):  DataLabels

 


More symbols, you say?

Users have often expressed the need for more marker symbols.  ODS Graphics supports over  30 scalable marker symbols, both filled and empty.  As mentioned in an earlier article, with SAS 9.4, filled markers can now have outlines and fills, and can also have special effects.

Also with SAS 9.4, now you can specify group colors or markers that you want to use using the STYLEATTRS statement.  With SAS 9.4 M1, you can now define your own marker shapes, so you can have exactly the marker you need.  This can be done using two new statements in GTL and SG.  I will demonstrate the feature using SG syntax.

One new statement is SYMBOLCHAR.  With this statement you can define a symbol using any character from any font.  The other is SYMBOLIMAGE.  As expected, you can define a symbol using any image.  With these two statements you can literally have any symbol you need.

Here is an example of using the SYMBOLCHAR statement to define two new marker symbols from the Arial Unicode MS font.  I have used the Unicode male and female signs (U+2642 and U+2640).  Here is the graph using the SASHELP.CLASS data set.

SAS 9.4 SGPLOT code:

Title 'Weight by Height by Gender for Class';
proc sgplot data= sashelp.class;
  symbolchar name=male char='2642'x / textattrs=(family='Arial Unicode MS' weight=bold);
  symbolchar name=female char='2640'x / textattrs=(family='Arial Unicode MS' weight=bold);
  styleattrs datasymbols=(male female);
  scatter y=weight x=height / group=sex name='a' markerattrs=(color=black size=30)
         dataskin=sheen;
  keylegend / location=inside position=topleft opaque;
  xaxis grid offsetmin=0.05 offsetmax=0.05;
  yaxis grid offsetmin=0.05 offsetmax=0.05;
  run;

In this program, I have used the following features:

  1. I have used two SYMBOLCHAR statements to define two new symbols called 'male' and 'female'.
  2. I have used the new STYLEATTRS statement to use only these two symbols for groups.
  3. Here, the order of the group values in the data happens to match the order of the symbols.  Otherwise, we can use a Discrete Attribute Map to associate the correct symbol with the correct group value.
  4. The symbols thus defined also respond to other marker attributes like color and size.  Here we have explicitly set the color to black.  Also, a faint drop shadow can be seen due to DATASKIN.

In the graph below, I have used the new SYMBOLIMAGE statement to define marker symbols from images.  This can be very useful, as you can use familiar images for groups, thus reducing the effort needed to decode the graph.  It is especially useful with logos.  I did not use company logos to avoid any copyright issue, but you can likely see the use case.  While I would recommend caution when using images from the web, you can certainly use images owned by your own company.

SAS 9.4 SGPLOT code:

Title 'Mileage by Horsepower for some Vehicles';
proc sgplot data=cars;
  symbolimage name=sedan image='C:\Sedan_Trans.png';
  symbolimage name=sports image='C:\Sports_Trans.png';
  symbolimage name=suv image='C:\SUV_Trans.png';
  symbolimage name=truck image='C:\Truck_Trans.png';
  styleattrs datasymbols=(suv sedan sports  truck);
  scatter x=horsepower y=mpg_city / group=type markerattrs=(size=50);
  xaxis grid offsetmin=0.05 offsetmax=0.05;
  yaxis grid offsetmin=0.05 offsetmax=0.05;
  run;

In this program, I have used the following features:

  1. I have used four SYMBOLIMAGE statements to define four different marker symbols named 'sedan', 'sports', 'suv' and 'truck', each from the associated image from the file system.
  2. Note, I have used "transparent" images where the pixels outside the actual image are transparent.  So, these icons only draw the needed pixels, not the background pixels that make up the rectangular bounding box.
  3. Once again, I have used the STYLEATTRS statement to define the list of marker symbols to be used for the groups.
  4. Here, I have skipped adding the legend.  Can you make out just by the icons which one is which type?

Finally, both SymbolChar and SymbolImage support a rotation angle.  In the example below, I have used that feature to use angular orientation for classification.  I originally saw this idea in a paper by Dr. Healey from NC State University, where he proposed that symbol orientation can be a "pre-attentive" feature for classification.

Here, I used the '25AC'x  Unicode character, which is the simple horizontal bar.  I defined a symbol using this character with rotation angle of zero, and one with angle of 90.  Here is the graph:

SAS 9.4 SGPLOT code:

Title 'Weight by Height by Gender for Class';
proc sgplot data= sashelp.class;
  symbolchar name=male char='25AC'x / textattrs=(family='Arial Unicode MS' weight=bold)
             rotate=0;
  symbolchar name=female char='25AC'x / textattrs=(family='Arial Unicode MS'
            weight=bold) rotate=90;
  styleattrs datasymbols=(male female);
  scatter y=weight x=height / group=sex name='a' markerattrs=(color=black size=30)
          dataskin=sheen;
  keylegend / location=inside position=topleft opaque;
  xaxis grid offsetmin=0.05 offsetmax=0.05;
  yaxis grid offsetmin=0.05 offsetmax=0.05;
  run;

Full SAS 9.4 code:  Symbols


Two-in-one Graphs

A large variety of graphs fall in the category of what I call a "Single-Cell" graph.  This type of graph consists of a single data region along with titles, footnotes, legends and other ancillary objects.  Legends and text entries can be included in the data area.  The data itself is displayed in a single rectangular region bounded by a set of X and Y axes.  Here is an example of such a graph popular in the Clinical Research domain.

SGPLOT code:

Title 'Most Frequent On Therapy Adverse Events';
proc sgplot data= MostFrequentAESort nocycleattrs;
  scatter y=ae x=a / name='a' legendlabel='Drug A (N=216)' markerattrs=graphdata1;
  scatter y=ae x=b / name='b' legendlabel='Drug B (N=431)' markerattrs=graphdata2;
  keylegend 'a' 'b';
  yaxis display=(nolabel noticks) valueattrs=(size=7) fitpolicy=none
        colorbands=odd colorbandsattrs=(transparency=0.5);
  xaxis  label='Percent' labelpos=datacenter grid;
  run;

The SGPLOT procedure is ideally suited to create such graphs.  But in many cases it is useful or necessary to add a second display of data in the same graph.  In this case, we would like to add a display of the relative risk values to the graph, and sort the data by descending risk.  Such a graph can be created using a two cell graph using GTL as described in Most Frequent Adverse Events sorted by Relative Risk.  This graph uses the LAYOUT LATTICE to create a two cell graph, and then each cell is populated with the appropriate plots.

However, it is also possible to create what appears to be a 2-cell graph with SGPLOT using the "axis-splitting" technique I have described earlier.  The graph still has only one cell, but each cell can have up to four independent axes (X, Y, X2 and Y2).  Each plot statement placed in the cell can be associated with one pair of x and y axes.  Also, each axis can be restricted to a portion of the graph width or height by using the axis offsets.  We can use a combination of these to create two independent graphs in the same cell.

Here is such a graph created using SAS 9.2 SGPLOT procedure.  This graph appears to have two cells, one for the percent values on left and one for the relative risk on the right.  But, it is really only one cell, divided into two parts as described below.

SAS 9.2 SGPLOT Code:

Title 'Most Frequent On Therapy Adverse Events Sorted By Relative Risk';
proc sgplot data= MostFrequentAESort nocycleattrs;
  refline refae / lineattrs=(thickness=12) transparency=0.8;
  scatter y=ae x=a / name='a' legendlabel='Drug A (N=216)' markerattrs=graphdata1;
  scatter y=ae x=b / name='b' legendlabel='Drug B (N=431)' markerattrs=graphdata2;
  scatter y=ae x=mean / xerrorlower=low xerrorupper=high  x2axis
          markerattrs=graphdatadefault(symbol=x) errorbarattrs=(pattern=solid);
  refline 40 / axis=x;
  keylegend 'a' 'b';
  yaxis display=(nolabel noticks);
  xaxis offsetmax=0.5 grid labelattrs=(size=8)
        label='Percent                                                          ';
  x2axis offsetmin=0.5 type=log logbase=2 logstyle=logexpand grid max=64
         labelattrs=(size=8)
         label='                                       Relative Risk with 95% CL';
  run;

Note the following features of this graph and code:

  1. We have used two scatter plots to draw the percent occurrences for each adverse event.  These two scatter plots are associated with the default X and Y axes.
  2. We have set the X axis OffsetMax to 0.5, so the X axis data is drawn only in the left half of the cell region.
  3. We have used another scatter plot with error bars to display the relative risk values.  This scatter plot is associated with the X2 and Y axis.  The Y axis is common between the two, so the data are correctly aligned vertically.
  4. We have set the X2 axis OffsetMin to 0.5, so the X2 axis data is drawn only in the right half of the cell region.
  5. The X2 axis type is set to log.
  6. A vertical reference line is drawn at the max value of X which appears  like a cell divider.
  7. Alternate wide reference lines are used to simulate alternate horizontal bands.
  8. Note, the X and X2 axis labels will still try to draw in the center of the full axis.  That is why we have to use extra trailing or leading blanks for the labels.  This issue is addressed in the SAS 9.4 version shown below.
  9. Y axis tick values will be suppressed if they run into each other vertically.  So, we have to reduce the font size in a derived style to make sure they fit.

Some of the issues mentioned above are addressed in the SAS 9.3 version shown below.  Here, the Y axis tick value fit policy is set to none, thus allowing the values to crowd together if needed.

SAS 9.3 Graph:

SAS 9.3 SGPLOT code: 

Title 'Most Frequent On Therapy Adverse Events Sorted By Relative Risk';
proc sgplot data= MostFrequentAESort nocycleattrs;
  refline refae / lineattrs=(thickness=12) transparency=0.8;
  scatter y=ae x=a / name='a' legendlabel='Drug A (N=216)' markerattrs=graphdata1;
  scatter y=ae x=b / name='b' legendlabel='Drug B (N=431)' markerattrs=graphdata2;
  highlow y=ae low=low high=high / x2axis;
  scatter y=ae x=mean / x2axis markerattrs=(symbol=x);
  refline 40 / axis=x;
  keylegend 'a' 'b';
  yaxis display=(nolabel noticks) valueattrs=(size=7) fitpolicy=none;
  xaxis offsetmax=0.5 grid labelattrs=(size=8) valueattrs=(size=7)
        label='Percent                                                          ';
  x2axis offsetmin=0.5 type=log logbase=2 logstyle=logexpand grid max=64
         labelattrs=(size=8) valueattrs=(size=7)
         label='                                       Relative Risk with 95% CL';
  run;

With SAS 9.3, the X and Y axis tick value size can be set in the syntax, and also the Y axis fit policy is set to none.  Now, tick values can get closer to each other without being suppressed.  Note, the error bars are drawn using the highlow plot to avoid the bar caps.

With SAS 9.4, additional features are available to improve this graph.

SAS 9.4 Graph:

SAS 9.4 SGPLOT code:

Title 'Most Frequent On Therapy Adverse Events Sorted By Relative Risk';
proc sgplot data= MostFrequentAESort nocycleattrs;
  scatter y=ae x=a / name='a' legendlabel='Drug A (N=216)' markerattrs=graphdata1;
  scatter y=ae x=b / name='b' legendlabel='Drug B (N=431)' markerattrs=graphdata2;
  scatter y=ae x=mean / xerrorlower=low xerrorupper=high x2axis
          markerattrs=(symbol=x) noerrorcaps;
  refline 40 / axis=x;
  keylegend 'a' 'b';
  yaxis display=(nolabel noticks) valueattrs=(size=7) fitpolicy=none
        colorbands=odd colorbandsattrs=(transparency=0.5);
  xaxis offsetmax=0.5 grid labelattrs=(size=8) valueattrs=(size=7)
        label='Percent' labelpos=datacenter minor;
  x2axis offsetmin=0.5 type=log logbase=2 logstyle=logexpand grid max=64
         labelattrs=(size=8) valueattrs=(size=7)  labelpos=datacenter
         label='Relative Risk with 95% CL';
  run;

Now, the label positions for the X and X2 axes are set to "DataCenter".  This means the label is automatically centered in the data space of the axis, not in the whole axis width.  We no longer have to pad the axis label with leading or trailing blanks.

We have gone back to using the scatter plot for the relative risk bars and used the NOERRORCAPS option.  Also, we have used COLORBANDS option on the Y axis to create the alternate horizontal bands.  We no longer have to add a reference line to do this which requires guessing at the width of the line.

SAS 9.4 also allows wrapping of the axis labels as shown here:

Note the X2 axis label is fully spelled out, and has wrapped within the data space for the axis.  This will work for all axes.  Minor ticks are also displayed for the x axis.

Full SAS program.  Note, some programs need SAS 9.4 features:  Two-In-One


Data-driven Layouts in R's ggplot2 and ODS Graphics

Following Sanjay's cue (see “R U Graphing with SAS”), I tried creating data-driven multi-cell graphs using R. I played with the lattice and ggplot2 packages. I found ggplot2 simpler to understand and use than lattice, but there are probably some trade-offs.

Data-driven layouts are referred to as 'faceting' in ggplot2. This package provides two faceting operators: facet_wrap and facet_grid, which roughly correspond to ODS Graphics' data panel and data lattice layouts, respectively.

Let us take the simple use case of a graph with a categorical variable as the classifier. Here is a ggplot2 output of city MPG for SUVs classified by Origin using the sashelp.cars data set. (I imported the data into R from a CSV file exported by SAS.)

The code for the above graph, excluding the data import and subset, is shown below:

ggplot(data=sashelp.cars.suv) +
   geom_boxplot(aes(Make, MPG_City)) +
   facet_grid( ~ Origin, scales="free", space="free_x", shrink=T) +
   theme(axis.text.x=element_text(angle=-45, hjust=0)) +
   ggtitle("City MPG of SUVs by Origin ")

For comparison, here is the output from the SGPANEL procedure:

And here is the corresponding SAS 9.4 PROC SGPANEL code:

proc sgpanel data=sashelp.cars (where=(type='SUV'));
title "City MPG of SUVs by Origin";
  panelBy origin / rows=1 uniscale=row proportional;
  vbox mpg_city / category=make;
run;

As you can see there is a fair bit of correspondence between the two examples. ggplot2's space="free_x" gives you proportional width cells, just like SGPANEL panelBy's proportional option. SGPANEL manages tick value collisions with its tick value fit policies, whereas in ggplot2, I had to make some adjustments to the X axis tick values via the theme() to keep them legible.

How about paneling by a numeric variable? ggplot2 has two functions to ‘cut’ your numeric ranges into class values (or factors, as R calls them). cut_interval() categorizes a numeric variable into equal-sized ranges, whereas cut_number() does it by equal observation counts.

Here is a ggplot2 facet_grid output after ‘cutting’ MSRP into four class values using cut_number():

The code snippet for this graph is as follows:

# Convert MSRP=$n,nnn to numeric MSRP2.
sashelp.cars.suv$MSRP2 <- as.numeric(gsub('[\\$,]','', sashelp.cars.suv$MSRP))
# Scale MSRP2 by 1000 to keep the headers legible.
sashelp.cars.suv$MSRP2 <- sashelp.cars.suv$MSRP2/1000
# Convert numeric MSRP2 to 4 class values.
sashelp.cars.suv$cut <- cut_number(sashelp.cars.suv$MSRP2, n=4)
#
ggplot(data=sashelp.cars.suv) +
  geom_boxplot(aes(Origin, MPG_City)) +
  facet_grid( ~ cut, scales="free_x", space="free_x") +
  ggtitle("City MPG of SUVs by MSRP(x $1,000) intervals")

In ODS Graphics, we do not have predefined ways to convert a numeric variable into a classifier. You need some data processing to get there. I used a quick and dirty version of equal count for illustrative purposes.

Here is an SGPANEL output using 4 classes of MSRP using equal observation counts:

Here is the SAS 9.4 SGPANEL code snippet for the above output:

/* Data processing for equal count not shown */
proc sgpanel data=interval_bins;
title "City MPG of SUVs by MSRP intervals";
  panelBy binlabel / onePanel rows=1 uniscale=row noVarName proportional;
  vbox mpg_city / category=origin;
run;
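
The equal-count data processing left out of the snippet above could look something like the following.  This is only a sketch of one possible approach, not the code from the full program: the names RANKED, BIN_RANGES and BIN, and the text of the label, are my own assumptions.  PROC RANK with GROUPS=4 assigns each SUV to one of four groups with roughly equal counts, and PROC SQL then builds a label from the MSRP range of each group.

```sas
/* Sketch only: RANKED, BIN_RANGES and BIN are assumed names */
proc rank data=sashelp.cars(where=(type='SUV')) groups=4 out=ranked;
  var msrp;
  ranks bin;   /* BIN takes the values 0 to 3 */
run;

proc sql;
  /* MSRP range covered by each equal-count bin */
  create table bin_ranges as
  select bin, min(msrp) as lo, max(msrp) as hi
  from ranked
  group by bin;

  /* Attach a readable range label to every observation */
  create table interval_bins as
  select r.*,
         catx(' - ', put(b.lo, dollar10.), put(b.hi, dollar10.)) as binlabel
  from ranked as r inner join bin_ranges as b
    on r.bin = b.bin;
quit;
```

Because BINLABEL is a character column, the panels follow the order of the label text, so it is worth checking that this matches the numeric order of the bins for your data.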

[Full SAS program]

For a better numeric variable ‘slicing’ treatment using SAS, please see Kincaid and Fuller’s SAS Global Forum 2012 paper “SG Techniques: Telling the Story Even Better!”.

In conclusion, the data-driven layout capabilities of the ggplot2 package are fairly well covered in ODS Graphics, although there are differences between the two systems.


Plot Layering for Bland-Altman Graph

Recently a user new to GTL and SG procedures asked on the SAS Communities site how to create a Bland-Altman graph.  He included an image of the resulting graph to indicate what he wanted.  I described to him how that graph can be created, but since he is new to the art of creating graphs with SG procedures, I decided to send him sample code.

On building the graph, it became apparent that this graph could be a good example of how to use the layering capabilities of the SGPLOT procedure (and GTL) to create a graph that is made up of multiple separate components.  This also shows how to create the single data set needed to achieve this result.

This graph is similar in construction to the Clarke Error Grid, where there are certain zones in the graph depicted by boundaries with labels along with actual data points.

To make this graph, we start with the data and program needed to draw the different regions in the graph.  Here is the data set needed to draw the bands, the graph and the code:

This 'Bands' data set defines two bands with Ids of A and B.  The bands are defined by a set of observations with three values (Xb, Lower, Upper).  The bi-linear bands look like this:

SGPLOT code:

proc sgplot data=bands;
  format Limits $name.;
  title 'Blood Glucose Results';
  band x=xb lower=lower upper=upper / group=Limits outline nofill name='Band';
  refline 0;
  xaxis grid label='YSI Plasma Result (mg/dL)';
  yaxis grid values=(-120 to 120 by 20) label='Bias from YSI (mg/dL)';
  run;

Note the following features of the program:

  1. We have used a GROUPED BAND plot to draw the two bands.
  2. We need data columns for Band ID, X, Upper and Lower roles.
  3. We have used a format to name each band, and included the names in a legend.
  4. We added a reference line at Y=0.
  5. We set the extents and tick values for the Y axis, and enabled the grid lines.

In the graph above, we have displayed the band names in the legend below the plot.  However, in the example sent to me by the user, each band was directly labeled and no legend was provided.  To do this, we have to add a layer on top of the band plot to display the label for each band.  To place the labels in the middle of the graph, we add two data points (xl, yl) and a label.

We create the data set 'Labels' with two observations and three columns, (xl, yl, Label)  and merge it with the 'Bands' data set.  Since there is no overlap with any column names, a simple merge works just fine.
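
As a rough sketch of this step, assuming the 'Bands' data set already exists, the code could look like this.  The label positions and label text here are hypothetical values chosen for illustration, not the ones from the attached program.

```sas
/* Sketch only: label positions and text are hypothetical */
data labels;
  length label $12;
  input xl yl label $;
  datalines;
250  60 Limit_A
250 100 Limit_B
;
run;

/* One-to-one merge (no BY statement): rows are paired by position, */
/* and columns from the shorter data set are missing on later rows. */
data plot;
  merge bands labels;
run;
```

The missing values do no harm, since each plot statement simply skips observations that are missing for the columns it uses.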

Now, we layer a Scatter plot with MarkerChar option on top of the Band plot to display these two labels using the three new columns.

SGPLOT code:

proc sgplot data=plot noautolegend;
  title 'Blood Glucose Results';
  band x=xb lower=lower upper=upper / group=Limits outline nofill;
  scatter x=xl y=yl / markerchar=label;
  refline 0;
  xaxis grid offsetmin=0 offsetmax=0  label='YSI Plasma Result (mg/dL)';
  yaxis grid values=(-120 to 120 by 20) label='Bias from YSI (mg/dL)';
  run;

Note the Scatter plot with the MarkerChar option added after the Band statement.  This displays the band labels at the specified position.  Also note the addition of the OffsetMin and OffsetMax options to the XAXIS statement.

Lastly, we layer the actual data points obtained from the study.  For that, I have simulated a few random data points (x, y) in the expected data range in a data set called 'Points'.  Then, we merged this data set with the Bands and Labels data sets.
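
A simulation along these lines might look like the sketch below.  The seed, the number of points and the ranges are assumptions of mine for illustration, and the 'Bands' and 'Labels' data sets are assumed to exist already.

```sas
/* Sketch only: seed, sample size and ranges are assumptions */
data points;
  call streaminit(12345);
  do i = 1 to 50;
    x = 50 + 400 * rand('uniform');   /* simulated reference result */
    y = 15 * rand('normal');          /* simulated bias around zero */
    output;
  end;
  drop i;
run;

/* Positional merge: column names do not overlap, so no BY is needed */
data plot;
  merge bands labels points;
run;
```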

Here is the final graph:

SGPLOT code:

proc sgplot data=plot noautolegend;
  title 'Blood Glucose Results';
  band x=xb lower=lower upper=upper / group=Limits outline nofill;
  scatter x=x y=y;
  scatter x=xl y=yl / markerchar=label;
  refline 0;
  xaxis grid offsetmin=0 offsetmax=0  label='YSI Plasma Result (mg/dL)';
  yaxis grid values=(-120 to 120 by 20) label='Bias from YSI (mg/dL)';
  run;

Full SAS 9.3 Code: Bland_Altman


Broken Axis

In my previous post I described the new Polygon plot statement that is included with the SAS 9.4M1 release.   So, a valid question is - what is my motivation for discussing the new features in SAS 9.4M1 when most users are at SAS 9.3 or SAS 9.2 versions?  Here are a few reasons:

  • Some of you will get the new release early and this information may be valuable to you.
  • It is an indicator of ongoing work to enhance the graphics features in SAS.
  • This will create a repository of examples you can access as you start using the new release.

In this article, I want to address a new feature added at your request -  "Broken Axis".   Here are two graphs showing the same data without and with broken axis.

SAS 9.4M1 GTL code:

proc template;
  define statgraph BrokenAxis;
    begingraph;
      entrytitle 'Bar Chart with Broken Y axis';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(display=(ticks tickvalues) griddisplay=on
                                  linearopts=(includeranges=(0-30 195-220)));
        barchart category=x response=y / dataskin=gloss;
      endlayout;
    endgraph;
  end;
run;

Note the new option INCLUDERANGES in the LINEAROPTS bundle, where you provide the ranges that are to be included on the axis. You can have more than two ranges. Only the data that falls within the specified ranges is retained. The lengths of the ranges are used to proportion the axis segments.
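
For SG procedure users, a similar graph can likely be made with the RANGES= option on the SGPLOT axis statements, assuming your release is SAS 9.4M1 or later.  The sketch below uses a small made-up data set (BARS, with columns X and Y) that is not part of the attached program.

```sas
/* Sketch only: BARS is made-up data with one very tall bar */
data bars;
  input x $ y;
  datalines;
A 12
B 25
C 210
D 18
;
run;

proc sgplot data=bars;
  vbar x / response=y dataskin=gloss;
  /* Keep only the 0-30 and 195-220 segments of the Y axis */
  yaxis ranges=(0-30 195-220) grid display=(nolabel);
  xaxis display=(nolabel);
run;
```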

I had addressed other possible solutions to such use cases in a previous article on Broken Axes using techniques available in SAS 9.2 and SAS 9.3.

A broken axis can be specified for any one axis (X or Y or X2 or Y2) at a time.  Linear and Time axes are supported.  There are a few other restrictions.  The intention is to put the basic feature out there and then see if further action is needed based on your feedback.

Here is an example of a broken X axis with two time ranges.

Full SAS 9.4M1 Program:  BrokenAxis

 


New Polygon Plot

The SAS 9.4 Maintenance 1 release is now shipping to users. This is great news for GTL and SG procedures users as this release includes some useful new features. Some of these are in direct response to your requests, and others are enhancements that we think you will come to like.

One new feature that falls in the second category is the new POLYGON plot. This is a "hybrid" plot that has the properties of a plot statement and Annotate. As a plot statement, it can be inserted anywhere in the sequence of plot statements, in the way you are already familiar with.  So the graphics rendered by this plot will interleave with those of other plots. However, as the name of the plot suggests, you can draw simple (or complex) polygons defined by you. This will really open up the variety of graphs you can create. We will investigate many such plots in the next few blog articles.  If you like Annotate, you will love this plot.

Simply put, you can draw polygons defined in either interval or categorical data space. The polygon is defined just like you would a SERIES plot. Each polygon has an ID, an X and a Y variable.  X and Y can be numeric or discrete.  The data set shown below defines two polygons with ids of 'X' and 'Y'.  'X' has a hole.

Here is the resulting graph.  Note, the X variable is Name and is discrete and the Y variable is Y and is numeric.  Data values that are discrete can have a discrete offset for each observation, in X, Y or both.  This is like DiscreteOffset, but for each individual vertex.  Note the lower vertex of the pink polygon is halfway between B and C on the X axis.  This can be a very useful feature.

Here is the SAS 9.4M1 code:

proc sgplot data=poly;
  polygon id=id x=name y=y / xoffset=offset group=id fill outline
          dataSkin=matte fillattrs=(transparency=0.5);
  keylegend  / location=inside position=topleft;
  run;

As you probably realize, this plot statement can be used in many, many creative ways to build your graph.  In this article, we will examine how you can build an Area Bar Chart, where the X and Y axis are both numeric, and each bar width is proportional to a response variable.  Of course, we have to do a little bit of work to create the polygons as needed.

Here is some data on Revenues and Profits by Product.  This is just made up data.  Here is what the data set looks like:

From this data, we build a polygonal bar for each observation.  The width of each bar is the Revenue for the observation, and the bar is placed to the right of the previous one on the X axis.  The height of each bar is the Profit on the Y axis.  See the code in the attached program to build this data set.  Once this is done for all observations, the extent of the X axis represents the sum of all the revenue values.
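The idea can be sketched with a short DATA step.  This is not the code from the attached program; the input data set name SALES, its values, and the helper variables XLOW and XHIGH are assumptions made only for illustration.

/* Hypothetical input: one observation per product */
data sales;
  input product $ revenue profit;
  datalines;
Desks  120 30
Chairs  80 25
Lamps   40 18
Sofas  160 35
;
run;

/* Four vertices per product; each bar starts where the previous one ended */
data areabar;
  set sales;
  retain xlow 0;            /* running total of revenue so far */
  id = _n_;                 /* one polygon per observation     */
  xhigh = xlow + revenue;
  x = xlow;  y = 0;      output;
  x = xlow;  y = profit; output;
  x = xhigh; y = profit; output;
  x = xhigh; y = 0;      output;
  xlow = xhigh;             /* next bar starts at this right edge */
  keep id product revenue profit x y;
run;

Each input observation produces four rows with the same ID, so the POLYGON statement draws one rectangle per product.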

Here is the graph created using this data and the Polygon plot using the SGPLOT procedure.  Click on the graph for a higher resolution image.

SAS 9.4 M1 SGPLOT code:

proc sgplot data=areabar;
  title 'Revenue and Profit by Product';
  polygon id=id x=x y=y / fill outline;
  yaxis offsetmin=0 grid label='Profit';
  xaxis label='Revenue';
  run;

Polygons can have labels, and the labels can be drawn in many different locations, positions and orientation.  Here are some examples.  Code for all cases is shown in the attached file.

SAS 9.4 M1 SGPLOT code:

proc sgplot data=areabar;
  title 'Revenue and Profit by Product';
  polygon id=id x=x y=y / fill outline dataskin=gloss label=product
          labelpos=ymax rotatelabel=vertical;
  yaxis offsetmin=0 grid label='Profit';
  xaxis label='Revenue';
  run;

Here are some key features of this plot statement:

  • A polygon is defined just like a SERIES plot, with a sequence of observations having the same ID.
  • Each vertex can have numeric or discrete axis values.
  • Each discrete value for a vertex can have an X and/or Y discrete offset.
  • Polygons can have holes, indicated by missing X and Y values.
  • Polygons can have a label, which is displayed at the bounding box center of the polygon by default.
  • Labels can be positioned inside the polygon, outside the polygon bounding box, or outside the axes.
  • Labels can be horizontal or vertical.
  • Each polygon can be rotated around its bounding box center.
  • For a rotated polygon, the label can only be at the center, and is rotated along with the polygon.

Here is the same graph grouped by the Product Group.  Note by default, the polygon labels use the contrast color attribute of the fill color.  But, you can override this to use a fixed color.  Legends are generated automatically as for other plots.  In the graph below, we are still using rotated labels at the top of each polygon.
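A grouped version along these lines might look as follows.  This is a sketch, not the code from the attached file: the group variable name PRODGROUP is an assumption, since the column is not named in the text, and LABELATTRS=(color=black) is shown as one way to replace the default contrast color with a fixed color.

proc sgplot data=areabar;
  title 'Revenue and Profit by Product';
  /* prodgroup is a hypothetical column holding the product group */
  polygon id=id x=x y=y / group=prodgroup fill outline dataskin=gloss
          label=product labelpos=ymax rotatelabel=vertical
          labelattrs=(color=black);
  yaxis offsetmin=0 grid label='Profit';
  xaxis label='Revenue';
run;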

Here the label is a compound string made up of the product and its revenue.  The label is shown at the top of the bar, using  the default Split character.

As you can see, the POLYGON plot is a very versatile statement that will allow you to really customize your graph.  Yes, you will need to generate the polygon vertices.  The overlay axes recognize the data extents, and will union them with values from other plot statements.  Legends are automatically generated.  The plot will support both discrete and range attribute maps.  The possibilities are endless.

Clearly, you can create maps using this statement.  I will describe that use case in a subsequent article.  In the meantime, see if you can use the data from the MAPS library with this statement to create simple maps.

Full SAS 9.4 M1 Code:  Polygon


Survey Charts

Often we have situations where the category values on the graph have long character strings.  This is often the case when graphing survey responses to questions.  The questions may be sentences, sometimes moderately long.

With SAS 9.4, GTL and SG now support the ability to display tick values split over multiple lines.  Here is an example of a simple survey dataset.

The bar chart made with this data using SAS 9.4 looks like this.  Click on the graph for a higher resolution image.

SAS 9.4 SGPLOT code

proc sgplot data=survey;
  title 'What value did you gain from this event?';
  vbar question / response=response nostatlabel dataskin=gloss datalabel;
  xaxis display=(nolabel) valueattrs=(size=7);
  yaxis display=(nolabel) valueattrs=(size=7) grid;
  run;

For SAS 9.4 SGPLOT, we have changed the default tick fit policy for the x axis.  Now the new fit policy is SPLITROTATE.  The x axis will try to split the long tick values.  If they cannot fit in the available spacing, then the tick values will be drawn rotated like before.  This creates a much nicer graph that is readable.  Personally, I do not like the diagonal rotated tick values.

While this works by default for the x axis, for the y axis the splitting is based on the default width available to draw the tick values.  With GTL, splitting has to be specifically requested.  Then, GTL uses 40% of the graph width to split the tick values.  Here is the graph and the GTL code.

SAS 9.4 GTL Code:

proc template;
  define statgraph Survey_GTL;
    begingraph;
      entrytitle 'What value did you gain from this event?';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
             yaxisopts=(reverse=true display=(ticks tickvalues)
             discreteopts=(tickvaluefitpolicy=split));
        barchart category=question response=response / orient=horizontal
             fillattrs=graphdata2 dataskin=gloss barlabel=true;
      endlayout;
    endgraph;
  end;
run;

Note, we have to specify TickValueFitPolicy=SPLIT when using GTL.  The y axis tick values are wrapped within 40% of the width of the graph.

Now, you may want to have the tick values wrap in a tighter zone, leaving more width for the data area.  In this case, you can provide the split characters explicitly in the strings.  Here we have used the '^' as the split character, inserted into the string where we want the split.
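As a rough illustration, the split characters could be placed in the strings with a DATA step like the one below.  The data set name SURVEY_SPLIT, the question text and the response values are all made up for this sketch; only the use of '^' as the split character comes from the example.

data survey_split;
  length question $80;
  /* '^' marks each place where the tick value should break */
  question = 'Learned new techniques^that I can apply^right away';
  response = 42; output;
  question = 'Met other users^working on^similar problems';
  response = 31; output;
  question = 'Got answers to^specific questions';
  response = 27; output;
run;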

Now, we can use an explicit directive to the y axis to split the tick values on the provided split character.  Here is the graph and the GTL code.  Click on the graph for a high resolution image.

proc template;
  define statgraph Survey_GTL_2;
    begingraph;
      entrytitle 'What value did you gain from this event?';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
             yaxisopts=(reverse=true display=(ticks tickvalues)
             discreteopts=(tickvaluefitpolicy=splitalways
             tickvaluesplitchar='^'));
        barchart category=question response=response / orient=horizontal
             fillattrs=graphdata3 dataskin=gloss barlabel=true;
      endlayout;
    endgraph;
  end;
run;

Note the use of the TickValueFitPolicy of SPLITALWAYS, and TickValueSplitChar='^'.  With these new policies and options, you can control the splitting of the categorical tick values.  Split characters are normally dropped where the string is split.  However, if needed, you can cause the split character to be retained in the displayed string.
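The template by itself does not draw anything; it has to be rendered with the SGRENDER procedure.  A minimal sketch, assuming the data with the embedded '^' characters is in a data set named SURVEY_SPLIT (a hypothetical name):

proc sgrender data=survey_split template=Survey_GTL_2;
run;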

Full SAS 9.4 program: Survey


Grouped Bar Chart with Statistics Table

Creating a Bar Chart with a table of statistical data aligned with the bars is a popular topic.  With SAS 9.4, creating such graphs gets easier with the new AXISTABLE statement in GTL and SG procedures.  But some use cases can flummox the latest gizmos.  Such is the case I ran into recently.

Here is a bar chart of mileage by origin for some cars in the sashelp.cars dataset.  Trucks and Hybrids are removed.  Creating a bar chart with a statistics table of the mean mileage below each bar is very easy.  Click on the graph for a higher resolution image.

Note the mean MPG statistics displayed under each bar.  The code needed for this using the new SAS 9.4 Axis table is as follows:

SAS 9.4 SGPLOT code:

proc sgplot data=sashelp.cars(where=(type not in ('Hybrid' 'Truck')));
  title 'Mileage by Origin';
  format mpg_city 4.1;
  vbar origin / response=mpg_city dataskin=gloss stat=mean;
  xaxistable mpg_city / label labelpos=left stat=mean location=inside;
  xaxis display=(nolabel);
  run;

However, if the bar chart is grouped by TYPE, then we can create a cluster grouped bar chart, and include the mileage statistics by type as shown below.

SAS 9.4 SGPLOT code:

proc sgplot data=carmeans;
  title 'Mileage by Origin and Type';
  vbarparm category=origin response=mpg / group=type dataskin=gloss;
  xaxistable mpg / class=type label labelpos=left location=inside;
  xaxis display=(nolabel);
  keylegend / location=inside position=topright;
  yaxis offsetmax=0.1;
  run;

In this case, the statistics values are displayed for each value of TYPE in a multi-row table.  This is how the CLASS variable is handled in the AXISTABLE statement.  But what if I really want each statistic to be displayed under the bar for the same TYPE value?  That is, I want a statistics table of one row, with each value shown under its associated group bar.  Here is what I want:

This, unfortunately, cannot be done using the AxisTable, which does not support the GROUP option.  Clearly, we have found a hole that we need to fill, and we will do that in an upcoming SAS release.  But, how did I make the above graph?  Well, the old fashioned way.

In the days before the arrival of AxisTable, we created such tables using the Scatter plot with the MarkerChar option.  Well, I have to go back to that method to do this graph, using the Y and Y2 axis along with the "axis-splitting" technique I spoke of earlier.  Here is the code to create this graph using SAS 9.3 code.

SAS 9.3 Code:

proc sgplot data=carmeans;
  title 'Mileage by Origin and Type';
  vbarparm category=origin response=mpg / dataskin=gloss group=type;
  scatter x=origin y=lb_mpg / markerchar=mpg group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  xaxis display=(nolabel);
  yaxis grid offsetmin=0.1;
  y2axis offsetmax=0.95 display=(nolabel) valueattrs=(size=6);
  run;

In the program above, first we computed the mean statistics using the MEANS procedure.  Then, we have used a VBARPARM associated with the Y axis to plot the bar chart.  We used a scatter plot with cluster group on the Y2 axis, and adjusted the Y and Y2 axis settings to get this graph.  See attached program at bottom for the full code.  Note, the statistics are aligned with the bars, and are also colored by the group values.
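The summarization step might look something like the sketch below.  This is not the attached program: the intermediate data set name CARMEANS_RAW and the formats are assumptions, and the LB_ columns are assumed to hold the constant row labels 'MPG', 'HP' and 'MSRP' that the Y2 axis displays.

/* Mean of each measure by origin and type, excluding Hybrids and Trucks */
proc means data=sashelp.cars(where=(type not in ('Hybrid' 'Truck')))
           noprint nway;
  class origin type;
  var mpg_city horsepower msrp;
  output out=carmeans_raw mean=mpg horsepower msrp;
run;

/* Add one constant label column per table row for the Y2 axis */
data carmeans;
  set carmeans_raw;
  lb_mpg  = 'MPG';
  lb_hp   = 'HP';
  lb_msrp = 'MSRP';
  format mpg 4.1 horsepower 4. msrp dollar8.;
run;

The NWAY option keeps only the rows for each combination of ORIGIN and TYPE, which is what VBARPARM expects: one pre-summarized value per bar.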

We can extend this technique to add more rows for Horsepower, MSRP and any other statistics we want to display in the table as shown below.

SAS 9.3 Code:

proc sgplot data=carmeans;
  title 'Mileage by Origin and Type';
  vbarparm category=origin response=mpg / dataskin=gloss group=type;
  scatter x=origin y=lb_mpg / markerchar=mpg group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  scatter x=origin y=lb_hp / markerchar=horsepower group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  scatter x=origin y=lb_msrp / markerchar=msrp group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  xaxis display=(nolabel);
  yaxis grid offsetmin=0.15;
  y2axis offsetmax=0.88 display=(nolabel) valueattrs=(size=6);
  run;

Note we have used a scatter plot statement for each statistic, one each for MPG, HP and MSRP.  We have used columns that contain the appropriate character labels for each row.  This gets us to the basic plot, but note the table row labels are to the right.  These are really the Y2 axis tick values.  How can we move them to the left?

We can use a trick to do that using the REFLINE statement.  This statement draws reference lines for each value specified (or from data).  But, each reference line can also have a label, either on left or right.  So, we can draw reference lines for the same values, suppress the line itself, and draw the labels to the left and suppress the Y2 axis tick values.  Here is the graph and the code:

SAS 9.3 Code:

proc sgplot data=carmeans;
  title 'Mileage by Origin and Type';
  vbarparm category=origin response=mpg / dataskin=gloss group=type;
  scatter x=origin y=lb_mpg / markerchar=mpg group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  scatter x=origin y=lb_hp / markerchar=horsepower group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  scatter x=origin y=lb_msrp / markerchar=msrp group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  refline 'MPG' 'HP' 'MSRP' / axis=y2 lineattrs=(thickness=0) label
          labelpos=min labelattrs=(size=6);
  xaxis display=(nolabel);
  yaxis grid offsetmin=0.15;
  y2axis offsetmax=0.9 display=none valueattrs=(size=6);
  run;

Note the REFLINE statement, with line thickness of zero, and label position of MIN.  One last item is to see if we can reduce the required eye movement between the legend at the bottom and each color.  This part is optional.  I remove the legend, and draw the name of the car type directly on the bar itself as shown below:

SAS 9.3 Code: 

proc sgplot data=carmeans noautolegend;
  title 'Mileage by Origin and Type';
  vbarparm category=origin response=mpg / dataskin=gloss group=type;
  scatter x=origin y=lb_y / markerchar=type group=type groupdisplay=cluster
          markercharattrs=(size=6 color=black);
  scatter x=origin y=lb_mpg / markerchar=mpg group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  scatter x=origin y=lb_hp / markerchar=horsepower group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  scatter x=origin y=lb_msrp / markerchar=msrp group=type groupdisplay=cluster
          markercharattrs=(size=6 weight=bold) y2axis;
  refline 'MPG' 'HP' 'MSRP' / axis=y2 lineattrs=(thickness=0) label
          labelpos=min labelattrs=(size=6);
  xaxis display=(nolabel noticks);
  yaxis grid offsetmin=0.15;
  y2axis offsetmax=0.9 display=none valueattrs=(size=6);
  run;

Now the legend information is in the graph itself, so there is no need for a separate legend.  This removes the eye movement needed to decode the colors, and recovers the space used up by the legend.  Clearly, this last part is an interesting exercise, and may or may not be suitable for all users.

The moral of the story is that important missing features are often discovered late in the game, after new features are released.  Luckily, SG and GTL syntax is well structured and robust, so we can rectify the situation in the next release.

Full SAS program:  BarTable


Graphs at WUSS - Part 2

Last week I covered some of the interesting graph-related papers presented at WUSS.  There were quite a few, so I broke up the report into two parts.  Here is the second installment.

In the paper Creating Graphical Patient Profiles using SAS by William Garner of Gilead Sciences, the author describes how to create patient profiles for a specific application.  The graph contains a wealth of information, arranged by study day and dates.  It displays patient demographic information followed by drug administered, adverse events, lab results and vital statistics, all in one graph.  I was personally gratified as William has used some of the techniques I have described in this blog on creating Patient Profiles and taken the next step to make it into a real world graph.  This graph is used in a Gilead Sciences study for antifungal therapy.  Click on the graph to see the high resolution graph.

Chuck Kincaid of Experis presented a hands-on workshop on Using SAS ODS Graphics that teaches the audience how to use SG procedures.  Chuck describes the features of the SGPLOT and SGPANEL procedures showing us how to create many graphs commonly used in the industry using these procedures.  Here is an example of a paneled graph from his presentation.

Leanne Goldstein and Rebecca Ottesen, both of City of Hope, presented an excellent hands-on workshop titled Survival 101 - Just Learning to Survive.  This presentation was chock full of information on the techniques for analysis of survival data using the LIFETEST procedure and the Kaplan-Meier methods.  Rebecca also demonstrated the use of the ODS Graphics Designer for building Survival Plots.  Here is one of the graphs.

I presented my paper from SAS Global Forum 2013 called Make a Good Graph.  This paper was a late entry into the presentation schedule, replacing a paper that was dropped late in the game.  So, the presentation was included in the "Tutorials" section, and I feared no one would show up.  I was relieved to see a decent sized audience.  The paper describes the science behind the standard industry practices for creation of effective graphs.  I presented many examples describing the preferred ways to represent classifications and facilitate accurate magnitude comparison.

The example above shows how the magnitude difference between lines is perceived by viewers.  The difference between two lines is often perceived as the nearest distance, and not the vertical distance, as is intended here.  To overcome this, differences should be plotted directly, and not left to inference by the viewer.  In the above graph, the actual vertical distance between the two plot lines is represented in the band plot at the bottom.  The drop in the difference between the two lines on the right side at the two reference lines is not very obvious in the line plots, but is more obvious in the band plot at the bottom.

It is indeed gratifying to see SAS users adopting SG Procedures and GTL to create graphs from the simple to the intricate, and presenting their findings and the techniques they have developed to create these graphs.
