Multi-Group Series Plots

The series plot is a popular way to visualize response data over a continuous axis such as date, with a group variable such as treatment.  Here is some data I made up of a response value by date, treatment, drug class and the company that makes the drug.  The data is simulated as shown in the attached program (see the bottom of the article).

The data includes the columns VALUE, DATE, DRUG, CLASS and COMPANY.  The columns LABEL and VALUEL have non-missing values only at every 5th observation per drug, and are used for labeling the curves.
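The simulation program is not reproduced in this article. As a rough sketch of how the LABEL and VALUEL columns could be derived, the step below assumes a data set named SeriesGroup with a character DRUG column and numeric DATE and VALUE columns, and uses the first character of DRUG as a hypothetical one-letter label.

```sas
/* Sort so that the observation count runs along each curve */
proc sort data=SeriesGroup;
  by drug date;
run;

data SeriesGroup;
  set SeriesGroup;
  by drug;
  length label $1;
  /* Count observations within each drug; the sum statement retains N */
  if first.drug then n=0;
  n+1;
  /* Keep LABEL and VALUEL only at every 5th observation, else missing */
  if mod(n, 5)=0 then do;
    label=substr(drug, 1, 1);   /* hypothetical one-letter label */
    valueL=value;
  end;
  else do;
    label=' ';
    valueL=.;
  end;
  drop n;
run;
```

Because VALUEL is missing everywhere except the labeled points, a scatter plot of VALUEL draws markers only at those points.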

 

We can use the GTL SERIESPLOT to display Value by Date and Drug as shown on the right.  The drug name for each curve is displayed at the right end of the curve, and also in the legend below.  We could turn off the legend if needed.

If a GROUP variable is not provided, all the data is plotted as one series.  When a GROUP variable is provided, one curve is plotted for each group value.  Each curve gets its display attributes, such as color and line pattern, from one of the GraphData1 through GraphData12 style elements, in the order the group values are encountered in the data.  Alternatively, you can use a Discrete Attributes Map to assign a specific color and line pattern to each group value.

The sample code shown here is for SAS 9.4.  Some of the newer options may not be available in earlier releases, but the basic ideas discussed below also work in SAS 9.3 or earlier.

SAS 9.4 GTL code for series with group:

proc template;
  define statgraph Series;
    begingraph / subpixel=on;
      entrytitle 'Values by Date and Treatment';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2) 
                   smoothconnect=true;
        discretelegend 'a' / title='Drug:';
      endlayout;
    endgraph;
  end;
run;
 
proc sgrender data=SeriesGroup template=Series;
run;

Note that the curve labels drawn at the end can get cluttered, as happened above for groups B and C.  To improve this, we can label each curve at frequent intervals along its length.  We do this using the columns LABEL and VALUEL, which have non-missing values at every 5th observation per group.

We use these columns in an overlaid scatter plot with the MARKERCHARACTER option.  To reduce the clutter of overlaid text, we draw a white marker behind each letter.  We discussed such ideas in the article Labeled Curves.

SAS 9.4 GTL code for curves with inline labels.

proc template;
  define statgraph SeriesLabel;
    begingraph / subpixel=on;
      entrytitle 'Values by Date and Treatment';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2) 
                   smoothconnect=true;
        scatterplot x=date y=valueL / group=drug 
                    markerattrs=(symbol=circlefilled color=white size=10);
        scatterplot x=date y=valueL / group=drug markercharacter=label; 
        discretelegend 'a' / title='Drug:' itemsize=(linelength=15px) 
                       location=inside across=1 halign=right valign=top;
      endlayout;
    endgraph;
  end;
run;

In the program above, we have also used ATTRPRIORITY=COLOR on the ODS GRAPHICS statement to delay the use of line patterns until all the colors are exhausted (see the attached full program).  This option makes all regular styles behave like the HTMLBLUE style.  Each group is drawn in a different color from the style, using a total of four colors.
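The ODS GRAPHICS statement mentioned above is part of the attached program rather than the listing. A minimal sketch of how it might be combined with the SeriesLabel template defined above, assuming the SeriesGroup data set (ATTRPRIORITY= requires SAS 9.4):

```sas
/* Cycle through all the style colors before switching line patterns */
ods graphics / attrpriority=color;

proc sgrender data=SeriesGroup template=SeriesLabel;
run;

/* Restore the default ODS GRAPHICS settings for later graphs */
ods graphics / reset;
```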

Now, we want to group the curves for each drug by another grouping variable, such as the drug class.  I assigned two classes, "NSAID" and "Opioid".  Since each curve is labeled by the name of the drug, we want to use color to depict the class of the drug.  We can do this by using a secondary group role called LINECOLORGROUP.  The graph is shown on the right, where each curve is now colored either blue or red based on the drug class.  The legend contains a color swatch for each class value.

SAS 9.4 GTL code for line color by a group:

proc template;
  define statgraph SeriesLineColorGroup;
    begingraph / subpixel=on;
      entrytitle 'Values by Date, Treatment and Class';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2) 
                   smoothconnect=true linecolorgroup=class;
        scatterplot x=date y=valueL / group=drug 
                  markerattrs=(symbol=circlefilled color=white size=10);
        scatterplot x=date y=valueL / group=drug markercharacter=label
                  markercharacterattrs=graphdatatext; 
        discretelegend 'a' / title='Drug Class:' type=linecolor location=inside 
                  across=1 halign=right valign=top;
      endlayout;
    endgraph;
  end;
run;

Note the features of the graph above:

  • We have labeled each treatment curve with its own label, so no legend is needed to identify the drugs.
  • We have assigned the color for each curve by a secondary group variable CLASS.
  • We have used a Discrete Legend of TYPE=LINECOLOR.  This displays only color swatches.
  • The only requirement here is that the GROUP variable must be the lowest grouping factor for each curve.  The LINECOLORGROUP value must remain the same for all observations with the same GROUP value.
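Since the LINECOLORGROUP value must be constant within each GROUP value, it can help to check the data before plotting. This PROC SQL sketch assumes the SeriesGroup data set with DRUG and CLASS columns; substituting COMPANY for CLASS checks a LINEPATTERNGROUP variable the same way.

```sas
/* List any drug whose observations carry more than one CLASS value.
   An empty result means the LINECOLORGROUP requirement is met. */
proc sql;
  select drug, count(distinct class) as nClass
    from SeriesGroup
    group by drug
    having count(distinct class) > 1;
quit;
```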

The good news here is that LINECOLORGROUP has been available in the GTL SERIESPLOT since SAS 9.2.  It is used by the POWER procedures, but the feature was tested only for the POWER procedures' use cases, so we did not feel confident documenting it as ready for general use.  After hearing multiple users express the need for such use cases, we felt it was necessary to release it as production.  The feature has now been well tested, and no problems have been found, so we feel the risk-to-reward ratio favors exposing this feature to you.

In addition to LINECOLORGROUP, you can also use LINEPATTERNGROUP, MARKERCOLORGROUP and MARKERSYMBOLGROUP.  Each one can be used together with the GROUP variable, and its value must not change within a GROUP value.


In the graph on the right, I have used COMPANY as the LINEPATTERNGROUP.  Now, each drug is colored by its CLASS and patterned by its COMPANY.  I have also added a discrete legend of TYPE=LINEPATTERN.  Both legends are wrapped inside a LAYOUT GRIDDED placed at the top right of the cell.

SAS 9.4 GTL code for series with line color and line pattern groups:

proc template;
  define statgraph SeriesLineColorPatternGroup;
    begingraph / subpixel=on;
      entrytitle 'Values by Date, Treatment, Class and Company';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2) 
                  smoothconnect=true linecolorgroup=class linepatterngroup=company;
        scatterplot x=date y=valueL / group=drug 
                  markerattrs=(symbol=circlefilled color=white size=10);
        scatterplot x=date y=valueL / group=drug markercharacter=label
                  markercharacterattrs=graphdatatext; 
        layout gridded / halign=right valign=top columns=2 columngutter=5;
          discretelegend 'a' / title='Drug Class' type=linecolor location=inside 
                        across=1 halign=right valign=top;
          discretelegend 'a' / title='Company' type=linepattern location=inside 
                        across=1 halign=right valign=top itemsize=(linelength=30);
        endlayout;
      endlayout;
    endgraph;
  end;
run;

Note the features of the graph above:

  • We have labeled each treatment curve with its own label, so no legend is needed to identify the drugs.
  • We have assigned the color for each curve by a secondary group variable CLASS.
  • We have assigned the pattern for each curve by a secondary group variable COMPANY.
  • We have used a Discrete Legend of TYPE=LINECOLOR.  This displays only color swatches.
  • We have used a Discrete Legend of TYPE=LINEPATTERN.  This displays patterns without color.
  • The only requirement here is that the GROUP variable must be the lowest grouping factor for each curve.  The LINECOLORGROUP and LINEPATTERNGROUP values must remain the same for all observations with the same GROUP value.

While you can display many different classifications in the graph at the same time, the graph can quickly become complex.  You can also turn on the display of markers for the series plot, and then control the visual attributes of the markers using MARKERCOLORGROUP and MARKERSYMBOLGROUP.

In the process of making the graphs for this article, I noticed that there is no way to color the scatter MARKERCHARACTER text by a secondary group, so as to match the color of the drug names to their lines when using LINECOLORGROUP.  There is no matching MARKERCOLORGROUP option in the SCATTERPLOT.  I will see what we can do about that.  Please chime in with your comments and observations.

I certainly look forward to seeing the ways in which you leverage these features.

Full SAS 9.4 Code:  MultiGroup_94

Full SAS 9.3 Code:  MultiGroup_93


Labeled curves

Often, the topic of an article is motivated by a question from a user.  A satisfactory resolution is usually a good indication of a topic that may be of interest to other users.  One such question was posed to me by a user this weekend.  He wanted to display fit curves by group, with each curve labeled along its length by a one-letter identifier.

This seems like a useful way to label a curve, as placing a label only at the end of the curve can sometimes be less than optimal.  When using a simple series plot, this is straightforward, and a short one- or two-character label can be placed at intervals along the series.

But what if the curves are fit plots, say a PBSpline or a regression?  Now, the plotted data is not the same as the original data.  Here is an example of curves of Mileage by Horsepower by Type.  We have used DEGREE=3 just for illustration.  The legend provides the decoding information, but it is less than ideal to refer back and forth to the legend.

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=cars_type_label_hp;
  reg x=horsepower y=mpg_city / degree=3 nomarkers group=type name='s';
  keylegend 's' / title='';
  run;

Now, we want to add a short code representing the vehicle type at equal intervals along the curves.  First, we extract the label to be displayed as a 2-character abbreviation of the type.  Then we use the scatter plot with the marker character option to display the short label at each observation:

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=cars_type_label_all_hp;
  reg x=horsepower y=mpg_city / degree=3  nomarkers group=type name='s';
  scatter x=horsepower y=mpg_city /  group=type markerchar=label;
  keylegend 's' / title='';
  run;

Clearly, this is not acceptable, as every original observation is labeled, creating a cloud of labels around each fit curve.  What we need are the points used to draw the fit curves, not the original observations used to compute them.  However, the points used to draw the fit lines are generated internally by the procedure, and are not directly available to us.  So how can we get them?

To do this, we use a two-pass process.  First, we run the SGPLOT procedure to draw the fit curves, and also request output of the generated data using the ODS OUTPUT statement as follows:

SAS 9.3 SGPLOT Code:

ods output sgplot=RegData;
title 'Mileage by Horsepower and Type';
proc sgplot data=cars_type_label_all_hp;
  reg x=horsepower y=mpg_city / degree=3 nomarkers group=type name='s';
  scatter x=horsepower y=mpg_city / group=type markerchar=label;
  keylegend 's' / title='';
  run;

Note the use of the statement ODS OUTPUT SGPLOT=RegData.  This statement writes the generated data to the data set named RegData, which contains the generated fit data points in addition to the original data.  The generated variable names, such as "Regression_Horsepower_mpg_cit__x", are often long and convoluted.  This ensures that they do not collide with the original column names, and tells the renderer which generated columns to use to plot the data.  See the generated data set for the actual variable names and values.

For ease of use, we rename these generated columns to something simple like X, Y and Group.  Now that we have the data used to plot the curves, we can use it to display the 2-character code along each curve.  We ensure the data is sorted by Group and X (Horsepower), and create a 2-character code at every 30th observation.  Then, we use the scatter plot with marker character to plot the labels.
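The rename and labeling steps are in the attached program rather than shown here, so the sketch below is only one possible version. Only the X column name is taken from the text above; the Y and group column names are hypothetical placeholders that must be replaced with the names PROC CONTENTS reports for RegData. The group column is assumed to be character. The step also creates Y2, a copy of Y kept only at the labeled points, which is one way to obtain a column like the Y2 used in the last graph of this article.

```sas
/* List the generated column names in RegData */
proc contents data=RegData varnum;
run;

/* Hypothetical names: replace them with those reported by PROC CONTENTS */
%let xcol = Regression_Horsepower_mpg_cit__x;
%let ycol = Regression_Horsepower_mpg_cit__y;
%let gcol = Regression_Horsepower_mpg_cit__g;

/* KEEP= is applied before RENAME=, so it uses the original names */
data RegCurves;
  set RegData(keep=&xcol &ycol &gcol
              rename=(&xcol=x &ycol=y &gcol=group));
  if not missing(x);   /* keep only the generated fit points */
run;

proc sort data=RegCurves;
  by group x;
run;

/* Put a 2-character code and a marker position at every 30th point */
data RegCurves;
  set RegCurves;
  by group;
  length label2 $2;
  if first.group then n=0;
  n+1;
  if mod(n, 30)=0 then do;
    label2=substr(group, 1, 2);
    y2=y;
  end;
  drop n;
run;
```

Using the first two characters of the group value as the code is an assumption; any short code column works the same way with MARKERCHAR=.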

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=RegCurves;
  reg x=x y=y / degree=3 nomarkers group=group name='s';
  scatter x=x y=y / group=group markerchar=label2;
  keylegend 's' / title='';
  run;

While we have achieved what the user wanted, the overlaid curve labels look a bit cluttered.  One way to improve the appearance is to draw a small filled scatter marker at each location, and draw the label inside it, as shown below.

Now each abbreviated label is clearly visible, and the graph does not look cluttered.

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=RegCurves;
  reg x=x y=y / degree=3 nomarkers group=group name='s';
  scatter x=x y=y2 / group=group markerattrs=(size=14 symbol=circlefilled)
                     filledoutlinedmarkers markerfillattrs=(color=white);
  scatter x=x y=y2 / group=group markerchar=label2 
                     markercharattrs=(size=5 weight=bold);
  keylegend 's' / title='';
  run;

Full SAS 9.3 SGPLOT code: CurveLabels


The BLOCK Plot

When you hear of a Scatter Plot or a Series Plot, you have a picture in your mind of what we are talking about.  But one of the plot statements available in GTL, and soon in SGPLOT, is the BLOCK plot.  I am sure this leaves many users scratching their heads, wondering what in heaven's name a BLOCK plot is.  So, in this article we will shed some light on this unique and useful plot.

The block plot is a one-dimensional plot with the syntax shown below:

BLOCKPLOT X=var BLOCK=var < / options>;

For the data set shown on the right, we will use X=DATE and BLOCK=WINDOWS.  The plot creates contiguous horizontal rectangular "blocks" along the x axis for as long as the block variable value stays the same.  So, in this case, a rectangular block will be created from '01Jun1990' to '01Sep1995' with the block value of "3.0".  Then, when the new block value of "95" is encountered, a new block starts and continues while the block value stays "95", until '01Jul1998', when it changes to the next value.

Thus, the BLOCKPLOT statement creates such horizontal blocks and displays them in the plot.  For convenience, a missing value can also be treated as a continuation of the previous block value (using the EXTENDBLOCKONMISSING option), so the column WINMISS, which uses missing values in this way, would produce a similar result.  As you can imagine, the plot needs the X variable to be sorted.
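Two preparation steps follow from this description: sorting by the X variable, and, if you prefer not to rely on the plot option, carrying the last non-missing block value forward yourself. A sketch, assuming a data set named Windows with DATE and a character WINMISS column; WINFILL is a hypothetical new column, and its length of 8 is an assumption that should match WINMISS.

```sas
/* BLOCKPLOT expects the X column in sorted order */
proc sort data=Windows;
  by date;
run;

/* Manual equivalent of treating a missing block value as a continuation:
   carry the last non-missing WINMISS value forward into WINFILL */
data Windows;
  set Windows;
  length winfill $8;   /* assumed length; match the length of WINMISS */
  retain winfill;
  if not missing(winmiss) then winfill=winmiss;
run;
```

A BLOCKPLOT with BLOCK=WINFILL would then draw the same blocks as BLOCK=WINDOWS, without needing the option.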

The graph shown on the right is created using this data.  For each contiguous range along the x axis where the block variable is the same, a block is displayed in the graph.  Each successive block gets its attributes from the GraphData1 to GraphData12 style elements.  The GTL code for this is shown below.

SAS 9.3 GTL code:

/*--Basic BlockPlot--*/
proc template;
  define statgraph Block;
    dynamic _display _type;
    begingraph;
      entrytitle 'Windows OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true));
        blockplot x=date block=windows / display=_display 
                  valuehalign=center filltype=_type;
      endlayout;
    endgraph;
  end;
run;
 
/*--Basic BlockPlot--*/
proc sgrender data=Windows template=Block;
run;

Note that the template above declares two dynamic variables, _DISPLAY and _TYPE, that are not given values in this SGRENDER step.  Options set to an unresolved dynamic are ignored, as if they were not coded at all.  This results in the most basic block plot output.

Clearly, displaying the block values in each block is often useful, and this can be enabled by including VALUES in the DISPLAY option.  Additionally, the plot supports a different fill type called ALTERNATE.  In this case, instead of using a unique color per block, the blocks are drawn using alternating colors, as shown in the graph on the right.  Here is the code for this graph, using the same template we defined above.

SAS 9.3 GTL code:

/*--Block Values Alternate colors--*/
proc sgrender data=Windows template=Block;
  dynamic _display='fill values' _type='Alternate';
run;

In the use case above, we have set _DISPLAY='fill values' and _TYPE='Alternate'.  ValueHAlign is set to CENTER in the template, and ValueVAlign is CENTER by default, so the block values are displayed at the center of each block.  The blocks now get alternating colors.

In the alternating color band case, the attributes of the bands can be set using the FillAttrs and AltFillAttrs options.  As with all fill attributes, transparency can be set inside the fill attribute.  In the example on the right, we have used a pink color in the FillAttrs, and set the AltFillAttrs to fully transparent, so whatever is behind it, in this case the wall, shows through.  We have also moved the value labels to the top of the block using the ValueVAlign option.
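The code for this variant is not listed in the article. One way to produce it is a separate template like the sketch below, which hard-codes the options described instead of using dynamics. The template name BlockAttrs is hypothetical, and the pink is given as the RGB value CXFFC0CB.

```sas
proc template;
  define statgraph BlockAttrs;
    begingraph;
      entrytitle 'Windows OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true));
        /* Pink primary fill, fully transparent alternate fill, values at top */
        blockplot x=date block=windows / display=(fill values)
                  valuehalign=center valuevalign=top filltype=alternate
                  fillattrs=(color=cxffc0cb) altfillattrs=(transparency=1);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=Windows template=BlockAttrs;
run;
```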

Normally, the Block Plot is used in conjunction with some other plot statement.  In the example below, we have used this same data along with a SERIES plot of the monthly closing value for the Microsoft stock price from the SASHELP.STOCKS data set.  The data in the stocks data set only goes up to about April 2005, so I have restricted the data to that range.

First, we extract the data for STOCK='Microsoft' from the SASHELP.STOCKS data set and sort it by Date.  Then, we merge the block data with the stock data by date.  See the attached program for full details.  Now, we add the SERIESPLOT statement to the GTL template to create the graph on the right.
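The data preparation lives in the attached program. A sketch of the two steps described, assuming the block data is in a data set named Windows with a DATE column; the output name Series matches the SGRENDER step in the code below, and MSFT is a hypothetical intermediate name.

```sas
/* Monthly Microsoft closing prices, sorted by date */
proc sort data=sashelp.stocks(where=(stock='Microsoft') keep=stock date close)
          out=msft(drop=stock);
  by date;
run;

proc sort data=Windows;
  by date;
run;

/* Stock dates without a release get a missing WINDOWS value, which
   EXTENDBLOCKONMISSING=TRUE treats as a continuation of the previous block */
data Series;
  merge Windows msft;
  by date;
run;
```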

Note the use of EXTENDBLOCKONMISSING=TRUE to treat missing values as a continuation of the previous block value.

SAS 9.3 GTL code for Stock Plot with Blocks:

/*--Block and Series Overlay plot--*/
proc template;
  define statgraph BlockStock;
    begingraph;
      entrytitle 'Microsoft Stock Price with Windows OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true) 
                                  display=(ticks tickvalues));
        blockplot x=date block=windows / display=(fill values) 
                  valuehalign=center valuevalign=top 
                  filltype=alternate altfillattrs=(transparency=1)
                  extendblockonmissing=true;
        seriesplot x=date y=close / lineattrs=graphfit;
      endlayout;
    endgraph;
  end;
run;
 
/*--Block and Stock--*/
proc sgrender data=Series (where=(date < '01apr2005'd)) template=BlockStock;
run;

As we saw in the previous examples, when a block plot is placed in a LAYOUT OVERLAY, the plot fills the entire height of the overlay region inside the axes.  The width of each block is determined by contiguous values of the BLOCK role.  That is why we call this a one-dimensional plot.

If multiple block plots are overlaid, each fills the full height, and each one overwrites the previous one.  Using transparency in the FillAttrs can help in such cases.  However, this plot is one of the few that can also be placed in the INNERMARGIN of a layout overlay.  The INNERMARGIN is a region at the bottom of each overlay container.

When placed in the inner margin, this plot occupies only the height needed to display the block values, about the height of the font.  So, each block plot occupies only a small part of the wall, and multiple block plots are STACKED, not overlaid.  This allows you to see all the blocks, as shown in the graph on the right.  Here, we are displaying the release dates of the OS for both Windows and Mac.  Note the labels identifying each block plot on the left.  The GTL program for this graph is shown below.

SAS 9.3 GTL code for Block Plot in Inner Margin:

/*--Block plot with Inner Margin--*/
proc template;
  define statgraph BlockInner;
    dynamic _color;
    begingraph;
      entrytitle 'Windows and Mac OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true));
        innermargin;
          blockplot x=date block=windows / display=(fill values label) valuehalign=center 
                    valuevalign=top filltype=alternate altfillattrs=(color=_color) 
                    fillattrs=graphdata1 extendblockonmissing=true valueattrs=(size=7);
          blockplot x=date block=mac / display=(fill values label) valuehalign=center 
                    valuevalign=top filltype=alternate altfillattrs=(color=_color) 
                    fillattrs=graphdata2 extendblockonmissing=true valueattrs=(size=7);
        endinnermargin;
      endlayout;
    endgraph;
  end;
run;

Finally, block plots also support the CLASS role.  This is similar to the GROUP role, except that a separate strip of blocks is created for each class value and stacked on the previous one, not overlaid.  This behavior applies both in the INNERMARGIN and in the layout itself.

In the graph on the right, we have changed the multi-variable data for Windows and Mac into a grouped data structure using the variable GROUP that has "Windows" or "Mac" and a REL variable that contains the release name, such as "WIN7" or "OSX".  Then, we merged this data with the stock data as before, and created this graph using CLASS=GROUP.
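The reshaping step is not shown in the article. A sketch, assuming the Windows and Mac release values sit in character columns WINDOWS and MAC of a data set named Windows (as used in the inner-margin example); OSGroup is a hypothetical name, and the lengths of 8 are assumptions. The result would then be merged with the stock data by date, as before.

```sas
/* One row per date and OS family: GROUP names the family, REL the release */
data OSGroup;
  set Windows(keep=date windows mac);
  length group $8 rel $8;
  group='Windows'; rel=windows; output;
  group='Mac';     rel=mac;     output;
  keep date group rel;
run;
```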

SAS 9.3 GTL code for Block Plot with CLASS:

/*--Group data with series--*/
proc template;
  define statgraph BlockClassSeries;
    dynamic  _color _trans;
    begingraph;
      entrytitle 'Microsoft Stock Price with OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true) 
                                 display=(ticks tickvalues));
        blockplot x=date block=rel / class=group 
                  display=(fill values outline) valuehalign=center valuevalign=top 
                  includemissingclass=false filltype=alternate 
                  altfillattrs=(color=_color) outlineattrs=(color=gray) 
                  extendblockonmissing=true valueattrs=(size=7);
        seriesplot x=date y=close / lineattrs=graphfit;
      endlayout;
    endgraph;
  end;
run;

Just as we did with the stock plot data, the block plot can be used to display the number of subjects at risk for a survival plot, or in other clinical graphs where it is important to display textual data that is aligned with the axis.  While axis-aligned text can now be drawn using the new AXISTABLE statement (SAS 9.4), the Block Plot can still be used effectively to display segments over a linear or time axis, such as the severity of an adverse event.  We look forward to seeing creative uses of this unique plot in your graphs.

Full SAS 9.3 code for all the examples:  BlockPlot


G100 with SGPLOT

The GCHART procedure has a popular option called G100 to display all the subgroups in % format, such that all the subgroup values add up to 100% for each group.  Each subgroup is labeled with its own percentage value.

The SGPLOT procedure does not have such an option, but with a little preprocessing of the data, similar results can be obtained, with the added benefit that the groups can be stacked or clustered.  Here, my interpretation of G100 is that all groups within a category add up to 100%.

I want to draw a G100 graph of Revenue by Year with Group=Customer.  First, I run the MEANS procedure on the sashelp.electric data set to compute the revenues by year and customer.  I retain all observations with _TYPE_ of 2 or higher, and then sort the data by year and customer.

Note that in the data set above, sorted by Year and Customer, the total revenue for all customers for each year (_TYPE_ = 2) comes before the detailed values per customer (_TYPE_ = 3).  So, I simply run a data step to divide each customer's value for a year by that year's total, to obtain the fraction for each customer per year.  I set the percent format on the values and drop the total value to get the data set below that I want.
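The summary and division steps are in the attached program; the sketch below is one way to write them, using the SASHELP.ELECTRIC columns YEAR, CUSTOMER and REVENUE. The names revByYear, revSorted and pctByYear are hypothetical.

```sas
/* Revenue totals by year (_TYPE_=2) and by year and customer (_TYPE_=3) */
proc means data=sashelp.electric noprint;
  class year customer;
  var revenue;
  output out=revByYear sum=revenue;
run;

/* Keep the year totals and the year-by-customer cells. Within each year
   the total row (blank CUSTOMER) sorts ahead of the customer rows. */
proc sort data=revByYear(where=(_type_ >= 2)) out=revSorted;
  by year customer;
run;

/* Divide each customer's revenue by that year's total */
data pctByYear;
  set revSorted;
  retain total;
  if _type_=2 then total=revenue;
  else do;
    pct=revenue/total;
    output;
  end;
  format pct percent7.1;
  keep year customer pct;
run;
```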

 

 

We want to label each stacked bar segment by its value.  SGPLOT VBAR does not label all the segments, so we compute the low and high values for each customer segment within a year.  Then, I can use the High Low plot to draw the bar segments, and overlay the labels using a scatter plot.  See the attached code for the details.
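A sketch of the low and high computation, assuming a data set pctByYear with one PCT value (a fraction) per customer within each year, sorted by YEAR. The output name pctByYearStack is hypothetical; the article's program uses its own data set, pctByYearDescending, which may order the customers differently.

```sas
/* Cumulative bounds of each customer's segment within a year */
data pctByYearStack;
  set pctByYear;
  by year;
  retain high;
  if first.year then high=0;
  low=high;          /* segment starts where the previous one ended */
  high=low+pct;      /* and ends PCT higher */
  mid=(low+high)/2;  /* label position for the scatter overlay */
  format low high mid percent7.1;
run;
```

For each year, the HIGH value of the last segment should come out at 100%, which is a quick sanity check on the preprocessing.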

SAS 9.3 SGPLOT G100 stacked graph:

SAS 9.3 SGPLOT code for G100 stacked:

title 'All groups in a category total 100%';
proc sgplot data=pctByYearDescending(where=(year < 2000));
  highlow x=year low=low high=high / type=bar group=customer;
  scatter x=year y=mid / markerchar=pct markercharattrs=(color=black size=7);
  yaxis grid display=(nolabel) offsetmin=0;
  xaxis display=(nolabel);
  run;

The Year, Low and High columns are used to draw each segment of the stacked bar.  These are overlaid with the scatter plot using MARKERCHAR at (Year, Mid).

With this data, the same graph can also be plotted as clustered groups.  Clustered groups with a data label for each bar are easy to create with SGPLOT, so we'll use the VBAR statement for this.  I did not set the Y axis maximum to 100%, but you can if you wish.

SAS 9.3 SGPLOT G100 cluster graph:

SAS 9.3 SGPLOT code for G100 cluster:

title 'All groups in a category total 100%';
proc sgplot data=pctByYearDescending(where=(year < 2000));
  vbar year / response=pct group=customer groupdisplay=cluster
              grouporder=data nostatlabel datalabel datalabelattrs=(size=7);
  yaxis grid display=(nolabel);
  xaxis display=(nolabel);
  run;
 

With SAS 9.4, there are some improvements in the graph drawing.  The High Low plot supports data skins, and the numeric data labels on the bar charts rotate when needed.  Here are the same graphs created using SAS 9.4:

Full SAS 9.3 code for G100 Graphs: G100_93

Full SAS 9.4 code for G100 Graphs: G100_94


SAS Global Forum 2014 Graph Presentations

SAS Global Forum 2014 was a great success, with SAS Studio, a web-based SAS interface, garnering a lot of attention.  SAS also announced the availability of SAS Analytics U, providing free web-based access to SAS analytics for students, faculty and researchers.

The conference had multiple papers and Super Demos on data visualization presented by SAS staff.  A large number of excellent papers on graphics were presented by SAS users.  We noted that a large majority of users are now using SAS 9.3, with many users using SAS 9.4.

As promised, here are the links for graphics-related presentations from SASGF 2014.  I have included some additional papers I missed in the original list.

Papers by users:

Papers by SAS authors:

 


Getting Ready for SAS Global Forum 2014

The SAS Global Forum 2014 is just around the corner, starting Sunday, March 23, and I am eager to attend creative presentations from SAS users on ODS Graphics.  Adoption of the SG procedures, GTL and the ODS Graphics Designer is growing among users, and I see many promising papers.

Papers by users:

  • Something for Nothing! Converting to ODS Graphics - Phil Holland
  • Communication-Effective Data Visualization: Design Principles - LeRoy Bessler
  • Stylish Waterfall Graphs Using SAS® 9.3 and SAS® 9.4 GTL - Setsuko Chiba
  • Increase Pattern Detection in SAS® Graph Template Language - Perry Watts
  • Graphing Made Easy with ODS Graphics Procedures - Lora Delwiche
  • Using SAS® ODS Graphics - Chuck Kincaid
  • Prescription for Visualization: Take One SAS® GTL - Radhika Myneni
  • Using SAS® GTL with SAS® 9.3 When There is Too Much Data to Visualize - Perry Watts
  • Make It Possible: Create Customized Graphs with Graph Template Language - Wen Song
  • The Many Ways of Creating Dashboards Using SAS® - Mark Bodt
  • Combined SAS® ODS Graphics Procedures with ODS to Create Graphs - Howard Liang

Papers by SAS authors:

  • Putting on the Ritz: New Ways to Style Your Graph to the Max - Dan Heath
  • Plotting Against Cancer: Creating Oncology Plots Using SAS® - Debpriya Sarkar
  • Up Your Game with Graph Template Language Layouts - Sanjay Matange

Super Demos:

  • Look Ma! No R - Prashant Hebbar
  • What's New in ODS GRAPHICS for SAS 9.4 - Sanjay Matange
  • Auto charts with ODS Graphics Designer - Lingxiao Li
  • Clinical Graphs using SG Procedures - Sanjay Matange
  • What's New with SAS 9.4 SG Procedures - Dan Heath
  • Yes, you can with polygon plot - Sanjay Matange
  • New Features in SAS 9.4 GTL - Prashant Hebbar

And then there is the Demo Room.  Starting Sunday, we will be demoing the latest features from SAS 9.4 and SAS 9.4M1.  All in all, it should be an exciting conference.


Axes Synchronization

Often we need to plot multiple response variables on Y axes by a common variable on the X axis.  When the response variables are very different in magnitude or format, we prefer to plot the variables on separate Y (Left) and Y2 (Right) axes.

Here is some sample data with three response columns "East", "West" and "Percent".  Let us see how we can plot multiple variables by "Category".

Here is a bar chart of East by Category overlaid with a line chart of West by Category.  East is plotted on the Y axis and West on the Y2 axis.

SAS 9.3 SGPLOT code:

proc sgplot data=synch;
  vbar  cat / response=east dataskin=matte nostatlabel;
  vline cat / response=west y2axis lineattrs=(thickness=4) nostatlabel;
  run;

Note that the bar has a baseline at zero on the Y axis, but this does not match the zero value on the Y2 axis.  How can we synchronize the zeros on both axes?  If both the Y and Y2 axis ranges were positive, we could just set MIN=0 for both axes.  But when one or both axes have negative values in the range, how do you do this?

Unfortunately, there is no magic option you can set on one or the other axis to make this happen.  However, there is a simple way to do it.  The axis offsets (the space reserved at both ends) are the same by default.  So, if you set the Y and Y2 axis min and max values such that the ratio of the negative range to the positive range is the same on both axes, the zero values will line up.  Here is how we synchronized the zero values for this case:

SAS 9.3 SGPLOT code:

proc sgplot data=synch;
  vbar  cat / response=east dataskin=matte nostatlabel;
  vline cat / response=west y2axis lineattrs=(thickness=4) nostatlabel;
  yaxis  min=-250 max=500;
  y2axis min=-100 max=200;
  run;

Note that in the above program, we have set the Y axis range from -250 to 500, so the positive range is 2 times the negative range.  We also set the Y2 axis range in the same proportion, -100 to +200.  This makes the zeros of the two axes line up, as seen in the graph above: the baseline at zero for the bar chart on the Y axis lines up exactly with the zero value on the Y2 axis for the line chart.
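The rule can be written out as a short derivation. Because the offsets are the same on both axes, zero lands at the same height on both when it sits at the same fraction of each axis range:

```latex
\frac{0 - \min_Y}{\max_Y - \min_Y} = \frac{0 - \min_{Y2}}{\max_{Y2} - \min_{Y2}}
\quad\Longleftrightarrow\quad
\frac{\min_Y}{\max_Y} = \frac{\min_{Y2}}{\max_{Y2}}
```

For this example, 250/750 = 100/300 = 1/3 of each axis lies below zero. Given the Y range and the Y2 maximum, the matching Y2 minimum is 200 * (-250/500) = -100.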

This same principle applies when the axis formats are different, as long as the axes are linear.  In fact, that is one reason you may want separate axes anyway, when one of the values has a percent format, as shown in the graph below.  Here we have a bar chart of East by Category on the Y axis overlaid with a line chart of Percent by Category on the Y2 axis:

Graph without axis synchronization. Note that the zeros on Y and Y2 are not aligned:

Graph with axis synchronization. Note that the zeros on Y and Y2 are aligned:


We have done the same thing in this case: we just made sure the ratio of the negative to the positive range is the same on both axes.

Now, what if you want to align a particular value on each axis, and also match the axis ranges? In the graph below, I have plotted Fahrenheit by month overlaid with Celsius by month. I want to align 32 on Y with zero on Y2, and also get exactly equivalent ranges so that the two plots coincide.

Fahrenheit and Celsius graph without axis synchronization. Note that 32 on Y is not aligned with zero on Y2, and the scatter plot does not match the series plot.

Fahrenheit and Celsius graph with axis synchronization of 32 with zero.

Here I have added a reference line on the Y2 axis at zero, and it lines up exactly with 32 on the Y axis. Also note that the series plot of Fahrenheit on Y lines up exactly with the scatter plot of Celsius on Y2. So, the mapping of the values on both axes is exact.
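Because Celsius is a linear function of Fahrenheit, one way to get this result is to apply the same conversion to both ends of the Y axis range and use the results as the Y2 range. That aligns 32 with zero and makes the two axes equivalent scales at the same time. The endpoints 14 and 104 below are illustrative values, not the ones used in the article's graph.

```latex
C = \frac{5}{9}\,(F - 32)

[F_{\min}, F_{\max}] = [14, 104]
\;\Rightarrow\;
[C_{\min}, C_{\max}] = \left[\tfrac{5}{9}(14 - 32),\ \tfrac{5}{9}(104 - 32)\right] = [-10, 40]

% 32 on Y and 0 on Y2 sit at the same fraction of their axes:
\frac{32 - 14}{104 - 14} = \frac{0 - (-10)}{40 - (-10)} = \frac{1}{5}
```

In SGPLOT terms this would correspond to yaxis min=14 max=104 and y2axis min=-10 max=40, the same MIN=/MAX= approach used for zero alignment earlier.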

You can also do the same for the X and X2 axes. For an adverse events timeline, we often have a plot by Day on the X axis, with the associated dates shown on the X2 axis. To synchronize these axes, you can use the same idea.

Now, we have given this issue some thought to see how we can automate this process.  However, there has not been a pressing demand for this from the user community, nor have we come up with a simple solution.  So, it is likely that many of you have already figured this out.  If so, please feel free to contribute your ideas.

Full SAS 9.3 code:    Synch_Axes

DataLattice with gradient backgrounds

Classification panels are a very popular visual representation of data, where the data is gridded by class variables all in one graph. This makes it easy to compare and contrast the data by these class variables. The SGPANEL procedure makes this easy, and most of the time it is all you need to create your class panel, especially since SGPANEL supports computed plots like histograms, box plots and so on.

One long standing desire expressed by users was to be able to color the walls of each cell of the class panel by another variable, either as a group color or a color response.  Recently, this was requested again by a user on the SAS User Communities board, leading me to take another look at this use case.

Since the requested graph was a scatter plot (and not a computed plot), it is easier to use the GTL LAYOUT DATALATTICE statement to create this graph. This opens up some possibilities. While there is no direct option to do this, we can leverage the plot layering capability to create such a graph.

Here I have used the SASHELP.CARS data set. I ran the MEANS procedure to compute the mean mpg and horsepower and the frequency counts for each crossing of the Origin and Type variables (excluding Hybrids), using the COMPLETETYPES option to retain crossings with zero counts. I merged the results back into the cars data set and plotted the graph using the GTL code below. SG procedures do not support ColorResponse (yet), so it is not possible to do this with SGPANEL. This is being addressed at SAS 9.4M2.
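The data preparation described above might look something like the sketch below. It is not the author's attached program: it concatenates the summary rows onto the car rows instead of merging them, which is one simple way to give each cell exactly one non-missing bubble row, and it places each bubble at the cell means. The variable names match the GTL template; the data set name STATS is an assumption, and the fixed SIZE value of 200 comes from the description of the bubble plot in the list that follows the code. Cells with no cars (kept by COMPLETETYPES) get missing means, so their bubble has no location; handling that case is left out here.

```sas
/*--Sketch: per-cell statistics for Origin x Type, excluding Hybrids--*/
/*--NWAY keeps only the full crossings; COMPLETETYPES keeps empty ones--*/
proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint nway completetypes;
  class origin type;
  var mpg_city horsepower;
  output out=stats(drop=_type_ rename=(_freq_=n))
         mean(mpg_city horsepower)=mean_mpg mean_hp;
run;

/*--Append one bubble row per cell after the car rows--*/
/*--Car rows have missing SIZE, so no bubble is drawn for them--*/
data cars;
  set sashelp.cars(where=(type ne 'Hybrid')
                   keep=origin type horsepower mpg_city)
      stats(in=inStats);
  if inStats then size=200;  /*--Fixed bubble size; radius range set in the template--*/
run;
```

Stacking rather than merging also keeps the scatter rows free of bubble values, so the scatter plot and the bubble plot each read only their own rows.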

SAS 9.3 GTL code:

proc template;
  define statgraph GradientPanelWall;
    begingraph;
      entrytitle 'Vehicle Statistics';
      layout gridded / columns=2;
        layout datalattice columnvar=origin rowvar=type / columns=3
               headerlabeldisplay=value
               rowaxisopts=(offsetmin=0.1 offsetmax=0.1)
               columnaxisopts=(offsetmin=0.1 offsetmax=0.1);
          layout prototype;
            bubbleplot x=mean_hp y=mean_mpg size=size / colorresponse=n name='a'
                       colormodel=twocolorramp datatransparency=0.5
                       bubbleradiusmin=300 bubbleradiusmax=400;
            scatterplot x=horsepower y=mpg_city / primary=true;
          endlayout;
        endlayout;
        continuouslegend 'a' / title='Observation Counts' halign=right orient=vertical;
      endlayout;
    endgraph;
  end;
run;
 
/*--Lattice with color background by count--*/
proc sgrender data=cars template=GradientPanelWall;
run;

The key features of this program are as follows:

  • A GTL LAYOUT DATALATTICE is used to create the data lattice using Origin and Type.
  • The prototype contains a SCATTERPLOT of mpg_city by horsepower.  This is marked PRIMARY.
  • The prototype also contains a BUBBLEPLOT using COLORRESPONSE=n. The (x,y) location of each bubble is the middle of the values for mpg and horsepower. The size of the bubble is fixed at 200, and the BUBBLERADIUSMIN and BUBBLERADIUSMAX options are set to create large bubbles that cover the entire space of the cell. Because the bubble plot comes first in the prototype, it is drawn behind the scatter plot. The data has non-missing values for only one bubble per cell.
  • A large bubble will force large axis offsets. This is overcome by setting the axis OFFSETMIN and OFFSETMAX to reasonably small values.
  • The lattice is placed in the left cell of a LAYOUT GRIDDED with 2 columns.  The right cell contains the gradient legend.

This code essentially creates the graph we want. Now, let us improve the visual a bit. Note that the height of the continuous legend is the full height of the Layout Gridded cell. It would be nicer if the height was reduced to match the height of the class lattice, as shown below.

Now, this is much nicer. Note that there is no automatic way to do this. I have done it by specifying the PAD option for the legend, setting the bottom, top and left pads appropriately for this case. The left pad creates a small separation between the lattice and the legend. We will take this up as an item to automate in future releases. See the attached full program for this option in the second graph.
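For reference, the padding goes on the CONTINUOUSLEGEND statement inside the template. The pixel values in this fragment are placeholders, not the ones used in the attached program; they have to be tuned by eye for a given graph size and lattice layout.

```sas
/*--Replaces the legend statement in the template; pad values are placeholders--*/
continuouslegend 'a' / title='Observation Counts' halign=right orient=vertical
                       pad=(top=30px bottom=60px left=10px);
```

The top and bottom pads shrink the legend toward the height of the lattice cells, and the left pad adds the small gap between the lattice and the legend.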

Another frequently requested item is the ability to place insets into each cell that provide information on the data in that cell. Using an ENTRY is not good enough, as all the cells will show the same value. SAS 9.3 LAYOUT DATALATTICE provides an INSET option that allows you to display different values in each cell. The data has to be carefully merged into the data set: the additional columns should contain only as many observations as there are cells, in the right order. These values are then displayed in each cell.

Note that this is not very user friendly, as the order is important. SAS 9.4 supports "match-merged" data, making this much easier. Here is the SAS 9.3 graph with color backgrounds and insets.

In this last example, we have used the default three-color ramp. SAS 9.4 allows you to set your own color model by specifying a list of colors. Now we have a graph that provides some useful value-add over what the SGPANEL procedure can do. You can also do this with computed plots, as long as you do the computing yourself, because only non-computed plots can be placed in the prototype of the GTL lattice.

Full SAS 9.3 Code:  DataLatticeGradientFill

Layered graphs

Browsing graphs on the web, this graph caught my eye:  The Arctic Sea Ice Volume Graph.   My interest is not so much in the debate on Climate Change or Global Warming.  To me, this graph has some interesting features that can help show the benefits of plot layering to build a graph.  So, let us take a crack at it to see how far we can get.  As usual, my preference is to use plot statements and options only, and not resort to annotation.

From our perspective, this graph has the following components:

  1. A display of the "Ice Remaining" amount in the middle.
  2. Rotated text displaying the values.
  3. A plot at the top to represent the Yearly Ice Minimum with a fit plot.
  4. A plot at the bottom to represent the Yearly Ice Loss with a fit plot.
  5. Y axis label has a superscript '3'.

Here is the data set I built by eyeballing the graph:

The column "Mid" in the above table represents the midpoint where I want to draw the value for the "Ice Remaining". Here is the first attempt at this graph, using SAS 9.3 features.

SAS 9.3 Graph:  

SAS 9.3 SGPLOT code:

ods escapechar='~';  /*--Needed for the ~{unicode} function in the Y axis label--*/
proc sgplot data=ice noautolegend;
  format diff 4.1;
  title 'Arctic Sea Ice Volume';
  title2 h=0.8 'Annual Maximum and Loss, and Ice Remaining at Minimum';
  footnote j=l  h=0.8 c=gray
          'Source: PIOMAS.vol.daily.1979.2013.Current.v2.dat.gz (Version 2.0)';
  highlow x=year low=low high=high / type=bar
          fillattrs=(color=silver) transparency=0.5 name='a'
          legendlabel='Ice remaining at yearly minimum';
  series x=year y=high /  lineattrs=(color=%rgbhex(0, 112, 192)
         thickness=3) name='b' legendlabel='Yearly ice maximum';
  series x=year y=low /  lineattrs=(color=red thickness=3)
         name='c' legendlabel='Yearly ice loss';
  reg x=year y=high / degree=2 lineattrs=(color=%rgbhex(0, 112, 192)
         thickness=1) nomarkers;
  reg x=year y=low / degree=2 lineattrs=(color=red thickness=1) nomarkers;
  scatter x=year y=mid / markerchar=diff;
  xaxis display=(nolabel) values=(1979 to 2012 by 1) valueattrs=(size=7);
  yaxis label="Ice Volume [1000 Km ~{unicode '00b3'x} ]"
        values=(15 to 35 by 3) min=15 max=34 valueshint grid;
  keylegend 'a' 'b' 'c' / location=inside position=topright across=1
        valueattrs=(size=7);
  run;

Here are the relevant aspects of the code:

  1. We used the HIGHLOW plot with Type=Bar to draw the "Ice Remaining", using transparency.  Skins are not available at SAS 9.3.
  2. We used two SERIES plots to display the curves at the top and bottom. SMOOTHCONNECT is not available at SAS 9.3.
  3. We used REG plots with DEGREE=2 to draw the fit lines.
  4. We used a SCATTER plot with MARKERCHAR to display the ice values in the middle.
  5. Note the use of a Unicode superscript in the Y axis label.

While the graph above gets the job done, let us see if we can improve it further. Here is the same graph using some new features in SAS 9.4M1:

SAS 9.4M1 Graph:

Note the following improvements in the graph:

  1. HighLow plot uses data skin, giving it the look of an ice core.
  2. Series plots use SmoothConnect, which creates a smooth connection at the vertices.
  3. The vertically oriented ice values are drawn using the new POLYGON plot, which supports Rotate and RotateLabel. The plot data itself has only one observation per ID, so there is not enough data to draw the polygon itself, but the associated label is still drawn at that location.
  4. Graph wall border is removed.

The above graph also demonstrates the power of using plot layers to create a graph, and the creative (non standard) usage of plot types to get our work done.

SAS 9.3 provides us the features necessary to present all the data as per the "intent" of the original graph.  However, with SAS 9.4M1, we can create a graph that is practically identical to the one in the original link.

Full SAS 9.4M1 program: Ice_94

Sochi Medal Graphs

The attention of the world is now on Sochi and the Winter Games. Gold, Silver and Bronze medals are being earned by these amazing athletes, and everyone has an eye on the tally. Andre sent me a link to TRinker's R Blog, showing a graph of the current tally. Andre wanted to show SAS users that this kind of graph is easy to make in SAS, so he sent me the SAS data set of the latest tally. This data is in multi-column format, and I converted it to a "Group" type structure as shown below.

A lattice view of the graph shown in TRinker's R Blog can be made using the SGPANEL procedure. I have intentionally ordered the countries by the number of Gold medals, instead of the Total. To keep the graph small, I only show countries with at least one Gold medal.
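The conversion from multi-column to group structure, together with the gold-medal filter and ordering, is a standard wide-to-long reshape. Here is a minimal sketch. The input data set SOCHI_WIDE, its column names (Gold, Silver, Bronze, Total) and its values are made up for illustration; they are not Andre's data or the actual Sochi tally. The output names SOCHI_GROUP, Country, Medal and Count match the SGPANEL code that follows.

```sas
/*--Made-up wide data: one row per country, one column per medal type--*/
data sochi_wide;
  input Country $ Gold Silver Bronze;
  Total = sum(Gold, Silver, Bronze);
  datalines;
CountryA 1 3 2
CountryB 4 1 1
CountryC 0 2 5
;
run;

/*--Keep countries with at least one gold, most golds first--*/
proc sort data=sochi_wide out=sochi_sorted;
  where Gold > 0;
  by descending Gold descending Total;
run;

/*--Wide to long: one row per country and medal type--*/
data sochi_group;
  set sochi_sorted;
  length Medal $6;
  array cnt{4} Gold Silver Bronze Total;
  do i = 1 to 4;
    Medal = vname(cnt{i});  /*--Column name becomes the group value--*/
    Count = cnt{i};
    output;
  end;
  keep Country Medal Count;
run;
```

Because the loop writes the medals in the order Gold, Silver, Bronze, Total, SORT=DATA on the PANELBY statement and DISCRETEORDER=DATA on the row axis both pick up the order established here.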

SGPANEL code:

proc sgpanel data=sochi_group noautolegend;
  title 'Sochi Medals Counts by Country Ordered by Gold Medals';
  panelby medal / layout=panel columns=4 onepanel sort=data novarname spacing=5;
  styleattrs datacontrastcolors=(gold silver %rgbhex(236, 143, 44) black);
  dot country / response=count group=medal nostatlabel
                markerattrs=(symbol=circlefilled size=10);
  rowaxis discreteorder=data display=(nolabel) fitpolicy=none;
  colaxis integer display=(nolabel);
  run;

While graphs like the one above are preferred by the statistical and analytical community, other users often want something spiffier. The nice thing about SG procedures and GTL is that they can create both the analytical graphs and the spiffier ones. Here is an example of what you can do with SAS 9.4 GTL.

SAS 9.4 GTL Graph of Sochi Medals:

This graph uses the following new features included in SAS 9.4M1 GTL:

  • A SymbolImage statement allows any image to be used as a marker symbol. Here I have used the three medals.
  • Alternating horizontal color bands allow easier decoding of the data.
  • Backlit (or shadowed) text is used on the medals so it shows clearly.

Other graph configurations are possible and are included in the attached program.

Full SAS Program:  Sochi