Customize Legend Entries - SAS 9.40M3

We all want to customize our graphs just so, and have our personal preferences.  Over the past few releases SG Procedures and GTL have added options to customize the look and feel of our graphs.  In this article, I will describe new ways in which you can customize your legends.  We will also see some new visual options.

Here is a common graph with the default legend.  The graph displays a quadratic fit for MSRP by Horsepower, using the REG statement with the CLI and CLM options and DEGREE=2.

Note the default legend shows a small color swatch for the 95% confidence band, and two line elements for the Prediction limits and the fit.  The line elements are rather long, and designed to be able to represent any of the line patterns that are supported.

In this case, we have only two line patterns, one solid and one dashed, so such a long line is not needed to represent them.  Starting with SAS 9.4M2, the length of the line segments in the legend can be controlled using the LINELENGTH option, as I have done here:

keylegend / linelength=32;

This makes for a much better legend.  This normally makes sense only when the line patterns are short or, in the case of a grouped plot, when you are using a "Color" priority style like HTMLBlue.  Now that we have addressed the line segment length issue, what about the fill color swatch?

With SAS 9.40M3, new options have been added to the KEYLEGEND statement to provide for more customization of the color swatch.  The default color swatches can be smaller than some of you may want, and when using skins, the small swatch is unable to properly represent the colors in the graph.  This was also brought to our attention by Dr. LeRoy Bessler during Dan's presentation at SGF 2014 in Washington DC.  And then there are always individual preferences that come into play.  To address these cases, SGPLOT provides the new SCALE, FILLHEIGHT and FILLASPECT options.

If all you want is to increase (or decrease) the size of the color swatch, and you don't need a particular size, you can use the SCALE option as shown on the right.  Here I have used the SCALE option to increase the size of the fill swatches:

keylegend / linelength=32 scale=1.2;

Now, the color swatch is a bit bigger.

You can fully customize the size and shape of the color swatches using the FILLHEIGHT and FILLASPECT options.  Here, we have set the height of the swatch to 2.5% of the graph height and the aspect ratio to GOLDEN.  The golden ratio comes from observations of ratios in nature and also from the Fibonacci sequence, and is approximately equal to 1.618.

keylegend / linelength=32 fillheight=2.5pct fillaspect=golden;

FILLHEIGHT takes a dimension, so it can be pixels (px), percent (pct), inch, cm or mm.  All values are scaled by DPI.  FILLASPECT accepts a value greater than zero.  If the color swatch becomes too big, the legend will drop out.
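For readers who want to try these options end to end, here is a minimal sketch of the fit plot described at the top of this article.  It assumes the SASHELP.CARS data set, which has MSRP and Horsepower columns but is not named in the text above, and it needs SAS 9.40M3 or later for FILLHEIGHT and FILLASPECT.

```sas
/* Sketch only: SASHELP.CARS is an assumed data source */
proc sgplot data=sashelp.cars;
  /* Quadratic fit with confidence (CLM) and prediction (CLI) limits */
  reg x=horsepower y=msrp / degree=2 cli clm;
  /* Shorter line samples, and a golden-ratio swatch 2.5% of the graph height */
  keylegend / linelength=32 fillheight=2.5pct fillaspect=golden;
run;
```

Swapping the last two KEYLEGEND options for SCALE=1.2 should give the simpler, uniformly scaled variant.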

The example on the right uses swatches that are 2.5% high with an aspect of 2.5.  The bigger swatches provide more space to render the skinned areas.

As expected, GTL  provides the same options in the DISCRETELEGEND statement in the ITEMSIZE options bundle.

Those with a keen eye would have noted a few new visual possibilities.  SGPLOT now allows you to turn off the internal border of the wall, and control the axes lines to cover only the range of the data using the following option:

styleattrs axisextent=data;

Now the x and y axis lines only extend over the data range.  In case of the x axis, the line only goes from the min to max tick mark.  The y axis line stops at y=0 tick and extends to the actual data value on the high side.  In the past, some users have expressed a preference for such treatment of the axes.  Click on any of the fit plots above to see it in more detail.
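As a rough sketch of how these visual options might be combined, the step below places the STYLEATTRS statement inside a complete SGPLOT step.  The SASHELP.CARS data set is an assumption, and so is the use of the NOBORDER procedure option to turn off the wall border; the article itself only shows the AXISEXTENT option.

```sas
/* Sketch only: data set and NOBORDER are assumptions, not from the article */
proc sgplot data=sashelp.cars noborder;
  /* Draw each axis line only over the extent of the data */
  styleattrs axisextent=data;
  reg x=horsepower y=msrp / degree=2 cli clm;
  keylegend / linelength=32 scale=1.2;
run;
```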

Full SAS 9.40M3 Code:  Legend


Modifying Dynamic Variables in ODS Graphics

If you are familiar with the output delivery system (ODS), then you know that you can modify the tables and graphs that analytical procedures display by modifying table and graph templates. Perhaps less familiar is the fact that you can also modify dynamic variables.

Tables and graphs are constructed from a matrix of information (the ODS data object), layout instructions (a table or graph template), instructions for the overall appearance (a style template), and dynamic variables. Procedures use dynamic variables to control certain details of how tables and graphs are created. If a table has a header that says "95% Confidence Limits", chances are the "95" was set by a dynamic variable. You can set this percentage as a procedure option, so the procedure writer cannot specify the "95" directly in the table template.  If a graph might or might not contain a loess fit, chances are a dynamic variable controls whether the loess fit is displayed or not.  More generally, if some portion of a table or graph is conditionally displayed, it is probably controlled by a dynamic variable. Dynamic variables are listed in the DYNAMIC statement in table and graph templates.

You will need to use the ODS document if you want to modify dynamic variables. The ODS document is a repository of information. You can open an ODS document, run one or more procedures, store all of the output (tables, graphs, notes, titles, footnotes, and so on) in the document, then replay some or all of the output in any order that you choose. For example, SAS/STAT documentation uses the ODS document to capture output from the code displayed in the documentation and then replay subsets of the output. This enables SAS documentation to display output, then add explanatory text, then display more output and more text, and so on.

The following steps capture a dendrogram in an ODS document and then replay it:

ods graphics on;
ods document name=MyDoc (write);
proc cluster data=sashelp.class method=ward pseudo;
   ods select dendrogram;
   id name;
run;                      /* run the step before closing the document */
ods document close;
proc document name=MyDoc;
   replay cluster\dendrogram;
run;
quit;

Both steps produce the same dendrogram:


The ODS DOCUMENT statement opens an ODS document named MyDoc. Since the WRITE option is specified, a new document is created each time this statement is executed, and any old content is discarded.

The following step lists the contents of the ODS document:

proc document name=MyDoc;
   list / levels=all;
run;
quit;

The document contents are displayed here.

This document contains one directory and a graph. There would be more entries in the document if the ODS SELECT statement had not been specified in PROC CLUSTER. Both the data object and the dynamic variables are stored in the ODS document. (Templates are stored in item stores that SAS provides.) You can store the dynamic variables in a SAS data set and display them as follows:

proc document name=MyDoc;
   ods output dynamics=dynamics;
   obdynam \Cluster#1\Dendrogram#1;
run;
quit;
proc print noobs data=dynamics;
run;

The path specified on the OBDYNAM statement was copied from the results produced by the LIST statement. It is important to list the contents of the ODS document and then copy the path from the listing. Some procedures create multiple tables with the same name (for example, the ModelANOVA table in PROC GLM). The ODS document provides the precise path that you need to display each.

The data set is displayed here.

PROC CLUSTER determines the height and width of the dendrogram at run time after evaluating the number of rows in the graph. These sizes are stored as dynamic variables. The dynamic variable DH sets the design height and DW sets the design width. You can modify the values of the dynamic variables and then use them to replay the graph. The following steps create a smaller dendrogram:

data dynamics2;
   set dynamics;
   /* override the design height and width */
   if label1 = 'DH' then cvalue1 = '400PX';
   if label1 = 'DW' then cvalue1 = '400PX';
run;
proc document name=MyDoc;
   replay \Cluster#1\Dendrogram#1 / dynamdata=dynamics2;
run;
quit;


You can additionally modify both the graph and the style templates for a fuller customization of the graph or table. In summary, ODS provides ways to modify every aspect of how a table or graph is displayed.


Unicode in Formatted Data - SAS 9.40M3

SAS 9.4 Maintenance release 3 was released on July 14.  The ODS Graphics procedures include many important, useful and cool features in this release, some that have been requested by you for a while.  In the next few articles, I will cover some of these features.  Last time I covered the new HeatMap statement useful for Big Data Visualization.

One cool and useful new feature is the support for Unicode values in SAS Formats.  For a long time, certain parts of the graph could have Unicode text.  These included user provided text strings for Titles, Footnotes and Entries.  These support Unicode characters, and also commands such as SUP and SUB to make any character string into a subscript or superscript.

Other items like Axis Labels support Unicode strings but not commands like SUB and SUP.  However, there was no way to have Unicode strings from the data set displayed on the axis, in data labels or in legends.  Till now, that is.  Now, with SAS 9.40M3, data values can be displayed in the graph as Unicode strings through user defined formats.

Here is a simple example of a graph showing the counts of deaths by Age Group and Death Cause for the sashelp.heart data set.  I have created a format to break up the age values into four groups.  Here is the code:

proc format;
  value agegroup
    0 -< 40 = '< 40'
    40 -< 50 = '40 < 50'
    50 -< 60 = '50 < 60'
    60 -< high = '>= 60';
run;

The code for the graph is shown below.  The graph is shown on the right.  Click on graph to see the full view.  Note, I have added some annotation around the last tick value ">= 60", which is the formatted label for the last age group.  The full code, including the annotation is shown in the link at the bottom.

title 'Counts by Age Group and Death Cause';
proc sgplot data=sashelp.heart(where=(deathcause ne 'Unknown')) sganno=annoAxis;
  format ageatdeath agegroup.;
  vbar ageatdeath / group=deathcause groupdisplay=cluster nooutline
       baselineattrs=(thickness=0) dataskin=pressed filltype=gradient;
  keylegend / location=inside across=1 title='';
  xaxis display=(nolabel noticks);
  yaxis label='Count' grid;
run;

Now, with SAS 9.40M3, I can include a Unicode string in the label for the last age group as shown below.  Here, I have used the unicode value '2265' for the "greater than or equal" symbol.  Note the use of the full default ODS escape character string (*ESC*).  This must be used in the format syntax, and a user defined escape char cannot be used.

proc format;
  value agegroupUnicode
    0 -< 40 = '< 40'
    40 -< 50 = '40 < 50'
    50 -< 60 = '50 < 60'
    60 -< high = "(*ESC*){unicode '2265'x} 60";
run;

Now, running the same SGPLOT code again with the new format name produces the graph on the right.  Click on the graph to see the full image.  Now, the highlighted tick value uses the Unicode symbol.

This is very convenient, as the only alternative (pre SAS 9.40M3) was to replace the tick value using annotation, which is a messy and non-scalable process.  Now, the value is what you want, and will automatically adjust to changing data, sort, graph orientation, etc.

To illustrate this point, the graph on the right switches the category and group roles.  Now, age group is used as a group, so the formatted value for the fourth group is displayed in the legend.  Using this new technique, this happens automatically; no extra work is required.

It is still not possible to send entire long Unicode strings in the data set itself.  However, most of the use cases can be handled by creating a format that includes the unicode value.

Aside:  Personally, I don't like to see grid lines showing through the transparent bars.  I have prevented that in this graph.  Can you see how I did that in the linked code?

I know some of you already have SAS 9.40M3.  Please give this a spin to see how well this works for you and the mileage you get from  it.  You still cannot use the SUP and SUB commands to do something like Alpha ** Beta, but many simple numeric powers and subscripts are available in the Unicode fonts.  Please chime in with your comments.
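To illustrate the point about simple numeric powers, here is a small sketch that puts the Unicode superscript characters for two ('00B2') and three ('00B3') into a format.  The format name, the data set and its values are all made up for illustration, and I am assuming the superscript glyphs are present in the font in use.

```sas
/* Hypothetical format: tick labels "x" followed by a superscript digit */
proc format;
  value pwr
    2 = "x(*ESC*){unicode '00B2'x}"    /* superscript two   */
    3 = "x(*ESC*){unicode '00B3'x}";   /* superscript three */
run;

/* Made-up demo data: one bar per power */
data powers;
  input n value;
  datalines;
2 4
3 8
;
run;

proc sgplot data=powers;
  format n pwr.;
  vbar n / response=value;
  xaxis display=(nolabel);
run;
```

As with the ">= 60" example, the full (*ESC*) string is used in the format, since a user defined escape character is not supported there.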

Full SAS 9.40M3 program:   Unicode 


Big Data Visualization

Big data is a popular topic, with multiple articles about the analysis of the same.  Today, "Big Data" is measured in multiples of terabytes, and SAS provides special software for analysis and visualization of Big Data - Visual Analytics.

When data is very big, it may be meaningless, let alone inefficient, to plot a scatter plot of such data. This is especially true when the data is on a server, and we want to create an X-Y plot on a local computer.  Bringing all the data down to plot is prohibitive, and the result is not very helpful.

With the release of SAS 9.40M3 this week, the SGPLOT procedure introduces the HEATMAP statement, a plot type suited for visualization of bigger data.  In this case, the data can be analyzed and binned into discrete bins along X and Y axis, and the results displayed using a color gradient.

The graph above shows a heat map of the distribution of the subjects in a study for Diastolic and Systolic blood pressure.  Admittedly, this graph is of a relatively small data set, "sashelp.heart".  This data set has about 5200 observations, which is small from a "Big Data" perspective.  But for our purposes, we can assume we have data like this for millions of subjects or billions of credit card transactions.  The binning of the data is done on a fast server, along with the computation of the regression fit.  Only the "graphical" information for drawing the bins and the curve is sent to the renderer to create this graph.

title 'Distribution of Blood Pressure';
proc sgplot data=sashelp.heart;
  heatmap x=systolic y=diastolic / colormodel=(white green yellow red)
          nxbins=40 nybins=30 name='a';
  reg x=systolic y=diastolic / nomarkers degree=2 legendlabel='Fit';
  gradlegend 'a';
  keylegend / linelength=20 location=inside position=topright noborder;
run;

This graph now allows us to view the blood pressure distribution of the subjects in a study.   The Heat Map statement works seamlessly with most other statements available in the SGPLOT procedure, so we can plot a regression plot on the heat map as easily as we did on the scatter plot.  In the graph above, I have set a custom color model for the display of the frequency data, starting with white to green to yellow to red, as displayed in the gradient legend on the right.  A discrete legend is displayed identifying the Fit plot.  This results in a nice, clean graph.

We can go a step further, and display the confidence and prediction limits on the heat map as shown on the right.  Once again, the same options are used as would be in the case of a scatter plot.
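A sketch of that variation is shown here, as a guess at how it could be done rather than the exact code behind the graph.  It adds CLI and CLM to the REG statement used earlier; the CLMTRANSPARENCY value is my own assumption, chosen so that the filled confidence band does not hide the bins underneath.

```sas
title 'Distribution of Blood Pressure';
proc sgplot data=sashelp.heart;
  heatmap x=systolic y=diastolic / colormodel=(white green yellow red)
          nxbins=40 nybins=30 name='a';
  /* Same fit as before, now with prediction (CLI) and confidence (CLM) limits */
  reg x=systolic y=diastolic / nomarkers degree=2 cli clm
      clmtransparency=0.5 legendlabel='Fit';
  gradlegend 'a';
  keylegend / linelength=20 location=inside position=topright noborder;
run;
```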

For both of these graphs, the X and Y axes represent continuous, numeric data.  The data is binned into a set number of bins by default as determined by the underlying analytical code.  Bin counts can be controlled as we have done using the statement options.

Heat Maps are also useful to view response data for the binned data, as shown in the graph on the right.  Here, we have a heat map of weight by height of the subjects in the study.  However, each bin now shows the mean of the Cholesterol level for all the subjects in the bin.  This shows us the association of Cholesterol with the two analysis variables.

Another interesting use case would be to visualize the credit card balance for all customers of a bank by family income and value of the mortgage.

The SGPLOT heat map supports numeric axes and discrete axes, and any combination of the two.  The graph on the right displays the mean MSRP value of the cars by Type and Make.  Both axes are discrete, and each bin displays the mean value of MSRP for all the observations in the bin.
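The two response heat maps described above can be sketched with the COLORRESPONSE= and COLORSTAT= options of the HEATMAP statement.  This is only a bare-bones outline: the use of SASHELP.CARS for the second graph is an assumption, and the color models, bin counts and legends of the actual graphs are left at their defaults.

```sas
/* Mean cholesterol in each height-by-weight bin (numeric axes) */
proc sgplot data=sashelp.heart;
  heatmap x=height y=weight / colorresponse=cholesterol colorstat=mean;
run;

/* Mean MSRP for each Type-by-Make cell (discrete axes); data set assumed */
proc sgplot data=sashelp.cars;
  heatmap x=type y=make / colorresponse=msrp colorstat=mean;
run;
```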

Heat Maps have been supported in GTL, and you can find previous articles on GTL Heat Maps and Calendar Heat Maps.

SAS 9.4M3 code for Heat Maps:  HeatMap


Row Lattice Headers

The SGPANEL procedure makes it easy to create graph panels that are classified by one or more classifiers.  The "Panel" layout is the default and it places the classifier values in cell headers at the top of each cell.

When using LAYOUT=Lattice or RowLattice, the row headers are placed at the right side of each row, and the header text is rotated as shown in the example on the right.  The graph shows the distribution of Cholesterol, and the panel variable (classifier) is "DeathCause".  Three cells are created and each cell displays the value of "DeathCause" on the right.

There are two obvious problems with this arrangement.

  1. Long text strings in the header are truncated, as for "Coronary Heart Disease" and "Cerebral Vascular Disease".
  2. The text strings are displayed in a vertical orientation that is hard to read.

Users have often complained about this, as admittedly, this is not an ideal arrangement.   The SAS code is included below.  Note the use of OFFSETMIN=0 for ROWAXIS, and usage of SPACING=10 for the cells.

proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel novarname spacing=10;
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0;
  colaxis max=420;
run;

The SAS 9.4M2 release provides a way to improve the arrangement of such a graph.  Here is a variation where I have suppressed the row headers entirely, and used the INSET statement to display the "DeathCause" values inside the cell at the top left.

The variable provided for the inset statement should have the values we want in each cell to be match merged with the panel by row variable.  In this case we are using the classifier variable itself.  Even though the column has the values repeated multiple times in the data, the value is drawn only once, and from the first observation only.

The NOHEADER option suppresses the row headers.  The INSET statement with column "DeathCause" inserts the text value into the top left of the cell.  In the case of this distribution plot, empty space is often available at the upper corners of the cell.  If not, you can add some offset to the top of the ROWAXIS.

proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel noheader spacing=10;
  inset deathcause / position=topleft nolabel;
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0;
  colaxis max=420;
run;

To draw the eye to the classifier value, the inset can be highlighted by using a background color or a border on the INSET statement as shown below left. Below right we have a 2x3 panel, showing both the row and column classifiers as insets. Note, I have added the "Death Cause" first since it has long textual values. I also added an OFFSETMAX=0.15 to create some space at the top of each cell.
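Here is a rough sketch of the bordered variation, reading directly from SASHELP.HEART rather than the prepared "heart" data set used above.  The WHERE clause that keeps three death causes is my own assumption about how that data set was built, and the BORDER option on the INSET statement is used as I understand it.

```sas
/* Sketch only: the subset of SASHELP.HEART is an assumption */
proc sgpanel data=sashelp.heart(where=(deathcause in
       ('Cancer' 'Coronary Heart Disease' 'Cerebral Vascular Disease')))
     noautolegend;
  panelby deathcause / layout=rowlattice onepanel noheader spacing=10;
  /* Classifier value inside each cell, outlined to draw the eye */
  inset deathcause / position=topleft nolabel border;
  histogram cholesterol;
  density cholesterol;
  /* Extra room at the top of each cell for the inset */
  rowaxis offsetmin=0 offsetmax=0.15;
  colaxis max=420;
run;
```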




Full SAS 9.4M2 Code: Lattice


Attributes Priority for the Inquiring Mind

When ODS Graphics was first released with SAS 9.2 in 2008, a conscious effort was made to create graphs that were consistent and aesthetically pleasing out of the box.  Features in the graph derive their visual attributes from the active Style.  When Group classifications are in effect, the different classification levels of the group variable are represented on the screen using the attributes from the GraphData1 - GraphData12 elements of the Style.

These attributes were carefully designed so the 12 colors are distinct from each other. The groups use up to 11 line patterns and 7 marker symbols. For each group value, the color, marker symbol and pattern are derived sequentially from these lists of 12 colors, 11 patterns and 7 symbols.  So the first group level gets the first color, first pattern and the first symbol.  The second group level gets the second color, second pattern and the second symbol.  This goes on till we run out of the list of symbols (there are only 7).  So, the eighth group level will get the eighth color, eighth pattern and the first symbol.  This goes on in this manner so we can have 84 distinct colored symbols and 132 distinct colored patterns.

The graph above uses the LISTING style.  The list of marker symbols has been changed to include filled markers.  Here you can see the assignment of colors, line patterns and marker symbols for each of the three group values.  Click on the graph to see a higher resolution image.

ods listing style=listing;
title 'Style=Listing';
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug lineattrs=(thickness=2);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers;
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
run;

Note the use of FilledOutlinedMarkers in the Scatter plot.  Also, I have used the SAS 9.4 STYLEATTRS feature to change the group symbols to the list of three filled symbols.

Soon it was perceived that it is not always necessary to change all the attributes of the element for each group value.  This was especially true for the line patterns.  When using a color Style, it was felt that it was not necessary to change both line color and pattern, but only the color till all colors from the list are used.

The graph above is created using the SAS 9.3 HTMLBlue style.   In this style, the cycling of the attributes (color, symbol and line pattern) is different from the LISTING style.  As you can see, all the three groups have solid line patterns and circle marker.  Only the line color is changed per group.  So, the first 12 group values get the 12 different colors from the Style, along with the first line pattern and first symbol.  The 13th group level will get the 1st color with the 2nd line pattern and 2nd symbol.  Most of the time we only have a handful of group levels, so only color change is seen.

I recall seeing a presentation where the presenter was baffled on why he was not seeing different marker symbols for his scatter plot when he ran his SAS code, but was seeing only circle markers with different colors.  This was because, while at home he was using SAS 9.2, the presentation laptop had SAS 9.3 with the default destination of HTML with the HTMLBlue style.  He had to change the style back to LISTING to see the different shaped markers.

This behavior of the different Styles is called Attribute Priority.  The default AttrPriority is NONE, meaning that all the attributes are cycled together as for the LISTING style.  HTMLBlue has AttrPriority=Color.  This means that only the color attribute is cycled first, holding the symbol and pattern constant till all the 12 colors are used up.  Then, we go to the second symbol and second pattern and cycle through all 12 colors again.
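The two cycling rules can be written out as simple index arithmetic.  The DATA step below is only my sketch of the behavior as described in this article, using the list lengths quoted above (12 colors, 11 patterns, 7 symbols); it is not the internal implementation, and the variable names are made up.

```sas
data _null_;
  ncolor = 12; npattern = 11; nsymbol = 7;   /* list lengths quoted in the text */
  do k = 1 to 14;                            /* k = group level */
    /* AttrPriority=NONE: all three lists advance together */
    none_color   = mod(k-1, ncolor)   + 1;
    none_pattern = mod(k-1, npattern) + 1;
    none_symbol  = mod(k-1, nsymbol)  + 1;
    /* AttrPriority=COLOR: color advances for every level; pattern and
       symbol advance only after all the colors have been used */
    cycle         = floor((k-1) / ncolor);
    color_color   = mod(k-1, ncolor)   + 1;
    color_pattern = mod(cycle, npattern) + 1;
    color_symbol  = mod(cycle, nsymbol)  + 1;
    put k= none_color= none_pattern= none_symbol=
        color_color= color_pattern= color_symbol=;
  end;
run;
```

For the eighth level, the NONE rule gives color 8, pattern 8 and symbol 1, and for the 13th level the COLOR rule gives color 1, pattern 2 and symbol 2, which matches the descriptions above.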

While this behavior was first introduced in SAS 9.3, this AttrPriority behavior was internally implemented.  With SAS 9.3M1, the AttrPriority option was surfaced in the Style.  With SAS 9.4, AttrPriority option was surfaced in the ODS Graphics statement.

Now, with SAS 9.4, you can make any Style behave in any attribute priority you want by setting the AttrPriority= option in the ODS Graphics statement.  Here is the HTMLBlue style with the AttrPriority set to NONE.  Now, all the visual elements come from the HTMLBlue Style (except the overridden symbols), but now all the attributes are cycled together.  So, the 2nd group value now gets a dashed line pattern and the TriangleFilled symbol.

ods listing style=htmlblue;
ods graphics / attrpriority=none;
title 'Style=HTMLBlue (Attrpriority=None)';
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug lineattrs=(thickness=2);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers;
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
run;

Here is an example of the same graph with the ANALYSIS style with AttrPriority=COLOR.  Note, in this case, both line pattern and marker symbol are held constant while color changes.

Often, one really does want the colors and symbols to change with group level, but not the line pattern.  This could be another value for the AttrPriority option (future).  But currently, we have only provided for AttrPriority of NONE and COLOR.

To create a graph where the colors and symbols for groups change but not the line pattern, you will have to use AttrPriority=NONE and hold the pattern by setting it to SOLID in the series plot.  Sure, this is not as good as having a value for AttrPriority that could do that for you, but that will have to wait till there is a strong demand for it.  Note, in the graph on the right, the color and symbols are changing, but the line pattern is held constant by setting lineattrs=(pattern=SOLID) in the code.
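A minimal sketch of this workaround follows.  Since the seriesGroup data set is only available in the linked code, the sketch substitutes SASHELP.STOCKS with the Stock variable as the group, which is purely my choice for illustration.

```sas
/* Cycle color, symbol and pattern together ... */
ods graphics / attrpriority=none;

/* Sketch only: SASHELP.STOCKS stands in for the article's seriesGroup data */
proc sgplot data=sashelp.stocks(where=(date >= '01jan2005'd));
  /* ... but pin the line pattern, so only color and symbol vary by group */
  series x=date y=close / group=stock markers
         lineattrs=(pattern=solid thickness=2);
  xaxis display=(nolabel);
run;
```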

Full SAS 9.4 Code:  AttrPriority


Bubble Plots

Bubble Plots provide additional ways to visualize your data.  The plot supports display of multiple response characteristics of the data in one graph.  Bubble plots were introduced with SAS 9.3 in GTL and SG Procedures.

A bubble is drawn at each (x, y) point in the graph, and each bubble is sized based on a third column.  Bubbles can be grouped by a classifier as shown here, or can be colored by a numeric response variable.

In the example above, we have specified aspect=0.7, but this is not necessary.  Note, we have also used some special labeling to see how the marker sizes are scaled.  The graph is shown on the right.  Click on the graph for a higher resolution image.  The SGPLOT code is shown below, where I have used an additional TEXT plot to display some data in the graph.

proc sgplot data=bubble noautolegend aspect=0.7;
  bubble x=x y=y size=size / group=type datalabel=linlbl splitchar='-'
         dataskin=gloss nooutline;
  text x=x y=y text=size / position=center;
  xaxis min=0 max=100 offsetmin=0 offsetmax=0.1 display=(nolabel) grid;
  yaxis min=0 max=70 offsetmin=0 offsetmax=0.1 display=(nolabel) grid;
run;

The bubbles are sized based on the SIZE role shown in the code above.  By default, the sizing is done using a "Linear" scaling.  The smallest bubble size (on screen) has a diameter of the default marker size (7 px) and the largest bubble has a diameter of three times the default marker size (3*7 px = 21 px).  The observation with the smallest value for "Size" gets the smallest bubble (7 px), and the observation with the largest value for "Size" gets the biggest bubble (21 px).   The on-screen size for the smallest or largest bubble can be set using the BRADIUSMIN and BRADIUSMAX options.  All other observations get a size between these two, scaled by the area of the bubble.  This is the default "Linear" scaling method.  More on this later.

The graph above shows a bubble plot with "Relative" scaling.  This means that the bubble sizes have no direct association with the dimensions on the axes.   They are sized as noted above, relative to each other.

Another useful way to see a bubble chart is where the size values are relative to the axis values.  In this case, a size of 10 means the bubble should have a radius of 10 units along each axis.  Such a graph is shown on the right.

In this graph, each bubble has a size on the screen such that the radius of the bubble represents the distance along the axis.  So, the bubble with size=13 is centered at (50, 10), and has a radius of 13 units.  Such graphs are very useful when the observations represent some physical entity in geographic space, and the X and Y axes are equated.  In this graph we have set an ASPECT=0.7 and set the axes such that they have an aspect of 0.7.   Note the use of the ABSSCALE option; the grid lines create a mesh of square regions.

proc sgplot data=bubble noautolegend aspect=0.7;
  bubble x=x y=y size=size / group=type datalabel=size datalabelpos=center
         absscale dataskin=sheen nooutline datalabelattrs=(size=10);
  xaxis min=0 max=100 offsetmin=0.05 offsetmax=0.1 display=(nolabel) grid;
  yaxis min=0 max=70 offsetmin=0.05 offsetmax=0.1 display=(nolabel) grid;
run;

Let us take another look at the issue of "Linear" scaling in the graph at the top.  Here, the relationship between different values can be a bit confusing.  A bubble for an observation of size 2x will not be twice the size of the bubble for an observation with size x.

It is often useful to have a graph where an observation with size=100 will be drawn with a bubble area twice as much as the bubble for an observation with size=50.  This scaling is called "Proportional", as shown in the graph on the right.

In the graph on the right, the "Size" is shown in the middle of the bubble.  The "Value Area" and the "Pixel Area" are shown in the outer label.  Now, the bubble of size 13 is only a little smaller than the bubble of size 15.  If we had a bubble of size 7.5, its area would be exactly half of the bubble with size 15.  In this method of scaling, the scaling line passes through zero and the max value.  So, observations with a response value of zero can (technically) have an area of zero.  However, BRADIUSMIN is used as a cutoff value to draw something on the screen.
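The arithmetic behind proportional scaling can be checked with a short DATA step.  This is a sketch under two assumptions of mine: that 15 is the largest Size value, and that the largest bubble is drawn with a radius of 10.5 px (a made-up number).  Since area is proportional to size, the radius goes with the square root of the size.

```sas
data _null_;
  max_size   = 15;     /* assumed largest Size value                      */
  max_radius = 10.5;   /* hypothetical on-screen radius (px) for max_size */
  do size = 15, 13, 7.5;
    /* Area proportional to size, so radius scales with sqrt(size) */
    radius     = max_radius * sqrt(size / max_size);
    /* Area relative to the largest bubble; equals size / max_size */
    area_ratio = (radius / max_radius)**2;
    put size= radius= 6.2 area_ratio= 5.3;
  end;
run;
```

The area ratio comes out as 13/15, or about 0.867, for the bubble of size 13, and as 0.5 for a bubble of size 7.5, in line with the description above.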

The RELATIVESCALETYPE option can be used to get this last graph.  Except, this option is not currently available with the SGPLOT Bubble Plot statement.  If you need to create a bubble plot with proportional scaling, you will need to use the GTL version shown below.

/*--Template for Bubble Chart with Proportional scaling--*/
proc template;
  define statgraph Bubble;
    begingraph;
      entrytitle 'Proportional Bubble Size - GTL';
      layout overlay /   aspectratio=0.7
                         xaxisopts=(display=(ticks tickvalues line) griddisplay=on
                           linearopts=(viewmin=0 viewmax=100) offsetmin=0 offsetmax=0.1)
                         yaxisopts=(display=(ticks tickvalues line) griddisplay=on
                           linearopts=(viewmin=0 viewmax=70) offsetmin=0 offsetmax=0.1);
         bubbleplot x=x y=y size=size/ group=type datalabel=PropLbl
                relativescaletype=proportional datalabelsplit=true
                datalabelsplitchar='-' name='a' dataskin=sheen display=(fill);
         textplot x=x y=y text=size / position=center;
      endlayout;
    endgraph;
  end;
run;

/*--Bubble Chart with Proportional scaling--*/
proc sgrender data=bubble template=bubble;
run;

In this case, we can actually use the GTL LAYOUT OVERLAYEQUATED.  This layout ensures that each axis uses the same pixel to data scale, so a value interval of 10 units is represented by the same number of pixels on each axis.

Full SAS 9.4 code:  Bubble

Scaling Diagrams (by Rick Wicklin):  Scaling_Diagram




Is that Annotate?

The SGPLOT procedure includes features to add annotations to your graph in many different ways.  Annotations provide you a flexible way to add features to your graph that are not available through the standard plot statements.

Recently, I saw this graph on the web that caught my attention.  Clearly, this looks like a good candidate to use Annotate to create the arrows that explain the behavior of cancers with different levels of aggressiveness.

SAS 9.4M2 release of SGPLOT procedures also includes the POLYGON plot that can handle many such tasks.  The Polygon plot is a unique statement that behaves like annotation where it will draw for you any figure you define as a polygon on the graph. The plot statement can be interleaved with other basic plot statements and can negotiate the coordinate space with the graph axes.

PropnosisHere, I created the same graph using the Series plot and the Polygon plots of the SGPLOT procedure.  The survival percentages over time for patients with different category of cancers are displayed using a Series plot with a Group role.

The arrows with the text explaining the behavior of the cancers are drawn using the Polygon plot using a "Id" role.

In this case, I have defined the data for the curves as Alive * Time by Severity.  Then, I created another data set "Arrows" to define the two arrows using (x, y) coordinates for each vertex by "Id".  There are two arrows with ID=1 and 2.  A label is also defined for each polygon.
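The data layout described above can be sketched roughly as follows.  Everything specific here is an assumption for illustration: the data set name "curves", the made-up decay curves, the vertex coordinates and the placeholder labels are not from the original program.  Only the variable names (time, alive, severity, id, x, y, label) and the combined data set name "both" follow the SGPLOT step.

```sas
/* Made-up survival curves, only so the sketch runs end to end */
data curves;
  length severity $8;
  do severity='Low', 'Medium', 'High';
    rate=ifn(severity='Low', 0.005, ifn(severity='Medium', 0.02, 0.05));
    do time=0 to 72 by 12;
      alive=exp(-rate*time);
      output;
    end;
  end;
  drop rate;
run;

/* Hypothetical arrows: one row per vertex, in drawing order, keyed by ID */
data arrows;
  length label $40;
  input id x y;
  /* Placeholder labels; ',' is the split character used by the plot */
  if id=1 then label='First arrow,placeholder label';
  else label='Second arrow,placeholder label';
  datalines;
1 12 0.70
1 40 0.70
1 40 0.65
1 50 0.75
1 40 0.85
1 40 0.80
1 12 0.80
2 12 0.20
2 40 0.20
2 40 0.15
2 50 0.25
2 40 0.35
2 40 0.30
2 12 0.30
;
run;

/* Stack the two data sets; each plot reads only its own columns */
data both;
  set curves arrows;
run;
```

Stacking leaves the curve columns missing on the arrow rows and vice versa, which is presumably why NOMISSINGGROUP appears on both plot statements.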

Now, I use the Series statement to draw the three curves, and the Polygon statement to draw the polygons.  Note the long Y axis label is automatically split.

proc sgplot data=both;
  series x=time y=alive / group=severity smoothconnect 
         lineattrs=(thickness=4) nomissinggroup name='a';
  polygon id=id x=x y=y / fill outline label=label 
          labelpos=center nomissinggroup splitjustify=center 
          fillattrs=(color=lightblue transparency=0.5) 
          labelattrs=(size=8) splitchar=',';
  xaxis grid values=(0 to 72 by 12) offsetmin=0 offsetmax=0;
  yaxis grid values=(0 to 1.0 by 0.2) offsetmin=0 offsetmax=0.01;
  keylegend 'a' / title='' position=top linelength=20 noborder;
run;

The Polygon plot can also display the polygon label in many different ways.  Here it is displayed at the center of the polygon bounding box, using "," as the split character to wrap the long label within the body of the arrow.  The text has a horizontal orientation, and is thus easier to read.  Rotated text can also be displayed if necessary.

Often, it may be preferable to display the labels for each curve in the plot itself, thus eliminating the need for a legend.  This often leads to a graph that is easier to decode, as it is no longer necessary to look back and forth between the curves and a legend.  The curves are labeled where the eye is already.

Reducing eye movement necessary to decode the information in the graph leads to a more "effective" graph.
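One way to label the curves directly is the CURVELABEL option of the SERIES statement; this is a sketch of that approach and not necessarily how the graph in this post was built.  It assumes the same "both" data set as the earlier step, drops the KEYLEGEND statement, and omits the polygon styling options for brevity.

```sas
proc sgplot data=both noautolegend;
  /* CURVELABEL writes each group value next to its own curve */
  series x=time y=alive / group=severity smoothconnect
         lineattrs=(thickness=4) nomissinggroup
         curvelabel curvelabelpos=end;
  polygon id=id x=x y=y / fill outline label=label
          labelpos=center nomissinggroup
          fillattrs=(color=lightblue transparency=0.5)
          splitchar=',';
  xaxis grid values=(0 to 72 by 12) offsetmin=0;
  yaxis grid values=(0 to 1.0 by 0.2) offsetmin=0 offsetmax=0.01;
run;
```

NOAUTOLEGEND suppresses the legend that would otherwise repeat the same information.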

The answer to the question in the title then is: "No, it is the Polygon Plot".

Full SAS 9.4M2 Code:  Prognosis


Report from PharmaSUG 2015

PharmaSUG 2015, held at the Renaissance in Orlando, had a record-breaking attendance of over 650.  The weather was great, except for a huge downpour on the evening of the last day.  All the popular presenters were in attendance, including Art Carpenter, Kirk Lafler, Arthur Li and many others.

Presentations on graphics were aplenty, using SG procedures, GTL, SAS/GRAPH and Annotate.  I tried to attend as many as I could but did not get to all of them due to Super Demo duty.  What got me really fired up were the creative ways in which users are utilizing the features of SG procedures and GTL to build their custom graphs.

One standout example was the Sankey Bar Chart by Shane Rosanbalm of Rho Inc, a CRO right here in Chapel Hill.  The graph shows the subject disease severity over visits at Baseline, 12, 30 and 60 months.  The stacked bars in the graph on the right show the % occurrence of the disease by severity.

However, Shane also wanted to see the change in the severity over visits.  So he came up with this unique custom visual, depicting the severity % by visit and also the flow of subjects from one severity value to another.

The winner among the Data Visualization presentations was the paper Creating Sophisticated Graphs using GTL by Kaitlyn McConville and Kristen Much, also of Rho Inc, who presented on building complex graphs using GTL.  The authors used a step-by-step method to explain the process, thus de-mystifying the learning curve.  It was good to see more and more users turning to GTL to create their graphs, willing to trade a little bit more programming complexity to obtain sophisticated graphs.

Janette Garner of Gilead Sciences Inc presented an Enhanced Forest Plot macro using SAS.  It was gratifying to see some of the material previously presented in this blog taken further and put to use in a real world example.  Janette extended the traditional Forest Plot by adding a Bar Chart of the actual values.

Janette used GTL with a Layout Lattice to create the multi-cell layout.  She nested another Layout Lattice in the SideBar of the first one to define the graph headers.
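The nesting described here can be sketched in GTL roughly as follows.  This is not Janette's macro: the template name, header text, column weights and the use of SASHELP.CLASS are all placeholders chosen so the structure is easy to see.

```sas
proc template;
  define statgraph NestedLattice;   /* hypothetical template name */
    begingraph;
      /* Outer lattice: two data cells side by side */
      layout lattice / columns=2 columnweights=(0.6 0.4);
        /* Inner lattice in the top sidebar holds one header per column */
        sidebar / align=top;
          layout lattice / columns=2 columnweights=(0.6 0.4);
            entry 'Left Header' / textattrs=(weight=bold);
            entry 'Right Header' / textattrs=(weight=bold);
          endlayout;
        endsidebar;
        /* First cell */
        layout overlay;
          scatterplot x=height y=weight;
        endlayout;
        /* Second cell */
        layout overlay;
          barchart category=sex response=height / stat=mean;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.class template=NestedLattice;
run;
```

The sidebar spans the full width of the outer lattice, so the inner lattice can divide that strip into header cells independently of the data cells.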

She used the HighLowPlot in different ways to draw many elements of the graph, including the subgroups and labels with indentations on the left, the bar chart itself, the bar labels, the odds ratio and the labels on the right.  This truly shows the flexibility of this plot statement.

Songtao Jiang displayed a creative usage of the GTL Series Plot to create a Variable Width Plot shown above.

Murali Kanakenahalli and Avani Kaja of Seattle Genetics showed how to create multiple graphs for Oncology Trials using GTL.  This included Kaplan-Meier graphs, Waterfall Charts by Dose Group, a Swimmer plot and more.

Here is the link to the data visualization section of the conference proceedings for these and other excellent papers.






Report from SGF 2015

SGF 2015 was a blast with a focus on Visual Analytics, SAS Studio, Hadoop and more.  Graphs were everywhere, and it was a banner year for ODS Graphics with over 15 papers and presentations by users on creating graphs using SG Procedures, GTL and Designer.

Dan Heath, Prashant Hebbar, Scott Singer and I were alternately manning the ODS Graphics station and the Super Demo station.  We had a steady stream of users sharing their experiences with these graph tools.  The general feedback was awesome, and we were impressed with the level at which you folks have adopted these tools and are using them to create graphs.  I was pleasantly surprised to find some of you already using the SAS 9.40M2 features, including the TextPlot (looking at you, Jim!).

Dan presented the new features in SG Procedures 4-5 times.  Here he is expounding upon the sorting features in the SGPanel procedure.

Normally, Super Demos are scheduled for 20-30 minutes, but Dan was holding forth well into the full hour.  This may have caused Scott Singer a bit of stress as he was often following Dan with a Super Demo on the ODS Graphics Editor and Designer.

Scott's Super Demos on Designer and Editor were also well attended, with many in the audience wondering why they were only now hearing about these tools.  Designer has been included with SAS since 9.2M3, and has been available off the Tools menu since SAS 9.3.

Designer is an interactive graph creation tool with which you can create many common graphs through a point-and-click GUI.  Scott also demonstrated the "Auto Chart" feature in Designer, which lets you create literally hundreds of graphs from your selected data and variables in minutes.  Designer generates the required GTL for the graph, which can be viewed as the graph is being created, and the code can be copied and pasted into the Program window for further customization.  If you have not seen it yet, click on Tools->ODS Graphics Designer to launch the application.

It is always a pleasure to meet with you folks and discuss the ways in which you are using the graphics tools, your pains and innovative solutions.  Kirk presented a paper on building an interactive dashboard using ODS Graphics.  He showed ways to include bar charts and pie charts in the display with URL links in each.  There seems to be a lot of potential to create innovative dashboards using these tools included in Base SAS.  Previously, I had taken a stab at creating some Dashboard widgets using SGPlot.  The picture on the left with Kirk is at the Kennedy Memorial on the way back from the awesome R J Mexican restaurant in West End.

An exciting new development is the ability to include "native" Excel charts in the Excel destination using the new MSCHART procedure.  This procedure will be released preproduction with SAS 9.4M3 in summer.  Scott Huntley and Nancy Goodling presented multiple Super Demos on this topic and got an enthusiastic reception from many of you who frequently send output to Excel spreadsheets.

We were gratified to attend papers on graphics by Philip Holland, Jeffrey Meyers, Susan Slaughter and Lora Delwiche, Rebecca Ottesen, Chuck Kincaid, Kirk Lafler, LeRoy Bessler and many more.  Prashant and I presented papers describing new features in SAS 9.4.

Here are some of the graph papers that come to mind.

1601 - Nesting Multiple Box Plots and BLOCKPLOTs Using Graph Template Language and Lattice Overlay - Greg Stanek.

2242 - Creative Uses of Vector Plots Using SAS® - Deli Wang.

2441 - Graphing Made Easy with SGPLOT and SGPANEL Procedures - Susan Slaughter and Lora Delwiche.

2480 - Kaplan-Meier Survival Plotting Macro %NEWSURV - Jeffrey Meyers.

2686 - Converting Annotate to ODS Graphics. Is It Possible? - Philip Holland.

2986 - Introduction to Output Delivery System (ODS) - Chuck Kincaid.

2988 - Building a Template from the Ground Up with Graph Template Language - Jed Teres.

3080 - Picture-Perfect Graphing with Graph Template Language - Julie VanBuskirk.

3193 - Mapping out SG Procedures and Using PROC SGPLOT for Mapping - Frank Poppe.

3419 - Forest Plotting Analysis Macro %FORESTPLOT - Jeffrey Meyers.

3432 - Getting Your Hands on Reproducible Graphs - Rebecca Ottesen.

3487 - Dynamic Dashboards Using SAS® - Kirk Lafler.

3518 - Twelve Ways to Better Graphs - LeRoy Bessler.

SAS1748 - Lost in the Forest Plot? Follow the Graph Template Language AXISTABLE Road! - Prashant Hebbar.

SAS1780 - Graphs Are Easy with SAS® 9.4 - Sanjay Matange.

PharmaSUG 2015 is just around the corner in Orlando next week.  I look forward to more presentations on innovative usage of graphics.  I will present a half-day seminar on "Clinical Graphs using SG Procedures" on Wednesday.  Hope to see you there.





