Big Data Visualization

Big data is a popular topic, and many articles have been written about its analysis.  Today, "Big Data" is measured in multiples of terabytes, and SAS provides special software for the analysis and visualization of Big Data: Visual Analytics.

When data is very big, it may be meaningless, let alone inefficient, to draw a scatter plot of it. This is especially true when the data is on a server and we want to create an X-Y plot on a local computer.  Bringing all the data down to plot is prohibitive, and the result is not very helpful.

With the release of SAS 9.4M3 this week, the SGPLOT procedure introduces the HEATMAP statement, a plot type suited for visualization of bigger data.  In this case, the data can be analyzed and binned into discrete bins along the X and Y axes, and the results displayed using a color gradient.

The graph above shows a heat map of the distribution of the subjects in a study for Diastolic and Systolic blood pressure.  Admittedly, this graph is of a relatively small data set, "sashelp.heart".  This data set has about 5200 observations, which is small from a "Big Data" perspective.  But for our purposes, we can assume we have data like this for millions of subjects or billions of credit card transactions.  The binning of the data is done on a fast server, along with the computation of the regression fit.  Only the "graphical" information for drawing the bins and the curve is sent to the renderer to create this graph.

title 'Distribution of Blood Pressure';
proc sgplot data=sashelp.heart;
  heatmap x=systolic y=diastolic / colormodel=(white green yellow red)
          nxbins=40 nybins=30 name='a';
  reg x=systolic y=diastolic / nomarkers degree=2 legendlabel='Fit';
  gradlegend 'a';
  keylegend / linelength=20 location=inside position=topright noborder;
run;

This graph now allows us to view the blood pressure distribution of the subjects in a study.   The Heat Map statement works seamlessly with most other statements available in the SGPLOT procedure, so we can plot a regression plot on the heat map as easily as we did on the scatter plot.  In the graph above, I have set a custom color model for the display of the frequency data, starting with white to green to yellow to red, as displayed in the gradient legend on the right.  A discrete legend is displayed identifying the Fit plot.  This results in a nice, clean graph.

We can go a step further, and display the confidence and prediction limits on the heat map as shown on the right.  Once again, the same options are used as would be in the case of a scatter plot.
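
Here is a minimal sketch of that variation, assuming the same sashelp.heart data and the statements used earlier.  The CLM and CLI options on the REG statement request the confidence limits for the mean and the prediction limits for individual values, just as they would over a scatter plot.  The title is my own choice for illustration.

```sas
title 'Distribution of Blood Pressure with Limits';
proc sgplot data=sashelp.heart;
  heatmap x=systolic y=diastolic / colormodel=(white green yellow red)
          nxbins=40 nybins=30 name='a';
  /* CLM = confidence limits for the mean, CLI = prediction limits */
  reg x=systolic y=diastolic / nomarkers degree=2 clm cli legendlabel='Fit';
  gradlegend 'a';
  keylegend / linelength=20 location=inside position=topright noborder;
run;
```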

For both of these graphs, the X and Y axes represent continuous, numeric data.  By default, the data is binned into a number of bins determined by the underlying analytical code.  Bin counts can be controlled using the statement options, as we have done here.

Heat Maps are also useful to view response data for the binned data, as shown in the graph on the right.  Here, we have a heat map of weight by height of the subjects in the study.  However, each bin now shows the mean of the Cholesterol level for all the subjects in the bin.  This shows us the association of Cholesterol with the two analysis variables.
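
A response heat map like this one can be sketched as follows, assuming the sashelp.heart data and the COLORRESPONSE= and COLORSTAT= options of the HEATMAP statement.  The color model is simply reused from the earlier example and may differ from the one in the graph.

```sas
title 'Mean Cholesterol by Weight and Height';
proc sgplot data=sashelp.heart;
  /* Color each bin by the mean Cholesterol of the subjects in it */
  heatmap x=height y=weight / colorresponse=cholesterol colorstat=mean
          colormodel=(white green yellow red) name='a';
  gradlegend 'a';
run;
```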

Another interesting use case would be to visualize the credit card balance for all customers of a bank by family income and value of the mortgage.

The SGPLOT heat map supports numeric axes and discrete axes, and any combination of the two.  The graph on the right displays the mean MSRP value of the cars by Type and Make.  Both axes are discrete, and each bin displays the mean value of MSRP for all the observations in the bin.
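
A sketch of the discrete case, assuming the data is sashelp.cars (the text does not name the data set) and that Type goes on the X axis and Make on the Y axis.  Because both variables are character, the axes are treated as discrete and no bin counts are needed.

```sas
title 'Mean MSRP by Type and Make';
proc sgplot data=sashelp.cars;
  /* Each Type x Make cell is colored by the mean MSRP of its cars */
  heatmap x=type y=make / colorresponse=msrp colorstat=mean name='a';
  gradlegend 'a';
run;
```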

Heat Maps have been supported in GTL, and you can find previous articles on GTL Heat Maps and Calendar Heat Maps.

SAS 9.4M3 code for Heat Maps:  HeatMap


Row Lattice Headers

The SGPANEL procedure makes it easy to create graph panels that are classified by one or more classifiers.  The "Panel" layout is the default and it places the classifier values in cell headers at the top of each cell.

When using LAYOUT=Lattice or RowLattice, the row headers are placed at the right side of each row, and the header text is rotated as shown in the example on the right.  The graph shows the distribution of Cholesterol, and the panel variable (classifier) is "DeathCause".  Three cells are created and each cell displays the value of "DeathCause" on the right.

There are two obvious problems with this arrangement.

  1. Long text strings in the header are truncated, as for "Coronary Heart Disease" and "Cerebral Vascular Disease".
  2. The text strings are displayed in a vertical orientation that is hard to read.

Users have often complained about this, as admittedly, this is not an ideal arrangement.   The SAS code is included below.  Note the use of OFFSETMIN=0 for ROWAXIS, and usage of SPACING=10 for the cells.

proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel novarname spacing=10;
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0;
  colaxis max=420;
run;

The SAS 9.4M2 release provides a way to improve the arrangement of such a graph.  Here is a variation where I have suppressed the row headers entirely, and used the INSET statement to display the "DeathCause" values inside the cell at the top left.

The variable named in the INSET statement should hold the value we want displayed in each cell, match-merged with the PANELBY row variable.  In this case we are using the classifier variable itself.  Even though the column repeats the value many times in the data, the value is drawn only once per cell, taken from the first observation.

The NOHEADER option suppresses the row headers.  The INSET statement with column "DeathCause" inserts the text value into the top left of the cell.  In the case of this distribution plot, empty space is often available at the upper corners of the cell.  If not, you can add some offset to the top of the ROWAXIS.

proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel noheader spacing=10;
  inset deathcause / position=topleft nolabel;
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0;
  colaxis max=420;
run;

To draw the eye to the classifier value, the inset can be highlighted by using a background color or a border on the INSET statement as shown below left. Below right we have a 2x3 panel, showing both the row and column classifiers as insets. Note, I have added the "Death Cause" first since it has long textual values. I also added an OFFSETMAX=0.15 to create some space at the top of each cell.
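
The highlighted single-column version might look like the sketch below.  It assumes the same "heart" data set as above and uses the BACKCOLOR= and BORDER options of the INSET statement.  The color value is only an illustration, not necessarily the one used in the graph.

```sas
proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel noheader spacing=10;
  /* A light gray background and a border draw the eye to the inset */
  inset deathcause / position=topleft nolabel backcolor=cxe8e8e8 border;
  histogram cholesterol;
  density cholesterol;
  /* OFFSETMAX leaves room at the top of each cell for the inset */
  rowaxis offsetmin=0 offsetmax=0.15;
  colaxis max=420;
run;
```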




Full SAS 9.4M2 Code: Lattice


Attributes Priority for the Inquiring Mind

When ODS Graphics was first released with SAS 9.2 in 2008, a conscious effort was made to create graphs that were consistent and aesthetically pleasing out of the box.  Features in the graph derive their visual attributes from the active Style.  When Group classifications are in effect, the different classification levels of the group variable are represented on the screen using the attributes from the GraphData1 - GraphData12 elements of the Style.

These attributes were carefully designed so the 12 colors are distinct from each other. The groups use up to 11 line patterns and 7 marker symbols. For each group value, the color, marker symbol and pattern are derived sequentially from these lists of 12 colors, 11 patterns and 7 symbols.  So the first group level gets the first color, first pattern and the first symbol.  The second group level gets the second color, second pattern and the second symbol.  This goes on till we run out of the list of symbols (there are only 7).  So, the eighth group level will get the eighth color, eighth pattern and the first symbol.  This goes on in this manner so we can have 84 distinct colored symbols and 132 distinct colored patterns.

The graph above uses the LISTING style.  The list of marker symbols has been changed to include filled markers.  Here you can see the assignment of colors, line patterns and marker symbols for each of the three group values.  Click on the graph to see a higher resolution image.

ods listing style=listing;
title 'Style=Listing'; 
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug lineattrs=(thickness=2);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers;
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
run;

Note the use of FilledOutlinedMarkers in the Scatter plot.  Also, I have used the SAS 9.4 STYLEATTRS feature to change the group symbols to the list of three filled symbols.

Soon it was perceived that it is not always necessary to change all the attributes of the element for each group value.  This was especially true for the line patterns.  When using a color Style, it was felt that it was not necessary to change both line color and pattern, but only the color till all colors from the list are used.

The graph above is created using the SAS 9.3 HTMLBlue style.   In this style, the cycling of the attributes (color, symbol and line pattern) is different from the LISTING style.  As you can see, all three groups have solid line patterns and circle markers.  Only the line color is changed per group.  So, the first 12 group values get the 12 different colors from the Style, along with the first line pattern and first symbol.  The 13th group level will get the 1st color with the 2nd line pattern and 2nd symbol.  Most of the time we only have a handful of group levels, so only the color change is seen.

I recall seeing a presentation where the presenter was baffled as to why he was not seeing different marker symbols for his scatter plot when he ran his SAS code, but was seeing only circle markers with different colors.  This was because, while at home he was using SAS 9.2, the presentation laptop had SAS 9.3 with the default destination of HTML with the HTMLBlue style.  He had to change the style back to LISTING to see the different shaped markers.

This behavior of the different Styles is called Attribute Priority.  The default AttrPriority is NONE, meaning that all the attributes are cycled together, as for the LISTING style.  HTMLBlue has AttrPriority=COLOR.  This means that the color attribute is cycled first, holding the symbol and pattern constant till all 12 colors are used up.  Then, we go to the second symbol and second pattern and cycle through all 12 colors again.
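
The two cycling rules can be worked out with a little arithmetic.  The DATA step below is only my illustration of the rules as described here, not the internal implementation.  It assumes lists of 12 colors, 11 patterns and 7 symbols, and the variable names are made up for the example.

```sas
/* Which color, pattern and symbol index each group level receives */
data attr_cycle;
  nColors = 12; nPatterns = 11; nSymbols = 7;  /* list lengths from the text */
  do level = 1 to 14;
    /* AttrPriority=NONE: all three indices advance together */
    none_color   = mod(level-1, nColors)   + 1;
    none_pattern = mod(level-1, nPatterns) + 1;
    none_symbol  = mod(level-1, nSymbols)  + 1;
    /* AttrPriority=COLOR: color advances with every level; pattern   */
    /* and symbol advance only after all 12 colors have been used     */
    color_color   = mod(level-1, nColors) + 1;
    cycle         = floor((level-1) / nColors);
    color_pattern = mod(cycle, nPatterns) + 1;
    color_symbol  = mod(cycle, nSymbols)  + 1;
    output;
  end;
  drop nColors nPatterns nSymbols cycle;
run;

proc print data=attr_cycle noobs;
run;
```

In the printed table, level 8 under NONE gets color 8, pattern 8 and symbol 1, and level 13 under COLOR gets color 1 with pattern 2 and symbol 2, matching the descriptions above.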

This behavior was first introduced in SAS 9.3, where it was implemented internally.  With SAS 9.3M1, the AttrPriority option was surfaced in the Style.  With SAS 9.4, the AttrPriority option was surfaced in the ODS Graphics statement.

Now, with SAS 9.4, you can make any Style behave in any attribute priority you want by setting the AttrPriority= option in the ODS Graphics statement.  Here is the HTMLBlue style with the AttrPriority set to NONE.  All the visual elements still come from the HTMLBlue Style (except the overridden symbols), but now all the attributes are cycled together.  So, the 2nd group value now gets a dashed line pattern and the TriangleFilled symbol.

ods listing style=htmlblue;
ods graphics / attrpriority=none;
title 'Style=HTMLBlue (Attrpriority=None)'; 
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug lineattrs=(thickness=2);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers;
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
run;

Here is an example of the same graph with the ANALYSIS style with AttrPriority=COLOR.  Note, in this case, both line pattern and marker symbol are held constant while color changes.

Often, one really does want the colors and symbols to change with group level, but not the line pattern.  This could be another value for the AttrPriority option (future).  But currently, we have only provided for AttrPriority of NONE and COLOR.

To create a graph where the colors and symbols for groups change but not the line pattern, you will have to use AttrPriority=NONE and hold the pattern by setting it to SOLID in the series plot.  Sure, this is not as good as having a value for AttrPriority that could do that for you, but that will have to wait till there is a strong demand for it.  Note, in the graph on the right, the color and symbols are changing, but the line pattern is held constant by setting lineattrs=(pattern=SOLID) in the code.
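
A sketch of this workaround, assuming the same seriesGroup data and statements as the earlier examples.  Only the style and the PATTERN=SOLID setting differ, and the title is my own.

```sas
ods listing style=analysis;
ods graphics / attrpriority=none;
title 'Style=Analysis (Attrpriority=None, Solid Lines)';
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  /* pattern=solid overrides the cycled line pattern for every group */
  series x=date y=val / group=drug lineattrs=(thickness=2 pattern=solid);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers;
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
run;
```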

Full SAS 9.4 Code:  AttrPriority


Bubble Plots

Bubble Plots provide additional ways to visualize your data.  The plot supports display of multiple response characteristics of the data in one graph.  Bubble plots were introduced with SAS 9.3 in GTL and SG Procedures.

A bubble is drawn at each (x, y) point in the graph, and each bubble is sized based on a third column.  Bubbles can be grouped by a classifier as shown here, or can be colored by a numeric response variable.

In the example above, we have specified ASPECT=0.7, but this is not necessary.  Note, we have also used some special labeling to see how the marker sizes are scaled.  The graph is shown on the right.  Click on the graph for a higher resolution image.  The SGPLOT code is shown below, where I have used an additional TEXT plot to display some data in the graph.

proc sgplot data=bubble noautolegend aspect=0.7;
  bubble x=x y=y size=size / group=type datalabel=linlbl splitchar='-' 
         dataskin=gloss nooutline;
  text x=x y=y text=size / position=center;
  xaxis min=0 max=100 offsetmin=0 offsetmax=0.1 display=(nolabel) grid;
  yaxis min=0 max=70 offsetmin=0 offsetmax=0.1 display=(nolabel) grid;
run;

The bubbles are sized based on the SIZE role shown in the code above.  By default, the sizing is done using a "Linear" scaling.  The smallest bubble size (on screen) has a diameter of the default marker size (7 px) and the largest bubble has a diameter of three times the default marker size (3*7 px = 21 px).  The observation with the smallest value for "Size" gets the smallest bubble (7 px), and the observation with the largest value for "Size" gets the biggest bubble (21 px).   The on-screen size for the smallest or largest bubble can be set using the BRADIUSMIN and BRADIUSMAX options.  All other observations get a size between these two, scaled by the area of the bubble.  This is the default "Linear" scaling method.  More on this later.
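
As a small sketch, the on-screen range can be changed like this.  The values 5 and 25 are arbitrary choices for illustration, and I am assuming a value given without a unit is taken as pixels.

```sas
proc sgplot data=bubble noautolegend;
  /* Smallest bubble radius 5, largest 25 (illustrative values) */
  bubble x=x y=y size=size / group=type bradiusmin=5 bradiusmax=25;
run;
```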

The graph above shows a bubble plot with "Relative" scaling.  This means that the bubble sizes have no direct association with the dimensions on the axes.   They are sized as noted above, relative to each other.

Another useful way to see a bubble chart is where the size values are relative to the axis values.  In this case, a size of 10 means the bubble should have a radius of 10 units along each axis.  Such a graph is shown on the right.

In this graph, each bubble has a size on the screen such that the radius of the bubble represents the distance along the axis.  So, the bubble with size=13 is centered at (50, 10), and has a radius of 13 units.  Such graphs are very useful when the observations represent some physical entity in geographic space, and the X and Y axes are equated.  In this graph we have set ASPECT=0.7 and set the axes such that they have an aspect of 0.7.   Note the use of the ABSSCALE option, and that the grid lines create a mesh of square regions.

proc sgplot data=bubble noautolegend aspect=0.7;
  bubble x=x y=y size=size / group=type datalabel=size datalabelpos=center 
         absscale dataskin=sheen nooutline datalabelattrs=(size=10);
  xaxis min=0 max=100 offsetmin=0.05 offsetmax=0.1 display=(nolabel) grid;
  yaxis min=0 max=70 offsetmin=0.05 offsetmax=0.1 display=(nolabel) grid;
run;

Let us take another look at the issue of "Linear" scaling in the graph at the top.  Here, the relationship between different values can be a bit confusing.    A bubble for an observation of size 2x will not be twice the size of the bubble for an observation of size x.

It is often useful to have a graph where an observation with size=100 will be drawn with a bubble area twice as much as the bubble for an observation with size=50.  This scaling is called "Proportional", as shown in the graph on the right.

In the graph on the right, the "Size" is shown in the middle of the bubble.  The "Value Area" and the "Pixel Area" are shown in the outer label.  Now, the bubble of size 13 is only a little smaller than the bubble of size 15.  If we had a bubble of size 7.5, its area would be exactly half of the bubble with size 15.  In this method of scaling, the scaling line passes through zero and the max value.  So, observations with a response value of zero can (technically) have an area of zero.  However, BRADIUSMIN is used as a cutoff value to draw something on the screen.
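
The arithmetic behind proportional scaling can be sketched in a DATA step.  This is only my illustration of the rule described above, not the renderer's code.  The pixel radii (21 and 7) and the maximum size of 15 are assumed values chosen for the example.

```sas
/* Proportional scaling: bubble area is proportional to the Size value */
data prop_scale;
  maxSize = 15;   /* assumed largest Size value in the data          */
  rMax    = 21;   /* assumed radius of the largest bubble, in pixels */
  rMin    = 7;    /* assumed BRADIUSMIN cutoff, in pixels            */
  pi      = constant('pi');
  do size = 0, 7.5, 13, 15;
    /* Area grows with size, so radius grows with sqrt(size) */
    radius    = max(rMin, rMax * sqrt(size / maxSize));
    area      = pi * radius**2;
    areaRatio = area / (pi * rMax**2);  /* fraction of the largest area */
    output;
  end;
  keep size radius area areaRatio;
run;

proc print data=prop_scale noobs;
run;
```

For size=7.5 the area ratio is 0.5, half the area of the size=15 bubble.  For size=0 the radius falls back to the rMin cutoff instead of shrinking to nothing.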

The RELATIVESCALETYPE option can be used to get this last graph.  However, this option is not currently available with the SGPLOT BUBBLE statement.  If you need to create a bubble plot with proportional scaling, you will need to use the GTL version shown below.

/*--Template for Bubble Chart with Proportional scaling--*/
proc template;
  define statgraph Bubble;
    begingraph;
      entrytitle 'Proportional Bubble Size - GTL';
      layout overlay /   aspectratio=0.7 
                         xaxisopts=(display=(ticks tickvalues line) griddisplay=on 
                           linearopts=(viewmin=0 viewmax=100) offsetmin=0 offsetmax=0.1)
                         yaxisopts=(display=(ticks tickvalues line) griddisplay=on
                           linearopts=(viewmin=0 viewmax=70) offsetmin=0 offsetmax=0.1);
        bubbleplot x=x y=y size=size / group=type datalabel=PropLbl 
                relativescaletype=proportional datalabelsplit=true 
                datalabelsplitchar='-' name='a' dataskin=sheen display=(fill);
        textplot x=x y=y text=size / position=center;
      endlayout;
    endgraph;
  end;
run;

/*--Bubble Chart with Proportional scaling--*/
proc sgrender data=bubble template=bubble;
run;

In this case, we can actually use the GTL LAYOUT OVERLAYEQUATED.  This layout ensures that both axes use the same pixel-to-data scale, so a value interval of 10 units spans the same number of pixels on each axis.

Full SAS 9.4 code:  Bubble

Scaling Diagrams (by Rick Wicklin):  Scaling_Diagram




Is that Annotate?

The SGPLOT procedure includes features to add annotations to your graph in many different ways.  Annotations provide you a flexible way to add features to your graph that are not available through the standard plot statements.

Recently, I saw this graph on the web that caught my attention.  Clearly, this looks like a good candidate to use Annotate to create the arrows that explain the behavior of cancers with different degrees of aggressiveness.

The SAS 9.4M2 release of the SG procedures also includes the POLYGON plot, which can handle many such tasks.  The Polygon plot is a unique statement that behaves like annotation: it will draw any figure you define as a polygon on the graph. The plot statement can be interleaved with other basic plot statements and can negotiate the coordinate space with the graph axes.

Here, I created the same graph using the Series plot and the Polygon plot of the SGPLOT procedure.  The survival percentages over time for patients with different categories of cancer are displayed using a Series plot with a Group role.

The arrows with the text explaining the behavior of the cancers are drawn using the Polygon plot with an "ID" role.

In this case, I have defined the data for the curves as Alive * Time by Severity.  Then, I created another data set "Arrows" to define the two arrows using (x, y) coordinates for each vertex by "Id".  There are two arrows with ID=1 and 2.  A label is also defined for each polygon.
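
A sketch of how such data could be built is shown below.  The vertex coordinates and label text are hypothetical placeholders, not the values behind the graph, and I am assuming the curve data lives in a data set called "curves" with Time, Alive and Severity columns.

```sas
/* Hypothetical vertices: one right-pointing arrow per ID */
data arrows;
  length label $60;
  input id x y;
  /* Commas mark where the label may be split inside the arrow */
  if id = 1 then label = 'First arrow,label text';
  else label = 'Second arrow,label text';
  datalines;
1 12 0.78
1 40 0.78
1 40 0.74
1 48 0.82
1 40 0.90
1 40 0.86
1 12 0.86
2 12 0.38
2 40 0.38
2 40 0.34
2 48 0.42
2 40 0.50
2 40 0.46
2 12 0.46
;
run;

/* Stack the curve data and the arrow data for a single SGPLOT call */
data both;
  set curves arrows;
run;
```

Stacking the two data sets leaves the polygon columns missing on the curve rows and vice versa, so each plot statement picks up only its own observations.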

Now, I use the Series statement to draw the three curves, and the Polygon statement to draw the polygons.  Note the long Y axis label is automatically split.

proc sgplot data=both;
  series x=time y=alive / group=severity smoothconnect 
         lineattrs=(thickness=4) nomissinggroup name='a';
  polygon id=id x=x y=y / fill outline label=label 
          labelpos=center nomissinggroup splitjustify=center 
          fillattrs=(color=lightblue transparency=0.5) 
          labelattrs=(size=8) splitchar=',';
  xaxis grid values=(0 to 72 by 12) offsetmin=0 offsetmax=0;
  yaxis grid values=(0 to 1.0 by 0.2) offsetmin=0 offsetmax=0.01;
  keylegend 'a' / title='' position=top linelength=20 noborder;
run;

The Polygon plot can also display the polygon label in many different ways.  Here it is displayed at the center of the polygon bounding box, using "," as the split character to wrap the long label within the body of the arrow.  The text has a horizontal orientation, and is thus easier to read.  Rotated text can also be displayed if necessary.

Often, it may be preferred to display the labels for each curve in the plot itself, thus eliminating the need for a legend.  This often leads to a graph that is easier to decode as it is no longer necessary to look back and forth between the curves and a legend.  The curves are labeled where the eye is already.

Reducing eye movement necessary to decode the information in the graph leads to a more "effective" graph.
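
One way to label the curves directly is the CURVELABEL option of the SERIES statement, sketched below with the same "both" data.  This may not be exactly how the graph above was made, and the polygon statement is left out to keep the example short.

```sas
proc sgplot data=both noautolegend;
  /* CURVELABEL writes each group value next to its own curve */
  series x=time y=alive / group=severity smoothconnect curvelabel
         lineattrs=(thickness=4) nomissinggroup;
  xaxis grid values=(0 to 72 by 12) offsetmin=0;
  yaxis grid values=(0 to 1.0 by 0.2) offsetmin=0 offsetmax=0.01;
run;
```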

The answer to the question in the title then is: "No, it is the Polygon Plot".

Full SAS 9.4M2 Code:  Prognosis


Report from PharmaSUG 2015

PharmaSUG 2015 was held at the Renaissance in Orlando and had a record-breaking attendance of over 650.  Weather was great, except for a huge downpour on the evening of the last day.  All the popular presenters were in attendance including Art Carpenter, Kirk Lafler, Arthur Li and many others.

Presentations on graphics were aplenty,  using SG procedures, GTL, SAS/GRAPH and Annotate.  I tried to attend as many as I could but did not get to all of them due to Super Demo duty.  What got me really fired up was the creative ways in which users are utilizing the features of SG procedures and GTL to build their custom graphs.

One standout example was the Sankey Bar Chart by Shane Rosanbalm of Rho Inc, a CRO right here in Chapel Hill.  The graph shows the subject disease severity over visits at Baseline, 12, 30 and 60 months.  The stacked bars in the graph on the right show the % occurrence of the disease by severity.

However, Shane also wanted to see the change in the severity over visits.  So he came up with this unique custom visual, depicting the severity % by visit and also the flow of subjects from one severity value to another.

The winner of the Data Visualization presentations was the paper "Creating Sophisticated Graphs using GTL" by Kaitlyn McConville and Kristen Much, on building complex graphs using GTL.  The authors used a step-by-step method to explain the process, thus de-mystifying the learning curve.  It was good to see more and more users turning to GTL to create their graphs, willing to trade a little bit more programming complexity to obtain sophisticated graphs.

Janette Garner of Gilead Sciences Inc presented an Enhanced Forest Plot macro using SAS.  It was gratifying to see some of the material previously presented in this blog taken further and put to use in a real world example.  Janette extended the traditional Forest Plot by adding a Bar Chart of the actual values.

Janette used GTL with a Layout Lattice to create the multi-cell layout.  She nested another Layout Lattice in the SideBar of the first one to define the graph headers.

She used the HighLowPlot in different ways to draw many elements of the graph, including the subgroups and labels with indentations on the left, the bar chart itself, the bar labels, the odds ratio and the labels on the right.  This truly shows the flexibility of this plot statement.

Songtao Jiang displayed a creative usage of the GTL Series Plot to create a Variable Width Plot shown above.

Murali Kanakenahalli and Avani Kaja of Seattle Genetics showed how to create multiple graphs for Oncology Trials using GTL.  This included Kaplan-Meier graphs, Waterfall Charts by Dose Group, a Swimmer plot and more.

Here is the link to the data visualization section of the conference proceedings for these and other excellent papers.






Report from SGF 2015

SGF 2015 was a blast with a focus on Visual Analytics, SAS Studio, Hadoop and more.  Graphs were everywhere, and it was a banner year for ODS Graphics with over 15 papers and presentations by users on creating graphs using SG Procedures, GTL and Designer.

Dan Heath, Prashant Hebbar, Scott Singer and I were alternately manning the ODS Graphics station and the Super Demo station.  We had a steady stream of users sharing their experiences with these graph tools.  The general feedback was awesome, and we were impressed with the level at which you folks have adopted these tools and are using them to create graphs.  I was pleasantly surprised to find some of you already using the SAS 9.4M2 features, including the TextPlot (looking at you, Jim!).

Dan presented the new features in SG Procedures 4-5 times.  Here he is expounding upon the sorting features in the SGPanel procedure.

Normally, Super Demos are scheduled for 20-30 minutes, but Dan was holding forth well into the full hour.  This may have caused Scott Singer a bit of stress as he was often following Dan with a Super Demo on the ODS Graphics Editor and Designer.

Scott's Super Demos on Designer and Editor were also well attended, with many in the audience wondering why they were only now hearing about these tools.  Designer has been included with SAS since 9.2M3, and has been available off the Tools menu since SAS 9.3.

Designer is an interactive graph creation tool with which you can create many common graphs through a point-and-click interface.  Scott also demonstrated the "Auto Chart" feature in Designer, which allows you to create literally hundreds of graphs from your selected data and variables in minutes.  Designer generates the required GTL for the graph, which can be viewed as the graph is being created, and the code can be copied and pasted into the Program window for further customization.  If you have not seen it yet, click on Tools->ODS Graphics Designer to launch the application.

It is always a pleasure to meet with you folks and discuss the ways in which you are using the graphics tools, your pains and innovative solutions.   Kirk presented a paper on building an interactive dashboard using ODS Graphics.  He showed ways to include bar charts and pie charts in the display with URL links in each.  There seems to be a lot of potential to create innovative dashboards using these tools included in Base SAS.   Previously, I had taken a stab at creating some Dashboard widgets using SGPlot.  The picture on the left with Kirk is at the Kennedy Memorial on the way back from the awesome R J Mexican restaurant in West End.

An exciting new development is the ability to include "native" Excel charts in the Excel destination using the new MSCHART procedure.  This procedure will be released preproduction with SAS 9.4M3 in summer.  Scott Huntley and Nancy Goodling presented multiple Super Demos on this topic and got an enthusiastic reception from many of you who frequently send output to Excel spreadsheets.

We were gratified to attend papers on graphics by Philip Holland, Jeffrey Meyers, Susan Slaughter and Lora Delwiche, Rebecca Ottesen, Chuck Kincaid, Kirk Lafler, LeRoy Bessler and many more.  Prashant and I presented papers describing new features in SAS 9.4.

Here are some of the graph papers that come to mind.

1601 - Nesting Multiple Box Plots and BLOCKPLOTs Using Graph Template Language and Lattice Overlay - Greg Stanek.

2242 - Creative Uses of Vector Plots Using SAS® - Deli Wang.

2441 - Graphing Made Easy with SGPLOT and SGPANEL Procedures - Susan Slaughter and Lora Delwiche.

2480 - Kaplan-Meier Survival Plotting Macro %NEWSURV - Jeffrey Meyers.

2686 - Converting Annotate to ODS Graphics. Is It Possible? - Philip Holland.

2986 - Introduction to Output Delivery System (ODS) - Chuck Kincaid.

2988 - Building a Template from the Ground Up with Graph Template Language - Jed Teres.

3080 - Picture-Perfect Graphing with Graph Template Language - Julie VanBuskirk.

3193 - Mapping out SG Procedures and Using PROC SGPLOT for Mapping - Frank Poppe.

3419 - Forest Plotting Analysis Macro %FORESTPLOT - Jeffrey Meyers.

3432 - Getting Your Hands on Reproducible Graphs - Rebecca Ottesen.

3487 - Dynamic Dashboards Using SAS® - Kirk Lafler.

3518 - Twelve Ways to Better Graphs - LeRoy Bessler.

SAS1748 - Lost in the Forest Plot? Follow the Graph Template Language AXISTABLE Road! - Prashant Hebbar.

SAS1780 - Graphs Are Easy with SAS® 9.4 - Sanjay Matange

PharmaSUG 2015 is just around the corner in Orlando next week.  I look forward to more presentations on innovative usage of graphics.  I will present a 1/2 day seminar on "Clinical Graphs using SG Procedures" on Wednesday.  Hope to see you there.







Difference can be misleading

A very common type of graph contains two series plots, where the user is expected to evaluate the difference visually.

I saw one such plot on the web today, shown on the right.  This graph has two curves, one for malpractice premiums and one for claims, with a shaded band in the middle.  The shaded region represents the difference, or the profit made by the companies issuing the insurance.

What caught my eye was the multiple elements in the graph that often require the usage of annotation to pull off.   The graph features the following:

  • The two series plots of the data.
  • The shaded band in between.
  • The labeling for each plot and the band.
  • Axis on the right.
  • Grid lines that only go up to the Premium plot.
  • Title and a "story" that this graph is telling.

Normally, I try to avoid using annotation to create a graph unless it is indispensable.  Annotation is harder to use and not scalable to different situations, and should be used sparingly.  So, I set out to see if I could make this graph using the SAS 9.4M2 SGPLOT procedure without the use of annotation.

The resulting graph is shown on the right.  First of all, I had to eyeball the graph above to extract the data.  Not too much work.  Then, I used the SAS 9.4M2 features of the SGPLOT procedure to create the graph.  Click on the graph for a higher resolution image.  Pretty close, don't you think?

Here is what I used to create the graph:

  • StyleAttrs to set the two colors and the two markers (the left and right triangles).
  • A series plot to draw the upper curve with Y2 axis.
  • A series plot to draw the lower curve with Y2 axis.
  • A band plot to draw the shaded area with Y2 axis.
  • A band plot with white color to cover the grid lines.
  • One label for each line and band.
  • Inset for the "story" the graph is telling.
  • No annotation.


One problem with evaluating differences visually is that the eye sees the difference as the "shortest" distance between the curves.  The actual difference we are plotting for any year is the "vertical" distance.  These two are not the same.  While the two plots pinch together in two places in the graph, the actual minimum vertical distance is larger than what the eye sees.

The graph on the right adds faint vertical lines in the banded area. These lines help the eye see the vertical distance instead of the smallest distance.  We have done that by layering a HighLow plot on top of the band using default Type=line.  At the pinch near 1985 the vertical difference is almost 50% larger than what the eye sees as the closest points on the two lines.

Here is the SGPLOT code:

title h=20pt 'Ahead of the Curve';
footnote j=l 'Source:  A. M. Best';
proc sgplot data=premiums noborder noautolegend;
  styleattrs datasymbols=(triangleleftfilled trianglerightfilled);
  highlow x=year low=claims high=premium / y2axis lineattrs=(color=verylightgray);
  band x=year lower=premium upper=10.1 / y2axis fillattrs=(color=white);
  band x=year lower=claims upper=premium / y2axis 
       fillattrs=(color=lightgray transparency=0.7);
  series x=year y=claims / y2axis lineattrs=(thickness=3 color=darkgreen);
  series x=year y=premium / y2axis lineattrs=(thickness=3 color=olive);
  scatter x=year y=yl / y2axis group=grp markerattrs=(color=black) nomissinggroup;
  text x=year y=yl text=label1 / y2axis splitpolicy=splitalways splitchar=',' 
       position=right contributeoffsets=none textattrs=(size=9);
  text x=year y=yl text=label2 / y2axis splitpolicy=splitalways splitchar=',' 
       position=left contributeoffsets=none textattrs=(size=9);
  text x=year y=yl text=label3 / y2axis splitpolicy=splitalways splitchar=','
       contributeoffsets=none textattrs=(size=9 style=italic);
  xaxis minor minorcount=4 offsetmin=0 values=(1975 to 2003 by 5) min=1975 valueshint;
  y2axis display=(noticks noline) grid gridattrs=(color=gray pattern=dash) min=0 valueshint 
         offsetmin=0 values=(2 to 10 by 2)
         label='(Billions)' labelpos=top;
  inset 'Medical malpractice premiums' 'have soared in recent years,' 
        'outpacing the rise in payments' 'for malpractice claims.' / 
        position=topleft textattrs=(size=10);
run;

Note the use of the following features in the graph.

  • A Text plot is used instead of the usual scatter plot with MarkerChar to place the labels.  The text plot is specialized for text and has custom options, including ContributeOffsets.
  • X axis has minor ticks and minor tick count.
  • Y2 axis places the axis label on top instead of on the side.

To make your graph more effective, it is better to display the actual derived value directly, instead of relying on each consumer of the graph to evaluate the difference accurately.  So, I added a green band showing the actual difference between Premiums and Claims.
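As a minimal sketch of how such a difference band might be added, assume the premiums data set used in the code above.  The DIFF variable, the PREMIUMS_DIFF data set name and the fill settings are hypothetical choices for illustration, not necessarily what was used for the graph shown.

```sas
/* Sketch only: derive the vertical difference for each year.        */
/* DIFF and PREMIUMS_DIFF are hypothetical names.                     */
data premiums_diff;
  set premiums;
  diff = premium - claims;
run;

proc sgplot data=premiums_diff noautolegend;
  /* Band from zero up to the derived difference */
  band x=year lower=0 upper=diff / fillattrs=(color=green transparency=0.5);
  series x=year y=claims  / lineattrs=(thickness=3 color=darkgreen);
  series x=year y=premium / lineattrs=(thickness=3 color=olive);
  yaxis grid label='(Billions)';
run;
```

Drawing the difference from a zero baseline lets the reader compare years directly, without having to judge the gap between two sloping curves.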

Full SAS 9.4M2 code: Premiums

Finally, next week is SAS Global Forum 2015 in Dallas.  It is a great year for data visualization with many user presentations on graphics using SG Procedures and GTL.  Visual Analytics is also on display.  We will be there to meet with you, answer your questions and to hear your pains.  See you at SGF in Dallas.


Micro Maps

MicroMaps are a powerful way to display data where the display includes small, lightweight maps to provide geographical information regarding the data.  This geographical information gives clues to the relationship between the data that could lead to more insight.

The SAS SG Procedures and GTL do not currently have built-in features to create a micro map type display.  However, you can still create one using the current feature set with some effort.  Let us examine how this can be done.

First of all, how do you create a map using SG or GTL?  While you can use SGPLOT to create a map display, I will use GTL as we will progress towards making a micromap.  With SAS 9.40M1 the PolygonPlot statement was introduced in GTL.  A similar Polygon statement is available in SGPlot.  The Polygon Plot statement is a versatile tool to create custom displays using GTL or SG.  If you can think of a display type you want, you can do it using polygon plot.

Clearly, the purpose of this statement is to plot general polygonal shapes in your graph.  Well, a map is a special form of polygonal data, so we can use the MAPS.States data set directly to create a map using the PolygonPlot statement.

Retaining the aspect ratio of the data space is important for plotting a map.  So, the best way to create a map is to use the GTL Layout OverlayEquated.  GTL code for the map is shown below.  Click on map for higher resolution image.  Some data processing is required for states that have multiple polygons and to project the map data.  See link below for the full code.

proc template;
  define statgraph Map;
    dynamic _skin _color;
    begingraph / designwidth=6in designheight=6in subpixel=on;
      entrytitle 'USA Map by Region';
      /* Equated axes keep the map's aspect ratio; wall and both axes are hidden */
      layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                              yaxisopts=(display=none);
        polygonPlot x=x y=y id=pid / group=region display=(fill outline) name='map'
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5) label=statecode;
        discretelegend 'map' / location=inside across=1 halign=right 
                    valign=bottom valueattrs=(size=7) border=false;
      endlayout;
      entryfootnote halign=left 'Using Polygon Plot in a Layout OverlayEquated';
    endgraph;
  end;
run;

proc sgrender data=usapb template=Map;
  dynamic _skin="sheen" _color='Black';
run;

Notable items in the code above are:

  • Using LAYOUT OVERLAYEQUATED container.  This ensures the aspect of the data is retained regardless of the dimensions of the graph container.
  • Wall, X and Y axes are suppressed.
  • Using PolygonPlot to draw the polygons of the map.  We have used GROUP=Region, so polygons in each region are colored the same.  We could use GROUP=State to color each state differently.
  • The polygon plot can draw the label in various locations.  Here they are drawn in the center of the bounding box.  It is possible we could add an option to draw the label at the weighted center of the polygon.
  • Usage of a skin gives the embossed effect for each state.
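The data processing mentioned earlier (projecting the map and giving each polygon its own id) might look something like the sketch below.  This is an assumption about one reasonable approach, not the program linked from this article: the data set names STATES_PROJ and USA_POLY are hypothetical, the Region variable used in the template is not derived here, and the step requires SAS/GRAPH for the MAPS library and PROC GPROJECT.

```sas
/* Sketch only: project the contiguous states from MAPS.STATES.      */
/* FIPS codes 2 (AK), 15 (HI) and 72 (PR) are dropped.                */
proc gproject data=maps.states(where=(state not in (2, 15, 72)))
              out=states_proj;
  id state;
run;

/* Build one id per polygon, since some states have several segments, */
/* and add the two-letter state code for labeling.                    */
data usa_poly;
  set states_proj;
  length pid $12 statecode $2;
  pid = catx('_', state, segment);
  statecode = fipstate(state);
run;

/* Quick look using the SGPLOT POLYGON statement.  SGPLOT does not    */
/* equate the axes, so the shape may look stretched.                  */
proc sgplot data=usa_poly noautolegend;
  polygon x=x y=y id=pid / group=statecode fill outline;
  xaxis display=none;
  yaxis display=none;
run;
```

Because every segment gets its own id, a label requested on the polygon plot would be drawn once per segment, which is one reason further processing of multi-polygon states may be needed.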

Now, let us take the next step to draw a column of micro maps as shown on the right.  Click on the graph for a higher resolution image.  What we have done here is create three rows of the same map, but each map highlights the states in one of three regions - NorthWest, SouthWest and South.  The GTL template now has more code to address the three cells in a LAYOUT LATTICE.

While the code below looks long, you will see that it is highly structured, and once you understand how one cell (Row) is defined, the other rows are similar.

We use a LAYOUT LATTICE with one column.  The 3 rows result from the fact that we have added three cells, each defined by a LAYOUT OVERLAYEQUATED - ENDLAYOUT block.

We have colored a set of states by the region they belong in.  Other states have the missing color.   We have displayed the state names for only the states in the region.

To color the regions, we have defined a Discrete Attributes Map, which defines the color by the name of the region.  To display state names only for the region, we have used an expression that returns a state label only if the region is the one specified.  This is a powerful way to subset the data in the template.


proc template;
  define statgraph MicroMaps;
    dynamic _skin _color;
    begingraph / designwidth=4in designheight=6in subpixel=on;
      entrytitle 'Revenues by Region and Product';
      /* Map each region name to a fill color */
      discreteattrmap name="states" / ignorecase=true;
        value "NorthWest"  / fillattrs=graphdata1;
        value "SouthWest"  / fillattrs=graphdata2;
        value "South"      / fillattrs=graphdata3;
      enddiscreteattrmap;
      discreteattrvar attrvar=southfill var=south attrmap="states";
      discreteattrvar attrvar=northwestfill var=northwest attrmap="states";
      discreteattrvar attrvar=southwestfill var=southwest attrmap="states";
      layout lattice / columns=1;
        /* One cell per region; states outside the region get the missing color */
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=northwestfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'NorthWest' / textattrs=(size=7); 
        endlayout;
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=southwestfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'SouthWest' / textattrs=(size=7); 
        endlayout;
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=southfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'South' / textattrs=(size=7); 
        endlayout;
      endlayout;
      entryfootnote halign=left 'Using Polygon in a Layout Lattice';
    endgraph;
  end;
run;
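The region-specific labels described above rely on an expression in the LABEL= option.  One way this might be written is sketched below, assuming the Region and StateCode variables from the first map template.  The template name RegionLabelSketch and the exact expression are assumptions for illustration, not the code from the linked program.

```sas
/* Sketch only: label a state only when it belongs to the South region. */
proc template;
  define statgraph RegionLabelSketch;
    begingraph;
      layout overlayEquated / walldisplay=none
                              xaxisopts=(display=none) yaxisopts=(display=none);
        /* IFC returns the state code for matching rows, blank otherwise */
        polygonplot x=x y=y id=pid / display=(fill outline)
                    label=eval(ifc(region='South', statecode, ' '))
                    labelattrs=(color=black size=5);
      endlayout;
    endgraph;
  end;
run;
```

Subsetting inside the template this way keeps a single data set for all three cells, instead of one pre-filtered copy of the map per region.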

Finally, we will add some other relevant data to the display.  Here I have shown a Horizontal Bar of Revenues by Product in each region.  Clearly, this can be used to show different species of animals, local trees, or health data by region.

The graph can be any graph that can be placed in a Layout Overlay, such as scatter, series, histogram, etc.  The possibilities are endless.  Here I have added only one cell in addition to the map, but you can have any number of cells in a row.

Clearly, this requires us to create a data set that is a combination of the map and the data needed for the bar chart or any other plot.  Some creative coding is needed to get all the data in one data set such that the different items are still clearly accessible to the plot statements.
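One simple way to get both kinds of data into one data set is to stack them, so that each plot statement reads only its own columns.  The sketch below uses tiny made-up data sets (a unit square standing in for a map polygon, and two invented revenue values) purely to show the layout.  All names and values here are hypothetical.

```sas
/* Toy polygon: one square standing in for a map region (made-up coordinates) */
data toy_map;
  input pid $ x y;
  datalines;
A 0 0
A 1 0
A 1 1
A 0 1
;
run;

/* Toy bar data (made-up values, for layout only) */
data toy_bars;
  input product $ revenue;
  datalines;
Desks 10
Chairs 20
;
run;

/* Stack the two: map rows have missing PRODUCT and REVENUE,          */
/* bar rows have missing PID, X and Y.                                */
data map_and_bars;
  set toy_map toy_bars;
run;
```

With this layout, the polygon plot would reference X, Y and PID while the bar chart would reference PRODUCT and REVENUE.  How each statement treats the rows where its own columns are missing should be checked for the plot types you use.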

Just like we did with AXISTABLE, once we have a better understanding of the different ways you could use this type of a graph, we could develop a statement or features to make this process easier.  We would love to hear from you on how you might use such a graph.  Please feel free to chime in.

Full SAS 9.40M2 program:  MicroMaps



Conditional Highlighting - 2

Back in late 2012 I discussed a technique for Conditional Highlighting, where additional attributes can be displayed in a graph.

In the previous article the goal was to display a graph of Response by Year by Drug.  We used a clustered, grouped bar chart for this.  We also wanted to tag cases where the sample size was lower than a threshold, and we did that by adding a cross hatch pattern for such cases.  Click on the graph for a higher resolution image.

So, the idea is this: if the available features in a graph are already used up to show some data attributes, how can we add more features to the graph to display additional attributes?  These attributes are added based on other conditions, hence the term "Conditional Highlighting".

With SAS 9.4M1, additional features are supported in the SG Procedures to do more such things.  Specifically, I am referring to the ability to create marker symbols from images and characters of a font.  I discussed this in the article on Marker Symbols.  Let us use this feature to add other attributes to a graph based on some conditions.

The graph on the right displays the Sales by Person using a bar chart, and also displays the gender of each sales person.  We also want to display whether the sales person is under, over or well over the projected performance.  We have done that by adding an icon near the top of the bar.  The icon has three versions, with sad or happy faces.

We have done this by layering a scatter plot on the bar chart.  We used the VBarParm to display the bar chart as we have summarized data, and VBarParm allows layering with other basic plots.  Here is the SAS 9.4 code.

title 'Sales and Status by Sales Person';
proc sgplot data=sales;
  symbolimage name=bad  image="C:\Work\Images\Conditional\Sad_Tran.png";
  symbolimage name=good image="C:\Work\Images\Conditional\Happy_Tran.png";
  symbolimage name=great image="C:\Work\Images\Conditional\VeryHappy_Tran.png";
  styleattrs datasymbols=(great good bad) datacolors=(pink cx4f5faf);
  vbarparm category=name response=sales / group=gender dataskin=gloss 
           filltype=gradient groupdisplay=cluster;
  scatter x=name y=ys / group=status markerattrs=(size=30);
  yaxis offsetmin=0 offsetmax=0 grid;
  xaxis display=(nolabel) offsetmin=0.1 offsetmax=0.1;
run;

To do this, we computed the "Status" based on the above condition.  We defined three new symbols from the image files called "Sad_Tran.png", "Happy_Tran.png" and "VeryHappy_Tran.png" using the SymbolImage statement.  These are transparent images.  All image files are inherently rectangular in shape.  However, the picture, like the happy face, occupies only a part of the image.  The pixels around it are black, or some other background color.  We have used image processing software to make these background pixels transparent, so when the image is drawn, these transparent pixels are not displayed.
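The Status computation itself is a simple DATA step.  A minimal sketch is shown below, using a small made-up data set.  The TARGET variable, the 1.25 multiplier for "well over" and the 0.9 factor for the icon position are all assumptions for illustration, not the rules used for the graph above.

```sas
/* Toy input: made-up names, sales and targets, for illustration only */
data sales_toy;
  input name $ gender $ sales target;
  datalines;
Ann F 120 100
Bob M 80 100
Cy M 150 100
;
run;

/* Classify each person and compute a Y position for the icon */
data sales_status;
  set sales_toy;
  length status $5;
  if sales < target then status = 'Bad';
  else if sales < 1.25 * target then status = 'Good';
  else status = 'Great';
  ys = 0.9 * sales;   /* place the icon just below the top of the bar */
run;
```

Note that group markers are assigned from the DataSymbols list in the order the group values are encountered in the data, so the order of the observations (or an attribute map) determines which status gets which icon.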

We have used the DataSymbols option of the StyleAttrs statement to define a list of group markers that includes only these three new symbols.  We have also set the two colors we want for the "Male" and "Female" group values using the DataColors option.  The StyleAttrs statement allows you to define your own group attributes within the SG Procedure without having to define a new ODS Style.

We have also used the FillType=Gradient option to fill the bars with a gradient effect.  I understand that the use of such effects in a graph, often referred to as "Chart Junk", a term coined by Edward Tufte, is not preferred in many domains.  However, in some domains this can be useful.

Now, let us take another step and add another conditional attribute to the graph, as shown on the right, to show you the possibilities of this approach.  Here, we have added a "Blue Ribbon" for the salesperson with the highest sales.  Note, the blue ribbon may be awarded based on other conditions.

I have done that by defining another symbol using the image "Blue_Ribbon_Tran.png".  Note in this case, I have used the Rotate=20 option to add some pizzazz to the visual.  Actually, markers with a few different rotation angles can be used as classifiers too.

Note, in this graph, the grid lines are not visible through the bottom part of the bars any more.   FillType=Gradient uses a transparency gradient to fill the bars.  This allows the grid lines to be visible through the bottom part of the bars, where the bars are more transparent.  To prevent that, I have used another VBarParm with plain white color behind the one with gradient.  This suppresses the bleeding of the grid lines. The full code is attached below.
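The white backing layer described above can be sketched as follows, assuming the sales data set and the Name, Sales and Gender variables from the earlier code.  This is an illustration of the layering idea only; the attached program may differ in its details.

```sas
/* Sketch only: opaque white bars drawn first, gradient bars on top */
proc sgplot data=sales noautolegend;
  /* Same category, response, group and clustering, so the bars line up */
  vbarparm category=name response=sales / group=gender groupdisplay=cluster
           fillattrs=(color=white) nooutline;
  vbarparm category=name response=sales / group=gender groupdisplay=cluster
           dataskin=gloss filltype=gradient;
  yaxis offsetmin=0 grid;
  xaxis display=(nolabel);
run;
```

Both statements need identical GROUP= and GROUPDISPLAY= settings, otherwise the white bars would not sit exactly behind the gradient bars.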

Combining the ability to define your own symbols with the ability to layer plot statements provides you with powerful ways to create all types of graphs.  Conditional highlights can be colored dots or swatches added to the bars or any other element in the graph to convey more information to the reader.

Full SAS 9.4 Code:  Conditional_Highlighting

