Report from PharmaSUG 2015

PharmaSUG 2015 in Orlando, held at the Renaissance, had a record-breaking attendance of over 650.  Weather was great, except for a huge downpour on the evening of the last day.  All the popular presenters were in attendance including Art Carpenter, Kirk Lafler, Arthur Li and many others.

Presentations on graphics were aplenty, using SG procedures, GTL, SAS/GRAPH and Annotate.  I tried to attend as many as I could but did not get to all of them due to Super Demo duty.  What got me really fired up were the creative ways in which users are utilizing the features of SG procedures and GTL to build their custom graphs.

One standout example was the Sankey Bar Chart by Shane Rosanbalm of Rho Inc, a CRO right here in Chapel Hill.  The graph shows the subject disease severity over visits at Baseline, 12, 30 and 60 months.  The stacked bars in the graph on the right show the % occurrence of the disease by severity.

However, Shane also wanted to see the change in the severity over visits.  So he came up with this unique custom visual, depicting the severity % by visit and also the flow of subjects from one severity value to another.

The winner of the Data Visualization section was the paper Creating Sophisticated Graphs using GTL by Kaitlyn McConville and Kristen Much, also of Rho Inc, on building complex graphs using GTL.  The authors used a step-by-step method to explain the process, thus demystifying the learning curve.  It was good to see more and more users turning to GTL to create their graphs, willing to trade a little more programming complexity for sophisticated graphs.

Janette Garner of Gilead Sciences Inc presented an Enhanced Forest Plot macro using SAS.  It was gratifying to see some of the material previously presented in this blog taken further and put to use in a real world example.  Janette extended the traditional Forest Plot by adding a Bar Chart of the actual values.

Janette used GTL with a Layout Lattice to create the multi-cell layout.  She nested another Layout Lattice in the SideBar of the first one to define the graph headers.

She used the HighLowPlot in different ways to draw many elements of the graph, including the subgroups and labels with indentations on the left, the bar chart itself, the bar labels, the odds ratio and the labels on the right.  This truly shows the flexibility of this plot statement.

Songtao Jiang displayed a creative usage of the GTL Series Plot to create a Variable Width Plot shown above.

Murali Kanakenahalli and Avani Kaja of Seattle Genetics showed how to create multiple graphs for Oncology Trials using GTL.  This included Kaplan-Meier graphs, Waterfall Charts by Dose Group, a Swimmer plot and more.

Here is the link to the data visualization section of the conference proceedings for these and other excellent papers.






Report from SGF 2015

SGF 2015 was a blast with a focus on Visual Analytics, SAS Studio, Hadoop and more.  Graphs were everywhere, and it was a banner year for ODS Graphics with over 15 papers and presentations by users on creating graphs using SG Procedures, GTL and Designer.

Dan Heath, Prashant Hebbar, Scott Singer and I took turns manning the ODS Graphics station and the Super Demo station.  We had a steady stream of users sharing their experiences with these graph tools.  The general feedback was awesome, and we were impressed with the level at which you folks have adopted these tools and are using them to create graphs.  I was pleasantly surprised to see some of you already using the SAS 9.40M2 features, including the TextPlot (looking at you, Jim!).

Dan presented the new features in SG Procedures 4-5 times.  Here he is expounding upon the sorting features in the SGPanel procedure.

Normally, Super Demos are scheduled for 20-30 minutes, but Dan was holding forth well into the full hour.  This may have caused Scott Singer a bit of stress as he was often following Dan with a Super Demo on the ODS Graphics Editor and Designer.

Scott's Super Demos on Designer and Editor were also well attended, with many in the audience wondering why they were only now hearing about these tools.  Designer has been included with SAS since 9.2M3, and has been available from the Tools menu since SAS 9.3.

Designer is an interactive graph creation tool with which you can create many common graphs using a point-and-click GUI.  Scott also demonstrated the "Auto Chart" feature in Designer, which lets you create literally hundreds of graphs from your selected data and variables in minutes.  Designer generates the required GTL for the graph, which can be viewed as the graph is being created, and the code can be copied and pasted into the Program window for further customization.  If you have not seen it yet, click on Tools->ODS Graphics Designer to launch the application.

It is always a pleasure to meet with you folks and discuss the ways in which you are using the graphics tools, your pains and innovative solutions.  Kirk presented a paper on building an interactive dashboard using ODS Graphics.  He showed ways to include bar charts and pie charts in the display with URL links in each.  There seems to be a lot of potential to create innovative dashboards using these tools included in Base SAS.  Previously, I had taken a stab at creating some Dashboard widgets using SGPlot.  The picture on the left with Kirk is at the Kennedy Memorial on the way back from the awesome R J Mexican restaurant in West End.

An exciting new development is the ability to include "native" Excel charts in the Excel destination using the new MSCHART procedure.  This procedure will be released preproduction with SAS 9.4M3 in summer.  Scott Huntley and Nancy Goodling presented multiple Super Demos on this topic and got an enthusiastic reception from many of you who frequently send output to Excel spreadsheets.

We were gratified to attend papers on graphics by Philip Holland, Jeffrey Meyers, Susan Slaughter and Lora Delwiche, Rebecca Ottesen, Chuck Kincaid, Kirk Lafler, LeRoy Bessler and many more.  Prashant and I presented papers describing new features in SAS 9.4.

Here are some of the graph papers that come to mind.

1601 - Nesting Multiple Box Plots and BLOCKPLOTs Using Graph Template Language and Lattice Overlay - Greg Stanek.

2242 - Creative Uses of Vector Plots Using SAS® - Deli Wang.

2441 - Graphing Made Easy with SGPLOT and SGPANEL Procedures - Susan Slaughter and Lora Delwiche.

2480 - Kaplan-Meier Survival Plotting Macro %NEWSURV - Jeffrey Meyers.

2686 - Converting Annotate to ODS Graphics. Is It Possible? - Philip Holland.

2986 - Introduction to Output Delivery System (ODS) - Chuck Kincaid.

2988 - Building a Template from the Ground Up with Graph Template Language - Jed Teres.

3080 - Picture-Perfect Graphing with Graph Template Language - Julie VanBuskirk.

3193 - Mapping out SG Procedures and Using PROC SGPLOT for Mapping - Frank Poppe.

3419 - Forest Plotting Analysis Macro %FORESTPLOT - Jeffrey Meyers.

3432 - Getting Your Hands on Reproducible Graphs - Rebecca Ottesen.

3487 - Dynamic Dashboards Using SAS® - Kirk Lafler.

3518 - Twelve Ways to Better Graphs - LeRoy Bessler.

SAS1748 - Lost in the Forest Plot? Follow the Graph Template Language AXISTABLE Road! - Prashant Hebbar.

SAS1780 - Graphs Are Easy with SAS® 9.4 - Sanjay Matange.

PharmaSUG 2015 is just around the corner in Orlando next week.  I look forward to more presentations on innovative usage of graphics.  I will present a 1/2 day seminar on "Clinical Graphs using SG Procedures" on Wednesday.  Hope to see you there.







Difference can be misleading

A very common type of graph contains two series plots, where the user is expected to evaluate the difference visually.

I saw one such plot on the web today, shown on the right.  This graph has two curves, one for malpractice premiums and one for claims, with a shaded band in the middle.  The shaded region represents the difference, or the profit made by the companies issuing the insurance.

What caught my eye were the multiple elements in the graph that often require annotation to pull off.  The graph features the following:

  • The two series plots of the data.
  • The shaded band in between.
  • The labeling for each plot and the band.
  • Axis on the right.
  • Grid lines that only go up to the Premium plot.
  • Title and a "story" that this graph is telling.

Normally, I try to avoid using annotation to create a graph unless it is indispensable.  Annotation is harder to use and not scalable to different situations, and should be used sparingly.  So, I set out to see if I could make this graph using the SAS 9.4M2 SGPLOT procedure without annotation.

The resulting graph is shown on the right.  First of all, I had to eyeball the graph above to extract the data.  Not too much work.  Then, I used the SAS 9.4M2 features of the SGPLOT procedure to create the graph.  Click on the graph for a higher resolution image.  Pretty close, don't you think?

Here is what I used to create the graph:

  • StyleAttrs to set the two colors and the two markers (the left and right triangles).
  • A series plot to draw the upper curve with Y2 axis.
  • A series plot to draw the lower curve with Y2 axis.
  • A band plot to draw the shaded area with Y2 axis.
  • A band plot with white color to cover the grid lines.
  • One label for each line and band.
  • Inset for the "story" the graph is telling.
  • No annotation.


One problem with evaluating differences visually is that the eye sees the difference as the "shortest" distance between the curves.  The actual difference we are plotting for any year is the "vertical" distance.  These two are not the same.  While the two plots pinch together in two places in the graph, the actual minimum vertical distance is larger than what the eye sees.

The graph on the right adds faint vertical lines in the banded area. These lines help the eye see the vertical distance instead of the smallest distance.  We have done that by layering a HighLow plot on top of the band using default Type=line.  At the pinch near 1985 the vertical difference is almost 50% larger than what the eye sees as the closest points on the two lines.
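
As a rough idealization (my assumption for illustration, not something measured from this data), suppose that near a given year the two curves are roughly parallel with on-screen slope m, measured as rise over run in display units.  If d_v is the vertical gap and d_perp is the shortest (perpendicular) gap that the eye tends to pick up, then:

```latex
d_{\perp} = \frac{d_{v}}{\sqrt{1 + m^{2}}}
\qquad\Longleftrightarrow\qquad
d_{v} = d_{\perp}\,\sqrt{1 + m^{2}}
```

Under this idealization, a vertical difference that is about 50% larger than the perceived one corresponds to sqrt(1 + m^2) of about 1.5, that is, an on-screen slope of roughly 1.1.  The steeper the curves appear, the more the eye underestimates the true difference.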

Here is the SGPLOT code:

title h=20pt 'Ahead of the Curve';
footnote j=l 'Source:  A. M. Best';
proc sgplot data=premiums noborder noautolegend;
  /* Left and right filled triangles for the label markers */
  styleattrs datasymbols=(triangleleftfilled trianglerightfilled);
  /* Faint vertical lines between the two curves */
  highlow x=year low=claims high=premium / y2axis lineattrs=(color=verylightgray);
  /* White band hides the grid lines above the Premium curve */
  band x=year lower=premium upper=10.1 / y2axis fillattrs=(color=white);
  /* Shaded difference between Claims and Premium */
  band x=year lower=claims upper=premium / y2axis 
       fillattrs=(color=lightgray transparency=0.7);
  series x=year y=claims / y2axis lineattrs=(thickness=3 color=darkgreen);
  series x=year y=premium / y2axis lineattrs=(thickness=3 color=olive);
  /* Markers and labels for the two curves and the band */
  scatter x=year y=yl / y2axis group=grp markerattrs=(color=black) nomissinggroup;
  text x=year y=yl text=label1 / y2axis splitpolicy=splitalways splitchar=',' 
       position=right contributeoffsets=none textattrs=(size=9);
  text x=year y=yl text=label2 / y2axis splitpolicy=splitalways splitchar=',' 
       position=left contributeoffsets=none textattrs=(size=9);
  text x=year y=yl text=label3 / y2axis splitpolicy=splitalways splitchar=','
       contributeoffsets=none textattrs=(size=9 style=italic);
  xaxis minor minorcount=4 offsetmin=0 values=(1975 to 2003 by 5) min=1975 valueshint;
  y2axis display=(noticks noline) grid gridattrs=(color=gray pattern=dash) min=0 valueshint 
         offsetmin=0 values=(2 to 10 by 2) label='(Billions)' labelpos=top;
  inset 'Medical malpractice premiums' 'have soared in recent years,' 
        'outpacing the rise in payments' 'for malpractice claims.' / 
        position=topleft textattrs=(size=10);
run;

Note the use of the following features in the graph.

  • Text plot is used instead of the usual scatter plot with markerchar to place the labels.  The text plot is specialized for text and has custom options, including ContributeOffsets.
  • X axis has minor ticks and minor tick count.
  • Y2 axis places the axis label on top instead of on the side.

To make your graph more effective, it is better to display the actual derived value directly, instead of relying on each consumer of the graph to evaluate the difference accurately.  So, I added a green band showing the actual difference between Premiums and Claims.
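
A minimal sketch of that idea follows.  It assumes the same premiums data set with year, premium and claims columns used in the code above; the diff column, the premiums_diff data set name and the colors are hypothetical names and choices of mine, not taken from the full program.

```sas
/* Sketch only: DIFF and PREMIUMS_DIFF are hypothetical names */
data premiums_diff;
  set premiums;
  diff = premium - claims;   /* vertical difference for each year */
run;

proc sgplot data=premiums_diff noautolegend;
  /* Band from zero up to the derived difference */
  band x=year lower=0 upper=diff / y2axis
       fillattrs=(color=green transparency=0.5);
  series x=year y=premium / y2axis lineattrs=(thickness=3 color=olive);
  series x=year y=claims  / y2axis lineattrs=(thickness=3 color=darkgreen);
run;
```

Plotting the derived value against a zero baseline lets the reader compare years directly, without having to judge the gap between two sloping curves.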

Full SAS 9.4M2 code: Premiums

Finally, next week is SAS Global Forum 2015 in Dallas.  It is a great year for data visualization with many user presentations on graphics using SG Procedures and GTL.  Visual Analytics is also on display.  We will be there to meet with you, answer your questions and to hear your pains.  See you at SGF in Dallas.


Micro Maps

MicroMaps are a powerful way to display data where the display includes small, lightweight maps to provide geographical information regarding the data.  This geographical information gives clues to the relationship between the data that could lead to more insight.

The SAS SG Procedures and GTL do not currently have built-in features to create a micro map type display; however, you can still create one using the current feature set with some effort.  Let us examine how this can be done.

First of all, how do you create a map using SG or GTL?  While you can use SGPLOT to create a map display, I will use GTL as we progress towards making a micromap.  With SAS 9.40M1 the PolygonPlot statement was introduced in GTL.  A similar Polygon statement is available in SGPlot.  The Polygon Plot statement is a versatile tool to create custom displays using GTL or SG.  If you can think of a display type you want, you can do it using polygon plot.

Clearly, the purpose of this statement is to plot general polygonal shapes in your graph.  Well, a map is a special form of polygonal data, so we can use the MAPS.States data set directly to create a map using the PolygonPlot statement.
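
For comparison, here is a rough sketch of the SGPLOT route mentioned earlier.  It assumes a projected map data set named usapb with x, y, pid and region columns, as in the data set rendered later in this post; treat it as an illustration rather than the code behind any of the figures.

```sas
/* Sketch only: assumes projected map data USAPB with X, Y, PID and REGION */
proc sgplot data=usapb;
  polygon x=x y=y id=pid / group=region fill outline;
  xaxis display=none;
  yaxis display=none;
run;
```

Note that SGPLOT does not equate the two axes by default, so the map may appear stretched depending on the graph dimensions.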

Retaining the aspect ratio of the data space is important for plotting a map.  So, the best way to create a map is to use the GTL Layout OverlayEquated.  GTL code for the map is shown below.  Click on map for higher resolution image.  Some data processing is required for states that have multiple polygons and to project the map data.  See link below for the full code.

proc template;
  define statgraph Map;
    dynamic _skin _color;
    begingraph / designwidth=6in designheight=6in subpixel=on;
      entrytitle 'USA Map by Region';
      /* Equated axes keep the aspect ratio of the map; wall and axes are hidden */
      layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                              yaxisopts=(display=none);
        polygonPlot x=x y=y id=pid / name='map' group=region display=(fill outline) 
                  outlineattrs=(color=black) dataskin=_skin 
                  labelattrs=(color=black size=5) label=statecode;
        discretelegend 'map' / location=inside across=1 halign=right 
                  valign=bottom valueattrs=(size=7) border=false;
      endlayout;
      entryfootnote halign=left 'Using Polygon Plot in a Layout OverlayEquated';
    endgraph;
  end;
run;

proc sgrender data=usapb template=Map;
  dynamic _skin="sheen" _color='Black';
run;

Notable items in the code above are:

  • Using LAYOUT OVERLAYEQUATED container.  This ensures the aspect of the data is retained regardless of the dimensions of the graph container.
  • Wall, X and Y axes are suppressed.
  • Using PolygonPlot to draw the polygons of the map.  We have used GROUP=Region, so polygons in each region are colored the same.  We could use GROUP=State to color each state differently.
  • The polygon plot can draw the label in various locations.  Here they are drawn in the center of the bounding box.  It is possible we could add an option to draw the label at the weighted center of the polygon.
  • Usage of a skin gives the embossed effect for each state.
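
The data processing mentioned earlier (projecting the map and giving each polygon its own ID) might look roughly like the sketch below.  It assumes the MAPS.STATES data set with its STATE, SEGMENT, X and Y columns and a SAS/GRAPH license for PROC GPROJECT; the states_proj and usap_sketch names are hypothetical, and the region assignment used in the graph is not shown here.

```sas
/* Sketch only: output data set names are hypothetical */
proc gproject data=maps.states out=states_proj;
  id state;                     /* project the unprojected state outlines */
run;

data usap_sketch;
  set states_proj;
  length statecode $2 pid $12;
  statecode = fipstate(state);          /* two-letter postal code */
  pid = catx('_', statecode, segment);  /* one polygon ID per state segment */
run;
```

A state made of several pieces, such as one with islands, gets one pid per segment, so each piece is drawn as a separate closed polygon.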

Now, let us take the next step to draw a column of micro maps as shown on the right.  Click on the graph for a higher resolution image.  What we have done here is create three rows of the same map, but each map highlights the states in one of three regions - NorthWest, SouthWest and South.  The GTL template now has more code to address the three cells in a LAYOUT LATTICE.

While the code below looks long, you will see that it is highly structured, and once you understand how one cell (Row) is defined, the other rows are similar.

We use a LAYOUT LATTICE with one column.  The 3 rows result from the fact that we have added three cells, each defined by the LAYOUT OVERLAY - ENDLAYOUT block.

We have colored a set of states by the region they belong in.  Other states have the missing color.   We have displayed the state names for only the states in the region.

To color the regions, we have defined a Discrete Attributes Map, which defines the color by the name of the region.  To display state names only for the region, we have used an expression that returns a state label only if the region is the one specified.  This is a powerful way to subset the data in the template.
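
The template references one column per region (northwest, southwest and south).  One plausible way to build such columns is sketched below; this is an assumption on my part about the data layout, with a hypothetical output data set name, and it presumes the usapb data set carries a region column as in the earlier map.

```sas
/* Sketch only: USAPB_MICRO is a hypothetical name */
data usapb_micro;
  set usapb;
  length northwest southwest south $10;
  /* Each column holds the region name only for states in that region; */
  /* all other states stay blank and are drawn with the missing color  */
  if      upcase(region) = 'NORTHWEST' then northwest = region;
  else if upcase(region) = 'SOUTHWEST' then southwest = region;
  else if upcase(region) = 'SOUTH'     then south     = region;
run;
```

With this layout, each cell of the lattice can use a different column as its group variable while sharing the same attribute map.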


proc template;
  define statgraph MicroMaps;
    dynamic _skin _color;
    begingraph / designwidth=4in designheight=6in subpixel=on;
      entrytitle 'Revenues by Region and Product';
      /* Fill color by region name */
      discreteattrmap name="states" / ignorecase=true;
        value "NorthWest"  / fillattrs=graphdata1; 
        value "SouthWest"  / fillattrs=graphdata2;
        value "South"      / fillattrs=graphdata3;
      enddiscreteattrmap;
      discreteattrvar attrvar=southfill var=south attrmap="states";
      discreteattrvar attrvar=northwestfill var=northwest attrmap="states";
      discreteattrvar attrvar=southwestfill var=southwest attrmap="states";
      /* One column, three rows; the label expression is omitted in this listing */
      layout lattice / columns=1;
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=northwestfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'NorthWest' / textattrs=(size=7); 
        endlayout;
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=southwestfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'SouthWest' / textattrs=(size=7); 
        endlayout;
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=southfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'South' / textattrs=(size=7); 
        endlayout;
      endlayout;
      entryfootnote halign=left 'Using Polygon in a Layout Lattice';
    endgraph;
  end;
run;

Finally, we will add some other relevant data to the display.  Here I have shown a horizontal bar chart of Revenues by Product in each region.  Clearly, this could also be used to show different species of animals, local trees, or health data by region.

The graph can be any graph that can be placed in a Layout Overlay, such as scatter, series, histogram, etc.  The possibilities are endless.  Here I have added only one cell in addition to the map, but you can have any number of cells in a row.

Clearly, this requires us to create a data set that is a combination of the map and the data needed for the bar chart or any other plot.  Some creative coding is needed to get all the data in one data set such that the different items are still clearly accessible to the plot statements.

Just like we did with AXISTABLE, once we have a better understanding of the different ways you could use this type of a graph, we could develop a statement or features to make this process easier.  We would love to hear from you on how you might use such a graph.  Please feel free to chime in.

Full SAS 9.40M2 program:  MicroMaps



Conditional Highlighting - 2

Back in late 2012 I discussed a technique for Conditional Highlighting, where additional attributes can be displayed in a graph.

In the previous article the goal was to display a graph of Response by Year by Drug.  We used a cluster grouped bar chart to create the bar chart.  We also wanted to tag cases where the sample size was lower than a threshold, and we did that by adding a cross hatch pattern for such cases.  Click on the graph for a higher resolution image.

So, the idea is this: if the available features in a graph are already used up to show some data attributes, how can we add more features to the graph to display additional attributes?  These attributes are added based on other conditions, and hence the term "Conditional Highlighting".

With SAS 9.4M1, additional features are supported in the SG Procedures to do more such things.  Specifically, I am referring to the ability to create marker symbols from images and characters of a font.  I discussed this in  the article on Marker Symbols.  Let us use this feature to add other attributes to a graph based on some conditions.

The graph on the right displays the Sales by Person, and also displays the gender of each sales person using a bar chart.  We also want to display if the sales person is under, over or well over the projected performance.  We have done that by adding an icon near the top of the bar.  The icon has three versions, with sad or happy faces.

We have done this by layering a scatter plot on the bar chart.  We used the VBarParm to display the bar chart as we have summarized data, and VBarParm allows layering with other basic plots.  Here is the SAS 9.4 code.

title 'Sales and Status by Sales Person';
proc sgplot data=sales;
  /* Define marker symbols from transparent PNG images */
  symbolimage name=bad  image="C:\Work\Images\Conditional\Sad_Tran.png";
  symbolimage name=good image="C:\Work\Images\Conditional\Happy_Tran.png";
  symbolimage name=great image="C:\Work\Images\Conditional\VeryHappy_Tran.png";
  /* Group markers use only the three new symbols; two bar colors for gender */
  styleattrs datasymbols=(great good bad) datacolors=(pink cx4f5faf);
  vbarparm category=name response=sales / group=gender dataskin=gloss 
           filltype=gradient groupdisplay=cluster;
  /* Status icon layered near the top of each bar */
  scatter x=name y=ys / group=status markerattrs=(size=30);
  yaxis offsetmin=0 offsetmax=0 grid;
  xaxis display=(nolabel) offsetmin=0.1 offsetmax=0.1;
run;

To do this, we computed the "Status" based on the above condition.  We defined three new symbols from the image files "Sad_Tran.png", "Happy_Tran.png" and "VeryHappy_Tran.png" using the SymbolImage statement.  These are transparent images.  All image files are inherently rectangular in shape.  However, the picture, like the happy face, occupies only a part of the image.  The pixels around it are black, or some other background color.  We have used image-processing software to make these background pixels transparent, so when the image is drawn, these pixels are not displayed.
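
The status computation itself is not shown in this post, so here is a rough sketch of how it could be done.  The input data set sales_raw, the target column, the 1.2 cutoff for "well over" and the 0.85 factor that places the icon near the top of the bar are all hypothetical choices for illustration.

```sas
/* Sketch only: SALES_RAW, TARGET and the cutoffs are hypothetical */
data sales;
  set sales_raw;
  length status $5;
  if sales < target then status = 'Bad';              /* under projection */
  else if sales < 1.2 * target then status = 'Good';  /* over projection */
  else status = 'Great';                              /* well over projection */
  ys = 0.85 * sales;   /* Y position of the icon, just below the bar top */
run;
```

By default, group markers are assigned in the order the group values appear in the data, so the order of the DataSymbols list may need to be adjusted to match, or a discrete attribute map can be used to bind each status to its symbol.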

We have used the DataSymbols option of the StyleAttrs statement to restrict the list of group markers to these three new symbols.  We have also set the two colors we want for the "Male" and "Female" group values using the DataColors option.  The StyleAttrs statement allows you to define your own group attributes within the SG Procedure without having to define a new ODS Style.

We have also used the FillType=Gradient option to fill the bars with a gradient effect.  I understand that the usage of such effects in a graph, often referred to as "Chart Junk", a term coined by Edward Tufte, is not preferred in many domains.  However, in some domains this can be useful.

Now, let us take another step and add another conditional attribute to the graph as shown on the right to show you the possibilities of this approach.  Here, we have added a "Blue Ribbon" for the salesperson with the highest sales.  Note, the blue ribbon may be awarded based on other conditions.

I have done that by defining another symbol using the image "Blue_Ribbon_Tran.png".  Note in this case, I have used the Rotate=20 option to add some pizzazz to the visual.  Actually, markers with a few different rotation angles can be used as a classifier too.

Note, in this graph, the grid lines are not visible through the bottom part of the bars any more.   FillType=Gradient uses a transparency gradient to fill the bars.  This allows the grid lines to be visible through the bottom part of the bars, where the bars are more transparent.  To prevent that, I have used another VBarParm with plain white color behind the one with gradient.  This suppresses the bleeding of the grid lines. The full code is attached below.

Combining the ability to define your own symbols with the ability to layer plot statements provides you with powerful ways to create all types of graphs.  Conditional highlights can be colored dots or swatches added to the bars or any other element in the graph to convey more information to the reader.

Full SAS 9.4 Code:  Conditional_Highlighting



Sankey Diagrams

Sankey Diagrams have found increasing favor for visualization of data.  This visualization tool has been around for a long time, traditionally used to visualize the flow of energy or materials.

Now to be sure, GTL does have a statement design for a Sankey Diagram which was implemented only in Flex for use in interactive visualization cases.  The GTL Sankey Diagram statement was not implemented for use in MVA visualization cases due to lack of demand.

However, recently a SAS user asked about creating such graphs using SAS MVA graphics tools.  With SAS 9.4 there are sufficient tools in place to create such a diagram using custom coding without use of annotation.  In SAS 9.4M3, more tools are available that make this task easier.  I have outlined the process below.

The diagram created using the SAS 9.4 SGPLOT procedure is shown on the right.  Click on the diagram to see a bigger view.  Since no SANKEY statement is available in SGPLOT, such a diagram requires custom coding.  However, no annotation is required.  The program uses the following statements:

  • Series with SmoothConnect for the curves.
  • HighLow plots for nodes and link values.
  • Scatter plot with MarkerChar for node labels.
  • Series plot to draw the brackets.
  • Scatter plot with MarkerChar for labels 1,2,3.

A custom data set has to be created to draw the different parts of the diagram as shown in the attached program link at the bottom.

The diagram shown on the right uses the new SPLINE statement to be released soon with SAS 9.4M3.  This makes the process a little easier, as the spline is a smooth curve that does not need to pass through each of the vertex points.  The SAS 9.4M3 SGPLOT also supports varying line thickness for series and spline statements.

Clearly the data is hand-built for this particular diagram.  I believe this process can be converted to a macro to create a Sankey Diagram from a node-link data set with the appropriate information.  Things will get more interesting when the diagram includes link splits or merges at various nodes.

SAS 9.4 SGPLOT Code:   Sankey_940


A 3D Scatter Plot Animation Macro

In the previous article, I described the process to create a 3D Scatter Plot using a 3D Orthographic View matrix and the SGPLOT procedure.  I posted a macro that can be used to create a 3D scatter plot from any SAS data set, using 3 numeric columns, one each for X, Y and Z (Response) axes.

Visualization of 3D data can be improved by providing interaction or animation.  Here I have described a way to create an animation using the idea described in the previous article.


The setup for the animation is as follows:

options papersize=('5 in', '4 in') printerpath=gif animation=start 
        animduration=0.05 animloop=yes noanimoverlay;
ods printer dpi=100 file='C:\Class3DScatterAnim.gif';
ods listing image_dpi=200;
ods graphics / reset attrpriority=color width=5in height=4in imagefmt=GIF;
%run_anim_macro(data=sashelp.class, start=-30, end=-60, incr=-1);
%run_anim_macro(data=sashelp.class, start=-60, end=-30, incr=1);
options printerpath=gif animation=stop;
ods printer close;

I have modified the %Ortho3D_Macro provided in the previous article, and added a loop to render multiple graphs with changing value for the Z-Rotation from -30 to -60 and back by 1 degree.  Here I have created a GIF animation.  An SVG animation can also be created using the code at the bottom of the attached file.
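
The actual %run_anim_macro is in the attached file.  As a rough sketch of the looping idea only, a wrapper along the following lines could call the plotting macro once per frame.  The name %run_anim_sketch is hypothetical, the parameter names are borrowed from the %Ortho3D_Macro invocation in the previous article, and the column choices are fixed for sashelp.class; the real macro may differ in all of these respects.

```sas
/* Sketch only: %run_anim_sketch is a hypothetical stand-in for %run_anim_macro */
%macro run_anim_sketch(data=, start=, end=, incr=);
  %local rot;
  /* One graph per rotation angle; each graph becomes one animation frame */
  %do rot = &start %to &end %by &incr;
    %Ortho3D_Macro(Data=&data, WallData=wall_Axes,
                   X=height, Y=Age, Z=Weight,
                   Lblx=Height, Lbly=Age, Lblz=Weight,
                   Group=Sex, Attrmap=attrmap, Tilt=65, Rotate=&rot,
                   Title=Plot of Weight by Height and Age);
  %end;
%mend run_anim_sketch;
```

Because the ANIMATION=START option is in effect while the loop runs, every graph written to the ODS PRINTER destination is appended to the same GIF file until ANIMATION=STOP is issued.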

3D Animation Macro:  Ortho_3D_Animation

Matrix Multiplication Function:  Matrix_Functions


A 3D Scatter Plot Macro

The SG Procedures do not support creating a 3D scatter plot.  GTL has some support for 3D graphs, including a 3D Bivariate Histogram and a 3D Surface, but still no 3D point cloud.  The lack of such a feature is not due to any difficulty in doing this, as GTL already supports the LAYOUT OVERLAY3D container, but to the fact that no one was urgently requesting such a feature.

However, often we do have a need for  visualization of 3D data, and it would be nice to be able to do this.  So, here I have presented a macro that uses the features of the SGPLOT procedure to display 3D data.  This uses SAS 9.4 features to render the walls, axis labels and the  filled "spherical-looking" markers.

%Ortho3D_Macro (Data=sashelp.class, WallData=wall_Axes, 
                X=height, Y=Age, Z=Weight,
                Lblx=Height, Lbly=Age, Lblz=Weight, 
                Group=Sex, Attrmap=attrmap, Tilt=65, Rotate=-55, 
                Title=Plot of Weight by Height and Age);

Note the following items in the macro invocation above:

  • The data set to be viewed is provided.
  • A data set defining the 3D walls is provided.  This is shown in detail in the program code.
  • The three columns to be mapped to each axis are provided.
  • X and Y form the two independent variables, and the response variable is displayed on the vertical Z axis.
  • Labels for each axis can be specified.
  • An Attribute map is used to set the visual attributes of the walls and bounding box of the data.  This is shown in the code.
  • A group variable can be used to color the markers.
  • Viewing parameters Tilt and Rotate can be set.  Values of Tilt from 0 to 90 and Rotate from -15 to -75 are ideal.
  • Title can be set.

The macro maps the 3D data to a unit cube, and projects the data into the view space using an ORTHOGRAPHIC projection.  This avoids distortion of the data that can happen when using a perspective projection.


Here are the features of the graph:

  • Spherical looking markers are drawn at each (x, y, z) location.
  • Axis labels are drawn, but not the tick values.  That could be added, but can get messy.   The idea is to really see the shape of the data.
  • Relative positions of the markers can be a challenge to view in a static 3D view.  So, the X-Y, X-Z and Y-Z projections for each point are also displayed to help locate the points in 3D space.
  • Needles are dropped to the floor.
  • View parameters are displayed.

The same macro can be invoked for other data such as all the Sedans in the Cars data as shown below:

%Ortho3D_Macro (Data=sedans, WallData=wall_Axes, 
                X=horsepower, Y=Weight, Z=mpg_city,
                Lblx=Horsepower, Lbly=Weight, Lblz=Mileage, 
                Group=origin, Attrmap=attrmap, Tilt=45, Rotate=-60, 
                Title=Plot of Mileage by Horsepower and Weight);


For those interested in the process for projection of 3D data on to a 2D plane, the View Matrix for the Orthographic Projection is shown below.
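
As a stand-in for the matrix, here is one common textbook form of an orthographic view transform.  It assumes a rotation by angle ρ (Rotate) about the vertical Z axis followed by a tilt by angle τ (Tilt) about the horizontal X axis; the sign conventions and axis order used in the macro may well differ, so treat this as an illustration rather than the macro's exact matrix.

```latex
V \;=\; R_x(\tau)\,R_z(\rho) \;=\;
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\tau & -\sin\tau \\
0 & \sin\tau & \cos\tau
\end{pmatrix}
\begin{pmatrix}
\cos\rho & -\sin\rho & 0 \\
\sin\rho & \cos\rho & 0 \\
0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
\cos\rho & -\sin\rho & 0 \\
\cos\tau\,\sin\rho & \cos\tau\,\cos\rho & -\sin\tau \\
\sin\tau\,\sin\rho & \sin\tau\,\cos\rho & \cos\tau
\end{pmatrix},
\qquad
\begin{pmatrix} x_v \\ y_v \\ z_v \end{pmatrix}
= V \begin{pmatrix} x \\ y \\ z \end{pmatrix}
```

In an orthographic projection the 2D plot coordinates are simply two of the three view components, and the remaining component is the depth, which is dropped.  There is no division by depth as in a perspective projection, which is why parallel lines stay parallel.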


We can also use the standard way to create an animated GIF or SVG file that helps in the visualization of the data.  I will include that in the next post.

Note the macro is provided for illustration purposes on what is possible.  I have not rigorously tested all settings and use cases.  If visualization of 3D data is something you feel you need, please chime in with your suggestions for more 3D plots right here, or to SAS Technical Support.

Full SAS 9.4 Code:  Ortho_3D_Macro_94

Matrix Functions:  Matrix_Functions


Margin Plots

Last week a user wanted to view the distribution of data using a Box Plot.  The issue was the presence of a lot of "bad" data.  I got to thinking of ways such data can be visualized.  I also discussed the matter with our resident expert Rick Wicklin who pointed me to a couple of resources including some information on visualization of missing data on the web.

First, my usual disclaimer:  I am only a "Graph Guy", and not a Statistician.  So, my thoughts below are mainly graphical suggestions.  Please feel free to point out pros and cons of the techniques discussed below.

On the issue of visualizing data using box plots, I simulated some data from sashelp.heart by setting some values to missing, and recording a zero for those observations in another column.  Then, I used a box plot to view the data, and overlaid a scatter plot to view the values that were set to missing.  Since I put those observations in another column with a value of zero, they all show up at the bottom of the graph.  You can select the appropriate value.  I set the Y axis so zero is not on the axis.
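A minimal sketch of how such data could be simulated is shown here.  The 10% rate, the seed and the random selection are my assumptions for illustration; the linked full code may choose the "bad" observations differently.

```sas
/* Sketch only: the rate, seed and random selection are assumptions */
data heart_Box;
  set sashelp.heart(keep=deathcause cholesterol);
  call streaminit(12345);            /* fixed seed so the result is repeatable */
  chol = .;                          /* marker column, missing by default */
  if not missing(cholesterol) and rand('uniform') < 0.10 then do;
    cholesterol = .;                 /* simulate a "bad" value */
    chol = 0;                        /* place its marker at zero */
  end;
run;
```

The box plot then uses cholesterol, and the scatter overlay uses chol, so each observation appears in exactly one of the two plots.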

SGPLOT with SAS 9.40M1 supports overlays of basic plots with a VBOX.  Note how we can see that some of the "Cancer" and "Coronary Heart Disease" data is "bad", in this case, "missing".

title 'Cholesterol by Death Cause';
proc sgplot data=heart_Box noautolegend;
  /* Box plot of the non-missing cholesterol values */
  vbox cholesterol / category=deathcause extreme;
  /* Observations set to missing are drawn at zero from column chol */
  scatter x=deathcause y=chol / markerattrs=graphdata1(symbol=circlefilled)
          transparency=0.5  name='s' jitter jitterwidth=0.5 legendlabel='Missing Data';
  keylegend 's' / location=inside position=topleft;
  xaxis display=(noticks nolabel);
  yaxis values=(100 to 500 by 100) min=0 valueshint;
run;

The user also made a comment on how the data was so skewed, that a box plot was not possible.  That got me looking for another way to view the same data.  This time, I replaced some values for Cholesterol and Systolic with missing values, copying them into other variables.

Now, I plotted Systolic by Cholesterol, which displayed the cloud of non-missing values.  Then I added a box plot for all the values and a box for just the values where cholesterol was missing.  The graph is shown on the right.  Click on graph for a higher resolution image.

The blue box is of all the observations where cholesterol is non-missing.  The red box is for observations where cholesterol is missing, but the systolic has a valid value.  Once again, this is possible with SAS 9.40M1 SGPLOT.  For the VBOX data, I have set the "category" values to 10 and 20.  Since the scatter plot makes the axes linear, this combination is possible.

proc sgplot data=heart_2D noautolegend;
  /* Cloud of the non-missing values */
  scatter x=chol y=syst / name='s' legendlabel='Non Missing Data'
          markerattrs=graphdata1(symbol=circlefilled) transparency=0.7;
  /* Boxes of systolic values, placed at the numeric category values in cholA */
  vbox systBox / category=cholA extreme group=systgrp fill nooutliers name='b' boxwidth=1;
  keylegend 's' 'b';
  xaxis min=0 values=(100 to 500 by 100) valueshint grid label='Cholesterol';
  yaxis min=0 values=( 50 to 300 by 50) valueshint grid label='Systolic';
run;

Now, this gives us some idea where all the data is, but still this may not work well if the distribution of the data is bi-modal.  We can create the same graph using a scatter plot of the data instead of a box.

Here, I have displayed a scatter of the non-missing data  along with another scatter plot with two groups - All data and Missing Systolic data.  Maybe this view can provide a better visualization of the missing data.  We can certainly add insets to indicate the percentage of the missing data.

Another way may be to use a HISTOGRAM instead of the VBOX or SCATTER to view the distribution of the missing data.  I will take that up in a follow-up post.

Even with SAS 9.40M1, SGPLOT will allow us only to view one distribution at a time.  If we want to plot both the distribution of the Systolic for missing Cholesterol and vice-versa, we will need to use GTL.  Also, if you have a SAS release prior to SAS 9.40M1, you can use GTL to create the VBOX + SCATTER overlay graphs shown above.
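For readers on an earlier release, a GTL version of the first box plus scatter overlay might look something like the sketch below.  The template name is hypothetical, I have left out the jittering, and I have not run this on a pre-9.40M1 release, so treat it as a starting point rather than tested code.

```sas
/* Sketch only: template name is hypothetical; uses the heart_Box data from above */
proc template;
  define statgraph Box_Missing_GTL;
    begingraph;
      entrytitle 'Cholesterol by Death Cause';
      layout overlay / xaxisopts=(display=(tickvalues line))
                       yaxisopts=(linearopts=(viewmin=0));
        /* Box plot of the non-missing values */
        boxplot x=deathcause y=cholesterol;
        /* Markers at zero for the observations set to missing */
        scatterplot x=deathcause y=chol / name='s' legendlabel='Missing Data'
                    markerattrs=graphdata1(symbol=circlefilled) datatransparency=0.5;
        discretelegend 's' / location=inside halign=left valign=top;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=heart_Box template=Box_Missing_GTL;
run;
```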

The graph with box plots of all and missing data is shown on the right.  This graph is created using GTL.  It uses only one LAYOUT OVERLAY, since the categorical values for the box plots are also numeric.  However, we can use a LAYOUT LATTICE to create other combinations.

Full SAS 9.40M1 Code:  Margin_Plot


Cancer Deaths Averted

Significant progress in reduction of Cancer mortality is shown in a graph that I noticed recently on the Cancer Network web site.  This graph showed the actual and projected cancer mortality by year for males.  The graph is shown on the right.

The graph plots the projected and actual numbers by year, and highlights the difference using the hatched pattern.  The total number of Cancer Deaths Averted is shown.

The graph on the right includes a Y axis data range all the way down to zero, where it is really not necessary.  But, we can use this space that is otherwise wasted to display more information.

Creating the graph is easy, using the following SGPLOT code.  Some options are trimmed to fit the space.  See full code in link at bottom for the details.

title 'Cancer Deaths';
proc sgplot data=mortality nocycleattrs nowall noborder;
  styleattrs datalinepatterns=(solid);
  /* Vertical hatch marks between the actual and projected values */
  highlow x=year low=actual high=projected / type=line;
  series x=year y=projected / name='b' legendlabel='Projected';
  series x=year y=actual / name='a' legendlabel='Actual';
  keylegend 'a' 'b' / location=inside position=topleft linelength=20;
  xaxis values=(1975 to 2010 by 5) grid;
  yaxis values=(0 to  450000 by 50000) grid;
run;

The graph shown on the right is created by the code shown above.  The data is "eye-balled" from the original graph and includes the columns Year, Actual, Projected and Diff.  The total number of deaths averted is saved in a macro variable, and also inserted into the label to be displayed.

Two SERIES plots are used to plot the actual and projected curves.  A HIGHLOW plot is used to draw the vertical hatch marks showing the reduction in the cancer deaths since 1990.  A legend is added to indicate the actual and projected curves.
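One way to save the total in a macro variable is sketched here.  The macro variable name averted is my choice for illustration, and the step assumes the Diff column holds the yearly difference between the projected and actual values.

```sas
/* Sketch only: the macro variable name "averted" is an assumption */
proc sql noprint;
  select sum(diff) format=comma12.
    into :averted trimmed            /* formatted total, no leading blanks */
    from mortality;
quit;

%put NOTE: Total deaths averted = &averted;
```

The value can then be referenced as &averted inside a double-quoted label or inset string.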


For the graph shown below on the right, a Band plot is added to display the reduction in cancer deaths by year explicitly.   Also, we have used a TEXT plot statement to display the inset indicating the number of deaths averted.



There are some benefits to this addition.  The empty area at the bottom of the graph is utilized.  The actual deaths averted are drawn from a common baseline, thus removing the distortions in the hatched area due to the varying baseline.  An explicit inset shows the estimated number of deaths averted.

The TEXT plot is a SAS 9.4M2 feature, but one could use an INSET or a SCATTER with MarkerChar to do something similar.
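A trimmed sketch of the BAND and TEXT additions is shown here.  The columns xlbl, ylbl and label, the data set name mortality_lbl and the position of the inset are all hypothetical; see the linked full code for the settings actually used.

```sas
/* Sketch only: xlbl, ylbl, label and the inset position are assumptions */
data mortality_lbl;
  set mortality end=last;
  length label $40;
  total + diff;                      /* running sum of deaths averted */
  if last then do;                   /* label is non-missing on one row only */
    xlbl  = 1992;
    ylbl  = 150000;
    label = catx(' ', 'Deaths averted:', put(total, comma12.));
  end;
run;

proc sgplot data=mortality_lbl;
  /* Yearly difference drawn from a common zero baseline */
  band x=year lower=0 upper=diff;
  series x=year y=projected / name='b' legendlabel='Projected';
  series x=year y=actual / name='a' legendlabel='Actual';
  /* Inset text; requires SAS 9.4M2 */
  text x=xlbl y=ylbl text=label / position=right;
  keylegend 'a' 'b' / location=inside position=topleft;
  xaxis values=(1975 to 2010 by 5) grid;
  yaxis values=(0 to 450000 by 50000) grid;
run;
```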

Full SAS code:  CancerDeaths
