Difference can be misleading

A very common type of graph contains two series plots, where the viewer is expected to evaluate the difference between them visually.

I saw one such plot on the web today, shown on the right.  This graph has two curves, one for malpractice premiums and one for claims, with a shaded band in the middle.  The shaded region represents the difference, or the profit made by the companies issuing the insurance.

What caught my eye was the number of elements in the graph that often require annotation to pull off.  The graph features the following:

  • The two series plot of the data.
  • The shaded band in between.
  • The labeling for each plot and the band.
  • Axis on the right.
  • Grid lines that only go up to the Premium plot.
  • Title and a "story" that this graph is telling.

Normally, I try to avoid using annotation to create a graph unless it is indispensable.  Annotation is harder to use and does not scale to different situations, so it should be used sparingly.  So, I set out to see if I could make this graph using the SAS 9.4M2 SGPLOT procedure without annotation.

The resulting graph is shown on the right.  First of all, I had to eyeball the values in the graph above to extract the data.  Not too much work.  Then, I used the SAS 9.4M2 features of the SGPLOT procedure to create the graph.  Click on the graph for a higher resolution image.  Pretty close, don't you think?

Here is what I used to create the graph:

  • StyleAttrs to set the two colors and the two markers (the left and right triangles).
  • A series plot to draw the upper curve with Y2 axis.
  • A series plot to draw the lower curve with Y2 axis.
  • A band plot to draw the shaded area with Y2 axis.
  • A band plot with white color to cover the grid lines.
  • One label for each line and band.
  • Inset for the "story" the graph is telling.
  • No annotation.


One problem with evaluating differences visually is that the eye sees the difference as the "shortest" distance between the curves.  The actual difference we are plotting for any year is the "vertical" distance.  These two are not the same.  While the two plots pinch together in two places in the graph, the actual minimum vertical distance is larger than what the eye sees.

The graph on the right adds faint vertical lines in the banded area. These lines help the eye see the vertical distance instead of the smallest distance.  We have done that by layering a HighLow plot on top of the band using default Type=line.  At the pinch near 1985 the vertical difference is almost 50% larger than what the eye sees as the closest points on the two lines.
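A rough way to see why the two distances differ is the following sketch.  It assumes, purely for illustration, that near a given year the two curves can be treated as parallel straight lines with slope m measured in screen units.  The symbols d_v (vertical distance), d_s (shortest distance) and m are introduced here and are not part of the original graph.

```latex
d_s = \frac{d_v}{\sqrt{1 + m^2}}
\qquad\Longleftrightarrow\qquad
\frac{d_v}{d_s} = \sqrt{1 + m^2}
```

Under that assumption, flat sections (m = 0) show no discrepancy, while a section rising at 45 degrees on screen (m = 1) has a vertical gap that is about 41% larger than the shortest gap, since the square root of 2 is roughly 1.41.  The steeper the curves, the more the eye underestimates the plotted difference.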

Here is the SGPLOT code:

title h=20pt 'Ahead of the Curve';
footnote j=l 'Source:  A. M. Best';
proc sgplot data=premiums noborder noautolegend;
  styleattrs datasymbols=(triangleleftfilled trianglerightfilled);
  highlow x=year low=claims high=premium / y2axis lineattrs=(color=verylightgray);
  band x=year lower=premium upper=10.1 / y2axis fillattrs=(color=white);
  band x=year lower=claims upper=premium / y2axis 
       fillattrs=(color=lightgray transparency=0.7);
  series x=year y=claims / y2axis lineattrs=(thickness=3 color=darkgreen);
  series x=year y=premium / y2axis lineattrs=(thickness=3 color=olive);
  scatter x=year y=yl / y2axis group=grp markerattrs=(color=black) nomissinggroup;
  text x=year y=yl text=label1 / y2axis splitpolicy=splitalways splitchar=',' 
       position=right contributeoffsets=none textattrs=(size=9);
  text x=year y=yl text=label2 / y2axis splitpolicy=splitalways splitchar=',' 
       position=left contributeoffsets=none textattrs=(size=9);
  text x=year y=yl text=label3 / y2axis splitpolicy=splitalways splitchar=','
       contributeoffsets=none textattrs=(size=9 style=italic);
  xaxis minor minorcount=4 offsetmin=0 values=(1975 to 2003 by 5) min=1975 valueshint;
  y2axis display=(noticks noline) grid gridattrs=(color=gray pattern=dash) min=0 valueshint 
         offsetmin=0 values=(2 to 10 by 2)
         label='(Billions)' labelpos=top;
  inset 'Medical malpractice premiums' 'have soared in recent years,' 
        'outpacing the rise in payments' 'for malpractice claims.' / 
        position=topleft textattrs=(size=10);
run;

Note the use of the following features in the graph.

  • A Text plot is used instead of the usual scatter plot with markerchar to place the labels.  The text plot is specialized for text and has custom options, including ContributeOffsets.
  • X axis has minor ticks and minor tick count.
  • Y2 axis places the axis label at the top instead of on the side.

To make your graph more effective, it is better to display the actual derived value directly, instead of relying on each consumer of the graph to evaluate the difference accurately.  So, I added a green band showing the actual difference between Premiums and Claims.
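As a minimal sketch of that idea, the difference can be computed in a DATA step and drawn as a band from a zero baseline.  This assumes the PREMIUMS data set has the YEAR, CLAIMS and PREMIUM columns used in the code above; the names PREMIUMS_DIFF and DIFF are made up here, and the colors are only a guess at the look of the final graph.

```sas
/* Sketch only: PREMIUMS_DIFF and DIFF are hypothetical names. */
data premiums_diff;
  set premiums;
  diff = premium - claims;   /* vertical difference for each year */
run;

proc sgplot data=premiums_diff noautolegend;
  /* Difference drawn from a common zero baseline */
  band x=year lower=0 upper=diff / y2axis
       fillattrs=(color=green transparency=0.5);
  series x=year y=claims  / y2axis lineattrs=(thickness=3 color=darkgreen);
  series x=year y=premium / y2axis lineattrs=(thickness=3 color=olive);
  y2axis min=0 offsetmin=0 label='(Billions)';
run;
```

Because the band starts at zero, its height at any year can be read directly off the axis, with no need to judge the gap between two sloping curves.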

Full SAS 9.4M2 code: Premiums

Finally, next week is SAS Global Forum 2015 in Dallas.  It is a great year for data visualization with many user presentations on graphics using SG Procedures and GTL.  Visual Analytics is also on display.  We will be there to meet with you, answer your questions and to hear your pains.  See you at SGF in Dallas.


Micro Maps

MicroMaps are a powerful way to display data where the display includes small, lightweight maps to provide geographical information regarding the data.  This geographical information gives clues to the relationship between the data that could lead to more insight.

The SAS SG Procedures and GTL do not currently have built-in features to create a micro map type display; however, you can still create one using the current feature set with some effort.  Let us examine how this can be done.

First of all, how do you create a map using SG or GTL?  While you can use SGPLOT to create a map display, I will use GTL since we will progress towards making a micromap.  With SAS 9.40M1 the PolygonPlot statement was introduced in GTL.  A similar Polygon statement is available in SGPlot.  The Polygon Plot statement is a versatile tool to create custom displays using GTL or SG.  If you can think of a display type you want, you can do it using polygon plot.

Clearly, the purpose of this statement is to plot general polygonal shapes in your graph.  Well, a map is a special form of polygonal data, so we can use the MAPS.States data set directly to create a map using the PolygonPlot statement.

Retaining the aspect ratio of the data space is important for plotting a map.  So, the best way to create a map is to use the GTL Layout OverlayEquated.  GTL code for the map is shown below.  Click on map for higher resolution image.  Some data processing is required for states that have multiple polygons and to project the map data.  See link below for the full code.

proc template;
  define statgraph Map;
    dynamic _skin _color;
    begingraph / designwidth=6in designheight=6in subpixel=on;
      entrytitle 'USA Map by Region';
      /* Equated layout keeps the aspect ratio; wall and both axes are hidden */
      layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                              yaxisopts=(display=none);
        /* NAME= lets the legend below refer to this plot */
        polygonPlot x=x y=y id=pid / name='map' group=region display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5) label=statecode;
        discretelegend 'map' / location=inside across=1 halign=right 
                    valign=bottom valueattrs=(size=7) border=false;
      endlayout;
      entryfootnote halign=left 'Using Polygon Plot in a Layout OverlayEquated';
    endgraph;
  end;
run;

proc sgrender data=usapb template=Map;
  dynamic _skin="sheen" _color='Black';
run;

Notable items in the code above are:

  • Using LAYOUT OVERLAYEQUATED container.  This ensures the aspect of the data is retained regardless of the dimensions of the graph container.
  • Wall, X and Y axes are suppressed.
  • Using PolygonPlot to draw the polygons of the map.  We have used GROUP=Region, so polygons in each region are colored the same.  We could use GROUP=State to color each state differently.
  • The polygon plot can draw the label in various locations.  Here they are drawn in the center of the bounding box.  It is possible we could add an option to draw the label at the weighted center of the polygon.
  • Usage of a skin gives the embossed effect for each state.
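The data processing mentioned earlier (projecting the map and giving each polygon its own id) might look roughly like the sketch below.  It is not the author's program: it assumes SAS/GRAPH is available for PROC GPROJECT and MAPS.STATES, restricts the map to the contiguous states as a simplification, and builds the PID and STATECODE columns that the template expects.  The REGION column used by GROUP= must also be assigned, and that step is not shown because the region definitions are in the full program.

```sas
/* Sketch only: project the unprojected MAPS.STATES coordinates.
   FIPS codes 2, 15 and 72 (Alaska, Hawaii, Puerto Rico) are dropped
   here as an assumption to keep the map compact. */
proc gproject data=maps.states(where=(state not in (2, 15, 72)))
              out=usap;
  id state;
run;

data usapb;
  set usap;
  length pid $12 statecode $2;
  /* One polygon id per state segment, so multi-polygon states draw correctly */
  pid = catx('_', state, segment);
  /* Two-letter postal code used as the polygon label */
  statecode = fipstate(state);
run;
```

Each distinct PID value becomes one polygon, which is why a state made of several pieces needs the segment number as part of its id.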

Now, let us take the next step to draw a column of micro maps as shown on the right.  Click on the graph for a higher resolution image.  What we have done here is create three rows of the same map, where each map highlights the states in one of three regions - NorthWest, SouthWest and South.  The GTL template now has more code to address the three cells in a LAYOUT LATTICE.

While the code below looks long, you will see that it is highly structured, and once you understand how one cell (Row) is defined, the other rows are similar.

We use a LAYOUT LATTICE with one column.  The 3 rows result from the fact that we have added three cells, each defined by a LAYOUT OVERLAYEQUATED - ENDLAYOUT block.

We have colored a set of states by the region they belong in.  Other states have the missing color.   We have displayed the state names for only the states in the region.

To color the regions, we have defined a Discrete Attributes Map, which defines the color by the name of the region.  To display state names only for the region, we have used an expression that returns a state label only if the region is the one specified.  This is a powerful way to subset the data in the template.


proc template;
  define statgraph MicroMaps;
    dynamic _skin _color;
    begingraph / designwidth=4in designheight=6in subpixel=on;
      entrytitle 'Revenues by Region and Product';
      /* Attribute map: fill color keyed by region name */
      discreteattrmap name="states" / ignorecase=true;
        value "NorthWest"  / fillattrs=graphdata1;
        value "SouthWest"  / fillattrs=graphdata2;
        value "South"      / fillattrs=graphdata3;
      enddiscreteattrmap;
      discreteattrvar attrvar=southfill var=south attrmap="states";
      discreteattrvar attrvar=northwestfill var=northwest attrmap="states";
      discreteattrvar attrvar=southwestfill var=southwest attrmap="states";
      layout lattice / columns=1;
        /* Row 1: NorthWest.  The region-specific LABEL= expression is
           omitted in this listing; see the full program. */
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=northwestfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'NorthWest' / textattrs=(size=7);
        endlayout;
        /* Row 2: SouthWest */
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=southwestfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'SouthWest' / textattrs=(size=7);
        endlayout;
        /* Row 3: South */
        layout overlayEquated / walldisplay=none xaxisopts=(display=none)
                                yaxisopts=(display=none);
          polygonPlot x=x y=y id=pid / group=southfill display=(fill outline) 
                    outlineattrs=(color=black) dataskin=_skin 
                    labelattrs=(color=black size=5);
          entry halign=right 'South' / textattrs=(size=7);
        endlayout;
      endlayout;
      entryfootnote halign=left 'Using Polygon in a Layout Lattice';
    endgraph;
  end;
run;
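The label expression described above is not visible in the listing, so here is a hedged sketch of what such an expression could look like for a single cell.  It assumes the data has the REGION and STATECODE columns used by the first template, and it assumes the IFC function is accepted inside a GTL EVAL expression; the template name SouthLabels is made up, and the author's actual expression may differ.

```sas
/* Sketch only: one map cell that labels just the states in the South. */
proc template;
  define statgraph SouthLabels;
    begingraph;
      layout overlayequated / walldisplay=none
             xaxisopts=(display=none) yaxisopts=(display=none);
        /* The expression returns the state code for the South and a
           blank otherwise, so other states are drawn without a label. */
        polygonplot x=x y=y id=pid / display=(fill outline)
                    outlineattrs=(color=black)
                    label=eval(ifc(region='South', statecode, ' '))
                    labelattrs=(color=black size=5);
      endlayout;
    endgraph;
  end;
run;
```

The same subsetting could be done in a DATA step by creating one label column per region, at the cost of a wider data set.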

Finally, we will combine some other relevant data with the display.  Here I have shown a Horizontal Bar of Revenues by Product in each region.  Clearly, this can be used to show different species of animals, local trees, or health data by region.

The graph can be any graph that can be placed in a Layout Overlay, such as scatter, series, histogram, etc.  The possibilities are endless.  Here I have added only one cell in addition to the map, but you can have any number of cells in a row.

Clearly, this requires us to create a data set that is a combination of the map and the data needed for the bar chart or any other plot.  Some creative coding is needed to get all the data in one data set such that the different items are still clearly accessible to the plot statements.

Just like we did with AXISTABLE, once we have a better understanding of the different ways you could use this type of a graph, we could develop a statement or features to make this process easier.  We would love to hear from you on how you might use such a graph.  Please feel free to chime in.

Full SAS 9.40M2 program:  MicroMaps



Conditional Highlighting - 2

Back in late 2012 I discussed a technique for Conditional Highlighting, where additional attributes can be displayed in a graph.

In the previous article the goal was to display a graph of Response by Year by Drug.  We used a cluster grouped bar chart to create the bar chart.  We also wanted to tag cases where the sample size was lower than a threshold, and we did that by adding a cross hatch pattern for such cases.  Click on the graph for a higher resolution image.

So, the idea is this: if the available features in a graph are already used up to show some data attributes, how can we add more features to the graph to display additional attributes?  These attributes are added based on other conditions, and hence the term "Conditional Highlighting".

With SAS 9.4M1, additional features are supported in the SG Procedures to do more such things.  Specifically, I am referring to the ability to create marker symbols from images and characters of a font.  I discussed this in  the article on Marker Symbols.  Let us use this feature to add other attributes to a graph based on some conditions.

The graph on the right displays the Sales by Person, and also displays the gender of each sales person using a bar chart.  We also want to display if the sales person is under, over or well over the projected performance.  We have done that by adding an icon near the top of the bar.  The icon has three versions, with sad or happy faces.

We have done this by layering a scatter plot on the bar chart.  We used the VBarParm to display the bar chart as we have summarized data, and VBarParm allows layering with other basic plots.  Here is the SAS9.4 code.

title 'Sales and Status by Sales Person';
proc sgplot data=sales;
  symbolimage name=bad  image="C:\Work\Images\Conditional\Sad_Tran.png";
  symbolimage name=good image="C:\Work\Images\Conditional\Happy_Tran.png";
  symbolimage name=great image="C:\Work\Images\Conditional\VeryHappy_Tran.png";
  styleattrs datasymbols=(great good bad) datacolors=(pink cx4f5faf);
  vbarparm category=name response=sales / group=gender dataskin=gloss 
           filltype=gradient groupdisplay=cluster;
  scatter x=name y=ys / group=status markerattrs=(size=30);
  yaxis offsetmin=0 offsetmax=0 grid;
  xaxis display=(nolabel) offsetmin=0.1 offsetmax=0.1;
run;

To do this, we computed the "Status" based on the above condition.  We defined three new symbols from the image files called "Sad_Tran.png", "Happy_Tran.png" and "VeryHappy_Tran.png" using the SymbolImage statement.  These are transparent images.  All image files are inherently rectangular in shape.  However, the picture occupies only a part of the image, like the happy face.  The pixels around that are black, or some other background color.  We have used image processing software to make these background pixels transparent, so when the image is drawn, these transparent pixels are not displayed.
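The status computation itself is not shown in the post, so the following is only a sketch of how it might be done.  The TARGET column, the 1.2 cutoff for "well over", the 0.9 factor for the icon position and the output name SALES_STATUS are all hypothetical; only the SALES, STATUS and YS column names come from the SGPLOT code above.

```sas
/* Sketch only: TARGET and the thresholds are assumptions for illustration. */
data sales_status;
  set sales;
  length status $5;
  if sales < target then status = 'bad';              /* under projection     */
  else if sales < 1.2 * target then status = 'good';  /* over projection      */
  else status = 'great';                              /* well over projection */
  ys = 0.9 * sales;   /* place the icon near the top of the bar */
run;
```

One thing to keep in mind: the DataSymbols list is assigned to group values in the order the values appear in the data, not by name, so the data may need sorting (or a discrete attribute map) for each status to get the intended face.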

We have used the DataSymbols option of the StyleAttrs statement to define a list of group markers that includes only these three new symbols.  We have also set the two colors we want for the "Male" and "Female" group values using the DataColors option.  The StyleAttrs statement allows you to define your own group attributes within the SG Procedure without having to define a new ODS Style.

We have also used the FillType=Gradient option to fill the bars with a gradient effect.  I understand that usage of such effects in a graph, often referred to as "Chart Junk", a term coined by Edward Tufte, is not preferred in many domains.  However, in some domains this can be useful.

Now, let us take another step and add another conditional attribute to the graph as shown on the right to show you the possibilities of this approach.  Here, we have added a "Blue Ribbon" for the salesperson with the highest sales.  Note, the blue ribbon may be awarded based on other conditions.

I have done that by defining another symbol using the image "Blue_Ribbon_Tran.png".  Note in this case, I have used the Rotate=20 option to add some pizzazz to the visual.  Actually, markers with a few different rotation angles can be used as a classifier too.

Note that in this graph the grid lines are no longer visible through the bottom part of the bars.  FillType=Gradient uses a transparency gradient to fill the bars, which normally lets the grid lines show through the bottom part of the bars, where the bars are more transparent.  To prevent that, I have used another VBarParm with plain white color behind the one with the gradient.  This suppresses the bleeding of the grid lines.  The full code is attached below.

Combining the ability to define your own symbols with the ability to layer plot statements together provides you with powerful ways to create all types of graphs.  Conditional highlights can be colored dots or swatches added to the bars or any other element in the graph to convey more information to the reader.
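A minimal sketch of the white backing bar, assuming the same SALES data and column names as in the earlier code.  The option choices on the first VBARPARM are my assumption of how the backing layer could be set up, not the attached program.

```sas
/* Sketch only: opaque white bars are drawn first, then the gradient bars. */
proc sgplot data=sales noautolegend;
  /* Backing layer: same categories and grouping so the bars line up,
     but filled with solid white and no outline */
  vbarparm category=name response=sales / group=gender groupdisplay=cluster
           fillattrs=(color=white) nooutline;
  /* Visible layer: gradient fill, now on a white background */
  vbarparm category=name response=sales / group=gender groupdisplay=cluster
           filltype=gradient dataskin=gloss;
  yaxis offsetmin=0 offsetmax=0 grid;
  xaxis display=(nolabel) offsetmin=0.1 offsetmax=0.1;
run;
```

Statements are drawn in the order they are listed, so the white bars sit between the grid lines and the semi-transparent gradient bars.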

Full SAS 9.4 Code:  Conditional_Highlighting



Sankey Diagrams

Sankey Diagrams have found increasing favor for visualization of data.  This visualization tool has been around for a long time, traditionally used to visualize the flow of energy or materials.

Now to be sure, GTL does have a statement design for a Sankey Diagram which was implemented only in Flex for use in interactive visualization cases.  The GTL Sankey Diagram statement was not implemented for use in MVA visualization cases due to lack of demand.

However, recently a SAS user asked about creating such graphs using SAS MVA graphics tools.  With SAS 9.4 there are sufficient tools in place to create such a diagram using custom coding without use of annotation.  In SAS 9.4M3, more tools are available that make this task easier.  I have outlined the process below.

The diagram created using the SAS 9.4 SGPLOT procedure is shown on the right.  Click on the diagram to see a bigger view.  Since no SANKEY statement is available in SGPLOT, such a diagram requires custom coding.  However, no annotation is required.  The program uses the following statements:

  • Series with SmoothConnect for the curves.
  • Highlow plots for nodes and link values.
  • Scatter plot with MarkerChar for node labels.
  • Series plot to draw the brackets.
  • Scatter plot with MarkerChar for labels 1,2,3.

A custom data set has to be created to draw the different parts of the diagram as shown in the attached program link at the bottom.
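To give a feel for what such a data set involves, here is a small sketch of a single link drawn with a series plot and SmoothConnect.  The coordinates are made up purely for illustration and are not taken from the attached program; a real diagram needs one such set of points per link, plus the node and label data.

```sas
/* Sketch only: one link from a node at y=3 to a node at y=1.
   The repeated y values at each end flatten the curve where it
   meets the nodes. */
data link;
  input id x y;
  datalines;
1 0.0 3
1 0.3 3
1 0.7 1
1 1.0 1
;
run;

proc sgplot data=link noautolegend;
  /* A thick, semi-transparent smoothed line stands in for the flow */
  series x=x y=y / group=id smoothconnect
         lineattrs=(thickness=12 color=steelblue) transparency=0.5;
  xaxis display=none;
  yaxis display=none min=0 max=4;
run;
```

With SmoothConnect the curve passes through every vertex, which is why the extra points near each end are needed to control its shape.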

The diagram shown on the right uses the new SPLINE statement to be released soon with SAS 9.4M3.  This makes the process a little easier, as the spline is a smooth curve that does not need to pass through each of the vertex points.  The SAS 9.4M3 SGPLOT also supports varying line thickness for series and spline statements.

Clearly the data is hand-built for this particular diagram.  I believe this process can be converted to a macro to create a Sankey Diagram from a node-link data set with the appropriate information.  Things will get more interesting when the diagram includes link splits or merges at various nodes.

SAS 9.4 SGPLOT Code:   Sankey_940


A 3D Scatter Plot Animation Macro

In the previous article, I described the process to create a 3D Scatter Plot using a 3D Orthographic View matrix and the SGPLOT procedure.  I posted a macro that can be used to create a 3D scatter plot from any SAS data set, using 3 numeric columns, one each for X, Y and Z (Response) axes.

Visualization of 3D data can be improved by providing interaction or animation.  Here I have described a way to create an animation using the idea described in the previous article.


The setup for the animation is as follows:

options papersize=('5 in', '4 in') printerpath=gif animation=start 
        animduration=0.05 animloop=yes noanimoverlay;
ods printer dpi=100 file='C:\Class3DScatterAnim.gif';
ods listing image_dpi=200;
ods graphics / reset attrpriority=color width=5in height=4in imagefmt=GIF;
%run_anim_macro(data=sashelp.class, start=-30, end=-60, incr=-1);
%run_anim_macro(data=sashelp.class, start=-60, end=-30, incr=1);
options printerpath=gif animation=stop;
ods printer close;

I have modified the %Ortho3D_Macro provided in the previous article, and added a loop to render multiple graphs with changing value for the Z-Rotation from -30 to -60 and back by 1 degree.  Here I have created a GIF animation.  An SVG animation can also be created using the code at the bottom of the attached file.
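The looping part can be sketched as below.  This is not the attached macro: it is a hypothetical wrapper, named %run_anim_sketch here, that assumes the %Ortho3D_Macro from the previous article is already compiled along with its wall and attribute map data sets, and it simply calls that macro once per rotation angle.

```sas
/* Sketch only: render one frame per Z-rotation value.
   %DO ... %BY accepts a negative increment, so the same loop
   handles both the forward and the return sweep. */
%macro run_anim_sketch(start=, end=, incr=);
  %local rot;
  %do rot = &start %to &end %by &incr;
    %Ortho3D_Macro(Data=sashelp.class, WallData=wall_Axes,
                   X=height, Y=Age, Z=Weight,
                   Lblx=Height, Lbly=Age, Lblz=Weight,
                   Group=Sex, Attrmap=attrmap, Tilt=65, Rotate=&rot,
                   Title=Plot of Weight by Height and Age);
  %end;
%mend run_anim_sketch;

/* Sweep from -30 to -60 degrees and back, one degree per frame */
%run_anim_sketch(start=-30, end=-60, incr=-1);
%run_anim_sketch(start=-60, end=-30, incr=1);
```

Each call produces one graph, and with the animation options shown above in effect, each graph becomes one frame of the GIF.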

3D Animation Macro:  Ortho_3D_Animation

Matrix Multiplication Function:  Matrix_Functions


A 3D Scatter Plot Macro

The SG Procedures do not support creating a 3D scatter plot.  GTL has some support for 3D graphs, including a 3D Bi-variate Histogram and a 3D Surface, but still no 3D point cloud.  The lack of such a feature is not due to any difficulty in doing this, as GTL already supports the LAYOUT OVERLAY3D container, but to the fact that no one was urgently requesting such a feature.

However, often we do have a need for  visualization of 3D data, and it would be nice to be able to do this.  So, here I have presented a macro that uses the features of the SGPLOT procedure to display 3D data.  This uses SAS 9.4 features to render the walls, axis labels and the  filled "spherical-looking" markers.

%Ortho3D_Macro (Data=sashelp.class, WallData=wall_Axes, 
                X=height, Y=Age, Z=Weight,
                Lblx=Height, Lbly=Age, Lblz=Weight, 
                Group=Sex, Attrmap=attrmap, Tilt=65, Rotate=-55, 
                Title=Plot of Weight by Height and Age);

Note the following items in the macro invocation above:

  • The data set to be viewed is provided.
  • A data set defining the 3D walls is provided.  This is shown in detail in the program code.
  • The three columns to be mapped to each axis are provided.
  • X and Y form the two independent variables, and the response variable is displayed on the vertical Z axis.
  • Labels for each axis can be specified.
  • An Attribute map is used to set the visual attributes of the walls and bounding box of the data.  This is shown in the code.
  • A group variable can be used to color the markers.
  • Viewing parameters Tilt and Rotate can be set.  Tilt values from 0 to 90 and Rotate values from -15 to -75 are ideal.
  • Title can be set.

The macro maps the 3D data to a unit cube, and projects the data into the view space using an ORTHOGRAPHIC projection.  This avoids distortion of the data that can happen when using a perspective projection.


Here are the features of the graph:

  • Spherical looking markers are drawn at each (x, y, z) location.
  • Axis labels are drawn, but not the tick values.  That could be added, but can get messy.   The idea is to really see the shape of the data.
  • Relative positions of the markers can be a challenge to view in a static 3D view.  So, the X-Y, X-Z and Y-Z projections for each point are also displayed to help locate the points in 3D space.
  • Needles are dropped to the floor.
  • View parameters are displayed.

The same macro can be invoked for other data such as all the Sedans in the Cars data as shown below:

%Ortho3D_Macro (Data=sedans, WallData=wall_Axes, 
                X=horsepower, Y=Weight, Z=mpg_city,
                Lblx=Horsepower, Lbly=Weight, Lblz=Mileage, 
                Group=origin, Attrmap=attrmap, Tilt=45, Rotate=-60, 
                Title=Plot of Mileage by Horsepower and Weight);


For those interested in the process for projection of 3D data on to a 2D plane, the View Matrix for the Orthographic Projection is shown below.
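The original figure is not reproduced here.  As a stand-in, the block below sketches one common way to write such a view matrix: rotate the data about the Z axis by an angle r, tilt about the X axis by an angle t, and then drop the depth coordinate.  The symbols, the order of rotations and the sign conventions are assumptions for illustration and may not match the macro exactly.

```latex
V =
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos t & \sin t \\
0 & -\sin t & \cos t
\end{pmatrix}
\begin{pmatrix}
\cos r & -\sin r & 0 \\
\sin r & \cos r & 0 \\
0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
\cos r & -\sin r & 0 \\
\cos t \,\sin r & \cos t \,\cos r & \sin t \\
-\sin t \,\sin r & -\sin t \,\cos r & \cos t
\end{pmatrix}

\begin{pmatrix} x_s \\ y_s \end{pmatrix}
=
\begin{pmatrix}
\cos r & -\sin r & 0 \\
\cos t \,\sin r & \cos t \,\cos r & \sin t
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
```

In this convention a tilt of 0 gives a top view (the screen coordinates depend only on x and y), and a tilt of 90 degrees gives a side view in which the vertical screen coordinate equals z.  Because the third row is simply discarded rather than used for a perspective divide, parallel lines in the data stay parallel on screen.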


We can also use the standard way to create an animated GIF or SVG file that helps in the visualization of the data.  I will include that in the next post.

Note the macro is provided for illustration purposes on what is possible.  I have not rigorously tested all settings and use cases.  If visualization of 3D data is something you feel you need, please chime in with your suggestions for more 3D plots right here, or to SAS Technical Support.

Full SAS 9.4 Code:  Ortho_3D_Macro_94

Matrix Functions:  Matrix_Functions


Margin Plots

Last week a user wanted to view the distribution of data using a Box Plot.  The issue was the presence of a lot of "bad" data.  I got to thinking of ways such data can be visualized.  I also discussed the matter with our resident expert Rick Wicklin who pointed me to a couple of resources including some information on visualization of missing data on the web.

First, my usual disclaimer:  I am only a "Graph Guy", and not a Statistician.  So, my thoughts below are mainly graphical suggestions.  Please feel free to point out pros and cons of the techniques discussed below.

On the issue of visualizing data using Box plots, I simulated some data using sashelp.heart by setting some data to missing, and setting those values to zero in another column.  Then, I used a box plot to view the data, and overlaid a scatter plot to view the values that were set to missing.  Since I put those observations in another column with a value of zero, they all show up at the bottom of the graph.  You can select the appropriate value.  I set the Y axis so zero is not on the axis.
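The simulation step could look something like the sketch below.  The HEART_BOX, CHOLESTEROL, DEATHCAUSE and CHOL names match the SGPLOT code in this post, but the 10% rate, the random selection and the seed are assumptions; the actual program may choose the "bad" observations differently.

```sas
/* Sketch only: blank out roughly 10% of the cholesterol values and
   flag them with CHOL=0 so they can be plotted as markers. */
data heart_Box;
  set sashelp.heart(keep=deathcause cholesterol);
  where deathcause is not missing;
  call streaminit(1234);          /* fixed seed for repeatable results */
  chol = .;                       /* marker column, missing by default */
  if rand('uniform') < 0.1 then do;
    cholesterol = .;              /* simulate a "bad" value            */
    chol = 0;                     /* plot position for the marker      */
  end;
run;
```

Observations that keep their cholesterol value have CHOL missing, so the scatter plot draws markers only for the simulated missing values.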

SGPLOT with  SAS 9.40M1  supports overlays of basic plots with a VBOX.   Note how we can see that some of the "Cancer" and "Coronary Heart Disease" data is "bad", in this case, "missing".

title 'Cholesterol by Death Cause';
proc sgplot data=heart_Box noautolegend;
  vbox cholesterol / category=deathcause extreme;
  scatter x=deathcause y=chol / markerattrs=graphdata1(symbol=circlefilled) 
          transparency=0.5  name='s' jitter jitterwidth=0.5 legendlabel='Missing Data';
  keylegend 's' / location=inside position=topleft;
  xaxis display=(noticks nolabel);
  yaxis values=(100 to 500 by 100) min=0 valueshint;
run;

The user also made a comment on how the data was so skewed that a box plot was not possible.  That got me looking for another way to view the same data.  This time, I replaced some values for Cholesterol and Systolic with missing values, copying them into other variables.

Now, I plotted Systolic by Cholesterol, which displayed the cloud of non-missing values.  Then I added a box plot for all the values and a box for just the values where cholesterol was missing.  The graph is shown on the right.  Click on graph for a higher resolution image.

The blue box is of all the observations where cholesterol is non-missing.  Red box is for observations where cholesterol is missing, but the systolic has a valid value.  Once again, this is possible with SAS 9.40M1 SGPLOT.  For the VBOX data, I have set the "category" values to 10 and 20.  Since the axes are "Linear" by the Scatter plot, this combination is possible.

proc sgplot data=heart_2D noautolegend;
  scatter x=chol y=syst / name='s' legendlabel='Non Missing Data'
          markerattrs=graphdata1(symbol=circlefilled) transparency=0.7;
  vbox systBox / category=cholA extreme group=systgrp fill nooutliers name='b' boxwidth=1;
  keylegend 's' 'b';
  xaxis min=0 values=(100 to 500 by 100) valueshint grid label='Cholesterol';
  yaxis min=0 values=( 50 to 300 by 50) valueshint grid label='Systolic';
run;

Now, this gives us some idea where all the data is, but still this may not work well if the distribution of the data is bi-modal.  We can create the same graph using a scatter plot of the data instead of a box.

Here, I have displayed a scatter of the non-missing data  along with another scatter plot with two groups - All data and Missing Systolic data.  Maybe this view can provide a better visualization of the missing data.  We can certainly add insets to indicate the percentage of the missing data.

Another way may be to use a HISTOGRAM instead of the VBOX or SCATTER to view the distribution of the missing data.  I will take that up in a follow-up post.

Even with SAS 9.40M1, SGPLOT will allow us only to view one distribution at a time.  If we want to plot both the distribution of the Systolic for missing Cholesterol and vice-versa, we will need to use GTL.  Also, if you have a SAS release prior to SAS 9.40M1, you can use GTL to create the VBOX + SCATTER overlay graphs shown above.

The graph with box plots of all and missing data is shown on the right.  This graph is created using GTL.  It uses only one LAYOUT OVERLAY, since the categorical values for the box plots are also numeric.  However, we can use a LAYOUT LATTICE to create other combinations.

 Full SAS9.40M1 Code:  Margin_Plot


Cancer Deaths Averted

Significant progress in reduction of Cancer mortality is shown in a graph that I noticed recently on the Cancer Network web site.  This graph showed the actual and projected cancer mortality by year for males.  The graph is shown on the right.

The graph plots the projected and actual numbers by year, and highlights the difference using the hatched pattern.  The total number of Cancer Deaths Averted is shown.

The graph on the right includes a Y axis data range all the way down to zero, where it is really not necessary.  But, we can use this space that is otherwise wasted to display more information.

Creating the graph is easy, using the following SGPLOT code.  Some options are trimmed to fit the space.  See full code in link at bottom for the details.

title 'Cancer Deaths';
proc sgplot data=mortality nocycleattrs nowall noborder;
  styleattrs datalinepatterns=(solid);
  highlow x=year low=actual high=projected / type=line;
  series x=year y=projected / name='b' legendlabel='Projected';
  series x=year y=actual / name='a' legendlabel='Actual';
  keylegend 'a' 'b' / location=inside position=topleft linelength=20;
  xaxis values=(1975 to 2010 by 5) grid;
  yaxis values=(0 to 450000 by 50000) grid;
run;

The graph shown on the right is created by the code shown above.  The data is "eye-balled" from the original graph and includes the columns Year, Actual, Projected and Diff.  The total number of deaths averted is saved in a macro variable, and also inserted into the label to be displayed.

Two SERIES plots are used to plot the actual and projected curves.  A HIGHLOW plot is used to draw the vertical hatch marks showing the reduction in the cancer deaths since 1990.  A legend is added to indicate the actual and projected curves.


For the graph shown below on the right, a Band plot is added to display the reduction in cancer deaths by year explicitly.   Also, we have used a TEXT plot statement to display the inset indicating the number of deaths averted.



There are some benefits to this addition.  The empty area at the bottom of the graph is utilized.  The actual deaths averted are drawn from a common baseline, thus removing the distortions in the hatched area due to the varying baseline.  An explicit inset shows the estimated number of deaths averted.

The TEXT plot is a SAS 9.4M2 feature, but one could use an INSET or a SCATTER with MarkerChar to do something similar.
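A sketch of the INSET alternative is shown below, assuming the MORTALITY data set has the YEAR, ACTUAL, PROJECTED and DIFF columns described above.  The macro variable name AVERTED, the legend label and the inset wording are made up here; the linked program may differ.

```sas
/* Sketch only: total the yearly differences into a macro variable */
proc sql noprint;
  select sum(diff) format=comma12. into :averted trimmed
  from mortality;
quit;

proc sgplot data=mortality;
  /* Yearly reduction drawn from a common zero baseline */
  band x=year lower=0 upper=diff / name='d' legendlabel='Deaths averted'
       fillattrs=(transparency=0.5);
  series x=year y=projected / name='b' legendlabel='Projected';
  series x=year y=actual    / name='a' legendlabel='Actual';
  /* The total is placed in the graph with an INSET instead of a TEXT plot */
  inset "Deaths averted: &averted" / position=bottomright;
  keylegend 'a' 'b' 'd' / location=inside position=topleft;
  yaxis min=0 grid;
run;
```

An INSET is positioned relative to the plot area rather than at data coordinates, so it is simpler to use than a TEXT plot but offers less control over exact placement.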

Full SAS code:  CancerDeaths


Displaying Unicode Symbols in Legend

Including special Unicode symbols in a graph is getting more popular.  In general, SG procedures support Unicode strings in places where these strings are coded into the syntax, such as TITLE and FOOTNOTE.  These support Unicode characters and also the special {SUP} and {SUB} commands.  This is because these statements are rendered by the graph using Java string API.

Curve Labels and Axis Labels that are assigned in the procedure syntax can also support Unicode, but not the {SUP} and {SUB} commands.  This is because these items are passed to the graph rendering engine which cannot handle the {SUP} and {SUB} commands.  However, most of the popular numeric sub and super scripts are available in the Unicode fonts, so much of the need is covered.

Recently, a user chimed in on the Communities page, wanting to include Unicode values in the legend.  The group variable values include terms like "Less than or equals", and the journal preferred the Unicode ≤ symbol over the "<=" sequence of characters.

In all releases of SAS to date, the SGPLOT procedure cannot carry Unicode from data or formats into the graph legends or axes.  However, there is a way to do this by restructuring the grouped data into a multi-column format.

A few observations of the original data are shown on the right.  I have added a column based on the level of the Systolic Blood Pressure called "Status".

We could plot a graph of Weight by Height by Status and get a scatter plot of the data, with the "Status" values displayed in the legend as "GE160" and so on.  However, that is not what the user wants; the user would rather see the numeric ranges with the "≤" symbols.

The transformed data set is shown on the right.  Here, I have created four new columns, each containing the appropriate value for weight based on the Status.  This results in some missing values in the new columns.
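
A minimal sketch of one way such a transformation might be written is shown here.  It assumes the source data is SASHELP.HEART and that the cut-offs are 120, 140 and 160, as suggested by the legend labels; the column names GE160, GE140, GE120 and LT120 and the data set name HEART_COLS match the plot code in this post.  The original post derives these columns from the "Status" column, so treat this as an illustration rather than the author's exact code.

```sas
/* Sketch only: SASHELP.HEART as the input and the 120/140/160 cut-offs are assumptions */
data heart_cols;
  set sashelp.heart(keep=height weight systolic);
  /* Exactly one of the four columns receives the weight; the others stay missing */
  if      systolic >= 160       then ge160 = weight;
  else if systolic >= 140       then ge140 = weight;
  else if systolic >= 120       then ge120 = weight;
  else if not missing(systolic) then lt120 = weight;
run;
```

Because each observation fills only one of the four columns, each scatter plot draws only the points for its own range.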

Now, instead of using one scatter plot with the GROUP option, we will plot these four columns using four scatter plots as shown below.  None of the scatter plots uses a group variable, and I have used the LEGENDLABEL option to provide the label for each scatter plot.  These labels include Unicode characters.

ods escapechar '~';
title 'Blood Pressure by Weight by Height';
proc sgplot data=heart_cols;
  /* One scatter plot per range; each LEGENDLABEL carries its own Unicode symbol */
  scatter x=height y=ge160 / legendlabel="160 ~{Unicode '2264'x} Systolic ";
  scatter x=height y=ge140 / legendlabel="140 ~{Unicode '2264'x} Systolic < 160";
  scatter x=height y=ge120 / legendlabel="120 ~{Unicode '2264'x} Systolic < 140";
  scatter x=height y=lt120 / legendlabel="Systolic < 120";
  keylegend / title='' location=inside position=topleft across=1;
run;


Click on the graph for a higher resolution view.  Note the legend on the top left contains the ranges for the Systolic blood pressure, using the appropriate Unicode symbols.  Each scatter plot in the graph is represented in the legend by the LEGENDLABEL.  The legend label can be assigned Unicode values as shown above.

Now, the legend in the graph can be improved if we vertically align all the "Systolic" labels.  To do this, one might want to add some blanks to the front of the legend label for the fourth scatter plot.  However, this will not work, because all leading blanks are automatically stripped.  The system can be tricked into keeping the leading blanks by starting the label string with a non-breaking space character 'A0'x, followed by the required number of blanks.  This is shown in the code and graph below.

ods escapechar '~';
title 'Blood Pressure by Weight by Height';
proc sgplot data=heart_cols;
  scatter x=height y=ge160 / legendlabel="160 ~{Unicode '2264'x} Systolic ";
  scatter x=height y=ge140 / legendlabel="140 ~{Unicode '2264'x} Systolic < 160";
  scatter x=height y=ge120 / legendlabel="120 ~{Unicode '2264'x} Systolic < 140";
  /* A leading non-breaking space keeps the blanks that follow from being stripped */
  scatter x=height y=lt120 / legendlabel="~{Unicode '00a0'x}         Systolic < 120";
  keylegend / title='' location=inside position=topleft across=1;
run;


In the legend for the graph above, all the "Systolic" terms are correctly aligned, making the legend a bit easier to read.  Note, this process needs custom handling.  Full code is provided in the link below.

The good news is that support for Unicode in the graphs will be included with the SAS 9.4M3 release through user-defined formats.  With this approach, you will be able to format any data value into a string that can include Unicode symbols.  Thus group values or axis tick values can be customized programmatically.

Full SAS 9.3 Code:  LegendSymbols_930








Marker Symbols

There has been much discussion on the SAS Communities page on the usage of different symbols in a graph.  The solution can vary based on the SAS release.  New features have been added to the SG procedures and GTL in the SAS 9.4 releases that make this very easy.  With SAS 9.4M1, almost any combination is possible.

The user has a relatively simple scatter plot with two class levels.  The graph on the right is easily created using a scatter plot with a group role.  The code is shown below.

Note, starting with SAS 9.3, ODS HTML is the default open destination, using the HTMLBlue style.  This is a "Color" priority style, where each group changes only in color until all the colors in the style are used.  So, you do not see varying marker symbols in the graph on the right.

title 'Mileage by Horsepower by Make';
proc sgplot data=cars;
  scatter x=horsepower y=mpg_city / group=make;
  keylegend / location=inside position=topright;
  yaxis grid integer;
  xaxis grid;
run;
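
The snippet above reads a data set named CARS with two class levels.  A minimal sketch of how such a subset might be built follows.  It assumes the data comes from SASHELP.CARS and that the two makes are BMW and Porsche, which is inferred from the image symbol names used later in this post and is not stated by the original code.

```sas
/* Assumption: CARS is a two-make subset of SASHELP.CARS (BMW and Porsche) */
data cars;
  set sashelp.cars(where=(make in ('BMW', 'Porsche')));
run;
```

Any other pair of makes would work the same way, since the plot only needs HORSEPOWER, MPG_CITY and a two-level MAKE column.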

You can run the same graph with a style like LISTING, or set ATTRPRIORITY=none in the ODS Graphics statement to get the graph on the right.  Now, each group gets a different color and a different marker symbol.  These come from the style GraphData1-12 elements.

 ods graphics / reset attrpriority=none;

The user wanted to use the symbols "X" and "Y" instead of the "circle" and "plus" symbols that are the default first two symbols in the GraphData1-12 elements list.  This in itself is very easy, since the "X" and "Y" symbols are included in the list of built-in symbols supported by these procedures.  All you need to do is change the default symbols in the GraphData1-12 elements.

With SAS 9.4, it is very easy to change the group attributes by using the STYLEATTRS statement in SGPLOT.  This feature provides a simple in-line way to modify the list of colors, contrast colors, symbols and line patterns used for the group values, as shown in the code snippet below.

The values provided REPLACE the default group list as if they came from the style.  So, now the group cycling uses only the two symbols "X" and "Y" provided in the list.

proc sgplot data=cars;
  styleattrs datasymbols=(X Y);
  scatter x=horsepower y=mpg_city / group=make;
run;

But what if you want to use some special symbols that are not provided in the built-in list of symbols?  You can do that with SAS 9.4M1 using the new statements SYMBOLCHAR and SYMBOLIMAGE.  The SYMBOLCHAR statement lets you use any character from any font as a symbol.  Using a Unicode font gives you access to thousands of symbols.

Say you want to use the Greek symbols for "Alpha" and "Beta" as the marker symbols.  You can define a new symbol name using the SYMBOLCHAR statement and then include it in the list of group symbols using the STYLEATTRS statement.  The code snippet is shown below, and the resulting graph is shown on the right.  Click on the graph for a higher resolution view.

proc sgplot data=cars;
  /* '03b1'x and '03b2'x are the Unicode code points for Greek alpha and beta */
  symbolchar name=Alpha char='03b1'x / scale=1.8;
  symbolchar name=Beta char='03b2'x  / scale=1.8;
  styleattrs datasymbols=(Alpha Beta);
  scatter x=horsepower y=mpg_city / group=make markerattrs=(size=9);
run;

Note the use of the SCALE option above.  Most font glyphs do not fill their entire character cell, so these symbols may appear small.  The SCALE option allows us to scale them up.

And now, the "pièce de résistance".  In many cases, such as the case here, we can use symbols that not only distinguish between the group values, but by themselves provide information on what they represent.

The SYMBOLIMAGE statement allows you to define new symbols from images.  These can then be used for group values using the STYLEATTRS statement, just as shown above.  Here is a graph using image symbols.  Note, I have removed the legend just to make this point.  The markers do not require any legend to explain what they stand for (for most users).

It helps to make the images have a transparent background, so the shape of the icon is visible, and does not block other markers.  The images must be available on the local file system.

proc sgplot data=cars noautolegend;
  /* Image files must exist on the local file system; transparent backgrounds work best */
  symbolimage name=BMW image="C:\BMWTrans.png";
  symbolimage name=Porsche image="C:\PorscheTrans.png";
  styleattrs datasymbols=(BMW Porsche);
  scatter x=horsepower y=mpg_city / group=make;
run;

Full SAS 9.4 Code:  Symbol
