Histogram with Gradient Color

Recently a user posted a query on the SAS Communities page asking how to create a histogram whose bins are colored by the analysis variable using a three-color ramp.  Essentially, he wanted the bins to be colored from "Low" to "High" along the horizontal axis.

The benefit of coloring the bins of the histogram in this manner was not immediately clear to me, but I assume the user had a good reason.  So I gave some thought to how we might create such a graph using the SAS 9.3 SGPLOT procedure.  The result the user wanted is shown on the right.

Clearly, the SGPLOT Histogram statement does not support gradient coloring at any release, including SAS 9.40M3.  So, one way to do this is to use the VBAR statement with the GROUP option, where the group colors are obtained from the Discrete Attributes Map.  There may be other ways too, and I would be happy to hear your ideas.

Step 1 is to bin the data, in this case the Cholesterol column in the sashelp.heart data set.  I have selected a bin interval of 10 and computed another variable, ChBin, whose values are between 90 and 360.  The minimum and maximum bin values, along with the bin interval, are saved in macro variables.

/*--Bin the data by cholesterol--*/
data ChBin;
  label ChBin='Cholesterol';
  retain BinInt 10 maxbin 0 minbin 1e6;
  set sashelp.heart(where=(cholesterol > 80 and cholesterol < 360)
                                  keep=Cholesterol Systolic) end=last;
  if cholesterol ne . then ChBin=BinInt*floor(cholesterol/BinInt);
  minbin=min(minbin, chbin);
  maxbin=max(maxbin, chbin);
  if last then do;
    call symputx("MinBin", minbin);
    call symputx("MaxBin", maxbin);
    call symputx("BinInt", BinInt);
  end;
run;

Now, we can plot a histogram of this data by using the VBAR statement as follows.  Since this is really a bar chart, the x axis is discrete, and each tick value is displayed.  I reduced the font size to prevent the rotation of each value.

/*--Histogram using Bar Chart SAS 9.3--*/
proc sgplot data=ChBin noautolegend;
  vbar chbin / barwidth=0.9 nooutline;
  xaxis valueattrs=(size=6);
run;

Now we will color the bars using the GROUP=ChBin option, so that each bar is colored individually.  We need to compute a color for each bin value using the "Green" to "Yellow" to "Red" ramp.  We will compute these values and load the Discrete Attributes Map programmatically as follows.

/*--Define attributes map data set--*/
data AttrMap;
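  length ID $4 Value $5 FillColor $8 LineColor $8;
  retain ID 'Hist';          /*--Must match ATTRID=Hist on the VBAR statement--*/
  ghigh=192; /*--High value for Green--*/
  rhigh=255; /*--High value for Red--*/

  mid=(&minbin + &maxbin) / 2;

  do val=&minbin to &maxbin by &BinInt;
    value=put(val, 5.0);
    if val < mid then do;
      g=ghigh; b=0; r=rhigh*(val-&minbin)/ (mid-&minbin);
    end;
    else do;
      r=rhigh; b=0; g=ghigh*(1-((val-&minbin) - (mid-&minbin))/ (mid-&minbin));
    end;
    fillcolor='CX' || put(r, hex2.) || put(g, hex2.) || put(b, hex2.);
    linecolor=fillcolor;
    output;                  /*--One attribute map row per bin value--*/
  end;
  keep ID Value FillColor LineColor;
run;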

The color starts at green (CX00C000) and ends at red (CXFF0000).  Each Value and FillColor pair is saved in the AttrMap data set under the ID "Hist".  Then, we pass this data set to the DATTRMAP= option and use it with the VBAR statement, with GROUP=ChBin and AttrId=Hist.
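The ramp computed in the AttrMap step can be written piecewise. Here v is a bin value, v_min and v_max stand for &MinBin and &MaxBin, m is their midpoint, and the constants 255 and 192 are the rhigh and ghigh values from the code:

$$
m = \frac{v_{\min} + v_{\max}}{2}, \qquad
(R, G, B) =
\begin{cases}
\left(255\,\dfrac{v - v_{\min}}{m - v_{\min}},\; 192,\; 0\right) & v < m \\[6pt]
\left(255,\; 192\left(1 - \dfrac{v - m}{m - v_{\min}}\right),\; 0\right) & v \ge m
\end{cases}
$$

At v = m the color is CXFFC000, the middle color of the ramp, and because m - v_min = v_max - m, the green channel reaches 0 exactly at v_max. The PUT(..., HEX2.) calls then turn each channel into two hexadecimal digits of the CXrrggbb color name.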

/*--Histogram using Bar Chart group colors SAS 9.3--*/
proc sgplot data=ChBin dattrmap=AttrMap noautolegend;
  vbar chbin / barwidth=0.9 group=chbin attrid=Hist nooutline;
  xaxis valueattrs=(size=6);
run;

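How easy is this with SAS 9.40M3?  You still need to do the binning, but after that, you can simply use the ColorResponse=ChBin option with ColorStat=Mean.  Because every observation in a bar has the same ChBin value, the mean response for each bar is just its bin value, so each bar gets its own color along the ramp.  The ColorModel can be easily set using the same three colors.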

Note also the added benefit of setting a "Linear" x-axis with the VBAR.  This is now supported with SAS 9.40M3, providing us a nice interval axis without the clutter of a "Bin" axis with all values.

/*--Histogram using Bar Chart response colors SAS 9.40M3--*/
proc sgplot data=ChBin noautolegend;
  vbar chbin / barwidth=0.9 colorresponse=chbin colorstat=mean
                         colormodel=(cx00C000 cxFFC000 cxff0000) nooutline;
  xaxis type=linear;
run;

Full program:  Resp_Color_Histogram


Advanced ODS Graphics Examples

I have written a new book on advanced ODS Graphics examples. It is available as a free PDF file on the web. It is in color, and all of the SAS code is available by double clicking a link at the beginning of each example.

Advanced ODS Graphics Examples

For many years, I presented an introductory tutorial on ODS Graphics. Much of the tutorial material can also be found in the following SAS/STAT chapters and SAS Press book that I wrote:

Using the Output Delivery System
Statistical Graphics Using ODS
ODS Graphics Template Modification
Customizing the Kaplan-Meier Survival Plot

Statistical Graphics in SAS®: An Introduction to the Graph Template Language and the Statistical Graphics Procedures

A few years ago, I presented a longer version of the tutorial that included SG annotation, axis tables, and other advanced topics. I started writing additional documentation that included these topics. The new book grew out of that effort. A few examples use the graph template language, most use PROC SGPLOT, and many use SG annotation. These are advanced examples, so if you are not familiar with ODS Graphics, start with the examples in the sources listed above.

The book has the following examples:


  • Multiple Axes, Offsets, and Drop Lines
  • Multiple Axes and Highlighted Points
  • Multiple Axes, Axis Alignment, and Many Tick Labels
  • Broken Axes
  • Multiple Plots with Equated Axes

Axis Tables

  • Axis Table Example Using PROC REG
  • Creating a Forest Plot Using PROC SGPLOT
  • Stem-and-Leaf Plot with a Box Plot
  • Axis Table Example Using PROC AUTOREG


  • Replacing Tick Labels
  • Understanding the Drawing Spaces
  • Displaying Text in a Graph
  • Drawing Lines
  • Custom Markers, No Markers, and the Data Region
  • Displaying Images in a Graph
  • Lines, Circles, Ovals, Rectangles, and Other Shapes
  • Watermarks
  • Rotating Text
  • Continuing Text
  • Shape and Scale of Arrowheads
  • Text Justification and Anchoring
  • Selecting the X, X2, Y, and Y2 Axes
  • Scaling Images
  • Adding Links to Graphs
  • SG Annotation Functions, Variables, and Their Values

Bars, Lines, Curves, and Arrows

  • Adverse Events Plot
  • Attribute Maps
  • Connecting Points with Lines, Arrows, and Curves

Plots of Labeled Points

  • Placing Labels in Scatter Plots
  • Changing How Vectors Are Displayed

Advanced Customization of Graphs That Analytical Procedures Produce

  • Changing Dynamic Variables by Using the ODS Document
  • Annotating Single-Panel Graphs That Analytical Procedures Produce
  • Annotating Multiple-Panel Graphs That Analytical Procedures Produce

Adverse Events Graph with NNT

Early last year I wrote an article on how to create the "Most Frequent Adverse Events Sorted by Relative Risk" graph using the SGPLOT procedure.  The key issue is that such a graph normally displays two plots side by side: a scatter plot of the proportion values by treatment on the left and the Relative Risk values on the right.  Since the SGPLOT procedure normally produces single-cell graphs, I called this the "Two-in-One Graph".

The technique described in the article is meant for users who are familiar with SGPLOT but do not use GTL.  The article describes how to use the X and X2 axes with Y (or Y and Y2 with X) to split a single cell into two parts.  This technique is useful in other situations too.
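A minimal sketch of that X/X2 split, using SASHELP.CLASS instead of adverse-event data (the variables and the 0.55 offsets are illustrative assumptions, not values from the original article): the X axis is confined to the left part of the cell with OFFSETMAX=, and the X2 axis to the right part with OFFSETMIN=, so the two scatter plots share the Y axis but do not overlap.

```sas
/*--Sketch: two plots in one cell by splitting the X and X2 axes--*/
/*--Illustrative data: SASHELP.CLASS, not the adverse events data--*/
proc sgplot data=sashelp.class noautolegend;
  scatter y=name x=height;                 /*--left plot, on the X axis--*/
  scatter y=name x=weight / x2axis;        /*--right plot, on the X2 axis--*/
  xaxis  offsetmax=0.55 label='Height';    /*--leave the right 55% of the X axis empty--*/
  x2axis offsetmin=0.55 label='Weight';    /*--leave the left 55% of the X2 axis empty--*/
  yaxis display=(nolabel) fitpolicy=none;
run;
```

Offsets a little larger than 0.5 leave a small gap between the two halves; adjust them to change the relative widths of the two plots.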

However, the right way to create this graph is by using GTL to define a graph layout with two cells using a LAYOUT LATTICE.  Each cell is populated with the appropriate plot to represent the data.  This is exactly what Matt Cravets and Jeff Kopicko of Receptos, Inc.  have done for the graph shown below.  They used GTL to define the graph, allowing them to place the x axis data at the bottom of both graphs.  In addition to that, Matt and Jeff wanted to display the corresponding values of "Numbers Needed to Treat" at the top of the graph on the right.  In this case NNT=1.0 / RiskDiff.

Clearly, the axis on top would include the value infinity when RiskDiff is 0.0.  This is where Matt wrote to me to ask whether there was a way to create an X2 axis that would go from -5 on the left toward -infinity in the middle, and then from +infinity down to smaller positive NNT values on the right, as shown here in the final graph.


In the graph above, the X2 axis on the top of the right cell spans from -5 to -infinity at the Riskdiff=0 point.  Then it goes from +infinity to 6.7.  These are two separate axes, and I cannot think of any way to create this as a real axis.  The way to get this was to draw the same RiskDiff axis for X2, but replace each tick value by the appropriate inverse value.  And for the inverse of 0.0, we used the Unicode "infinity" symbol.  This is done by using the TICKVALUELIST and TICKDISPLAYLIST options on the X2 axis:

x2axisopts=(label="Number needed to treat" tickvalueattrs=(size=7)
            linearopts=(viewmin=-0.20 viewmax=0.35
              tickvaluelist=(-0.20 -0.15 -0.1 -0.05 -0.025 0 0.025 0.05 0.1 0.15)
              tickdisplaylist=('-5' '-6.7' '-10' '-20' '-40' "(*ESC*){unicode '221e'x}" '40' '20' '10' '6.7')))
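Because each display label is just 1/RiskDiff rounded to one decimal, the list does not have to be typed by hand. The DATA step below is a sketch (the macro variable name NNTList is hypothetical) that builds the quoted labels from the same tick values and substitutes the Unicode infinity symbol at 0. You could then write TICKDISPLAYLIST=(&NNTList) in the template, since macro variables resolve when the template is compiled.

```sas
/*--Sketch: build TICKDISPLAYLIST labels as NNT = 1 / RiskDiff--*/
/*--NNTList is a hypothetical macro variable name--*/
data _null_;
  length list $400 lbl $40;
  do rd = -0.20, -0.15, -0.1, -0.05, -0.025, 0, 0.025, 0.05, 0.1, 0.15;
    if rd = 0 then lbl = "(*ESC*){unicode '221e'x}";      /*--1/0: infinity symbol--*/
    else lbl = strip(put(round(1/rd, 0.1), best8.));      /*--e.g. -0.15 gives -6.7--*/
    list = catx(' ', list, quote(strip(lbl)));            /*--append as a quoted string--*/
  end;
  call symputx('NNTList', list);
run;
%put &=NNTList;
```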

The above technique also made it easy to ensure that the corresponding values on the X and X2 axes are correctly lined up.  Since the middle value is really both -infinity and +infinity, another minor change would be to place the Unicode ± sign in front of the infinity symbol in the middle, as follows.  I have removed the value '40' to reduce congestion.  Note the use of TickValueFitPolicy=none to prevent values from being dropped due to collisions.


The infinity symbol in the Unicode fonts used is a bit smaller than I would prefer, but there is no way to increase the size of just one symbol in the list.  If this is necessary, one could use SGANNOTATE (SAS 9.40) to insert a bigger symbol.  The SAS 9.4 AXISTABLE statement is used to display the RiskCI values.

Full SAS 9.4 GTL code (courtesy of Matt and Jeff) : AE_Plot_with_Riskdiff_And_Nnt


Axis Customizations

Axis customization features are always welcome, especially since SGPLOT statements can often be used to create nonstandard graphs, where the ability to customize the axes is important.  This article presents ways in which you can customize discrete axes.

By default, the x axis tries to display the axis values so that they are legible and do not collide.  Here is a graph for all the female students in the class data set.  I intentionally set the width so that the tick values do not all fit without collision.  The default FitPolicy is SplitRotate, meaning: split the values if possible (on a white space), and if that does not fit, rotate them 45 degrees.  This results in the rotated tick values on the axis.

Many FitPolicy settings are available; some of the most useful are Stagger, Rotate, and None.

Rotated values are sometimes harder to read, and one can use FitPolicy settings to customize the arrangement.  FitPolicy=Stagger places every other tick value on a separate line.  Only two levels are used for stagger, resulting in the arrangement on the right.  In this arrangement, the space available for each tick value is about double that of a single level.

The software determines the space available, and whether or not a collision is occurring based on the bounding box of the text string.  Some allowances are made for clarity.

However, the system sometimes decides that there is a collision (based on the longest string) even though, to the eye, the values appear to fit.  In the case above, the tick values will actually fit in the space provided, even though the software does not determine so.  When you encounter such a case, you can turn off the fitting algorithms using FitPolicy=none.  In the resulting arrangement on the right, "Barbara" does get a bit tight, but this may be acceptable for some use cases.

In some cases, there are simply too many tick values, and a vertical orientation for each tick value may be useful.  Here we have used the default FitPolicy but set the ValuesRotate option to "Vertical", resulting in the arrangement shown on the right.
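The x-axis arrangements described above map to a few XAXIS options. This is a sketch, not the original program: the response variable (Height) and the small graph size, chosen to make collisions more likely, are assumptions.

```sas
/*--Sketch: discrete x-axis fit policies for the female students--*/
/*--Assumed: RESPONSE=height and a small graph size--*/
ods graphics / reset width=4in height=3in;

proc sgplot data=sashelp.class(where=(sex='F')) noautolegend;
  vbar name / response=height;
  xaxis fitpolicy=stagger;        /*--alternate tick values on two rows--*/
run;

proc sgplot data=sashelp.class(where=(sex='F')) noautolegend;
  vbar name / response=height;
  xaxis fitpolicy=none;           /*--turn off the collision handling--*/
run;

proc sgplot data=sashelp.class(where=(sex='F')) noautolegend;
  vbar name / response=height;
  xaxis valuesrotate=vertical;    /*--default fit policy, vertical text--*/
run;

ods graphics / reset;
```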

Many other settings are available for the FitPolicy option.  Please see the software documentation for all the details.

For the tick values on the Y axis, a different set of issues is in play, and the settings available for the FitPolicy option on the Y axis are different.

Here is the HBAR graph of the same data for all students with the default settings.  Note that horizontal bars are shown for all the students, but a tick value is shown only for every other bar.  This happens because not all of the tick values fit without collision.  In many cases where there is some ordering in the discrete values, this may be acceptable.  However, in the case on the right, where each value is a unique string, dropping the display of alternate tick values causes a significant loss of information.


In the graph above, there seems to be sufficient space to draw all tick values.  However, the collision algorithm in the software takes into account things like glyph descenders, resulting in potential collisions for some of the values.

In such a case, using FitPolicy=none on the Y axis can be very useful.  As we can see in the graph on the right, all the values are now displayed.  The result is not very cluttered, and this arrangement may be acceptable.

As we know, plot statements can be combined to create all kinds of standard and nonstandard graphs, including forest plots.  In such cases, it is often desirable for the Y axis tick values to be left-justified, as in a table.  The ValuesHAlign=Left | Center | Right option on the Y axis supports just such a use case.


SAS 9.40M3 sample code:

title 'Height by Name';
proc sgplot data=sashelp.class noborder;
  hbar name / response=height nostatlabel baselineattrs=(thickness=0)
       filltype=gradient dataskin=pressed fillattrs=graphdata3;
  yaxis display=(nolabel noline noticks) fitpolicy=none valueshalign=left;
  xaxis display=(noline);
run;

Full SAS 9.40M3 code:  Axis_Values


Response Colors and Thickness

Often there is a need to display more than one response simultaneously in a bar chart, series plot, or vector plot.  SAS 9.40M3 adds two new options, COLORRESPONSE and THICKRESP, that let you get such results where applicable.

The bar chart on the right shows the frequency of subjects by death cause.  The height of the bar represents the frequency.  The color of the bar is mapped to the mean value of Cholesterol for each bar.  Here we can see that the subjects who died of Coronary Heart Disease had the highest mean cholesterol value.  We have used the COLORRESPONSE option to specify a different variable, along with COLORSTAT=mean.

This graph also uses a slightly different look, created by suppressing the y-axis label, line, and ticks.  Only the values and grid lines are shown.  The x-axis label, line, and ticks are also suppressed.

SAS 9.40M3 Code for Bar Chart:

title 'Frequency and Mean Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbar deathcause / colorresponse=cholesterol colorstat=mean barwidth=0.7
       colormodel=(green gold red) datalabel dataskin=pressed;
  xaxis display=(noticks nolabel noline);
  yaxis display=(noticks nolabel noline) grid;
run;

Similarly, the SERIES plot supports setting the color of each line by one response variable and the thickness of each line by another.  The graph on the right shows two response values by time: one for the position of the point on the line and another for setting both the color and the thickness.  We could have used a third variable for thickness.  The color and thickness response values are treated as constant for the entire line, so only one value (the first one) is used per line.

SAS 9.40M3 Code for Series Plot:

title 'Values and Response by Treatment';
footnote j=l 'Thickness by Response';
proc sgplot data=seriesResp noborder;
  series x=date y=val / group=drug colorresponse=resp colormodel=(green gold red)
         thickresp=resp;
  xaxis display=(noticks nolabel noline) grid;
  yaxis display=(noticks noline) grid;
run;

Color and thickness response are supported for Bar, Series, Step, and Vector plots.  The graph on the right shows a VECTOR plot of the blood pressure range for each subject in the study by cholesterol.  The thickness and color of each vector are mapped to the cholesterol value.  Of course, you could map color and thickness to different variables for a more expressive graph.

SAS 9.40M3 Code for Vector Plot:

title 'Blood Pressure Range by Cholesterol';
footnote j=l 'Thickness by Cholesterol';
proc sgplot data=sashelp.heart(where=(ageatstart > 61)) noborder;
  vector x=cholesterol y=systolic / xorigin=cholesterol yorigin=diastolic
         colorresponse=cholesterol colormodel=(green gold red)
         thickresp=cholesterol;
  xaxis display=(noticks noline) grid;
  yaxis display=(noticks noline) grid label='Blood Pressure';
run;

Full SAS 9.40M3 Code:  ResponseColor


Something Different - SAS 9.40M3

The SGPLOT procedure provides great tools to create all kinds of graphs for all domains from business to clinical.  However, every so often, we need to create visuals that are not exactly graphs, but more like flow or network diagrams, or something entirely unique.  Some users may have tools to create network diagrams easily, but let us see what we can do with Base SAS.

SGPLOT provides ways to create such visuals without the need for custom tools or annotation.  If you can analyze your end result and break it down into component parts, you can often use the layering features of SGPLOT to create your graph.  With SAS 9.4, we have been adding features to SGPLOT that help you in these tasks.

In this article, I will cover two such visuals: a process flow graph and a diagram.  Questions about such visuals have been asked in recent posts on the Communities page.

Here is a flow chart, which I recently saw on the web, showing the steps in the Clinical Trials Approval process.  I created this using the SGPLOT procedure, by layering the appropriate plot statements to create the visual.

Let us break down the components of the visual and see how we can layer different SGPLOT plot statements to create it:

  1. The visual shows the 6 steps in the process.  I have created the 6 regions by date using a BLOCK plot.
  2. This is overlaid by a STEP plot showing the same steps in an incremental fashion.  Note the arrow head at the end of the step plot.  SAS 9.4M3 now provides you with an easy way to add arrowheads at the ends of any series or step plot.  The matte skin provides a faint drop shadow.
  3. Each step of the process is labeled using a TEXT plot.  The label is wrapped over multiple lines by using the SPLITPOLICY= option.
  4. The notes about the duration of each segment are displayed by a TEXT plot and a VECTOR plot.  The third label is split over multiple lines by using an explicit split character.

Here is the SAS 9.40M3 SGPLOT code.  Full code is attached in the link at the bottom:

proc sgplot data=process noautolegend;
  block x=x block=Stage / filltype=alternate nooutline novalues  <options>;
  step x=x y=y / arrowheadpos=end lineattrs=(thickness=5 color=white) dataskin=matte;
  text x=xs y=ys text=Stage / splitpolicy=split;
  text x=xs y=yd text=Duration / splitchar='.' splitpolicy=splitalways;
  vector x=x1 y=y1 / xorigin=x2 yorigin=y1;
  inset "Clinical Trials Process" / position=topleft;
  yaxis min=0 max=1 display=none;
  xaxis offsetmin=0.02 offsetmax=0.02 display=none;
run;

Now, let us review another example of a non-graph visual created using the SGPLOT procedure.  This is really in response to a question on the Communities page last month.  The user wanted to create a simple diagram showing the relationship between providers and patients.

The diagram layout is really simple.  Since a complex layout does not need to be computed,  we can use the SGPLOT procedure to create this one.   The dataset is simply a combination of Node data and Link data.  The node data shows 5 providers listed vertically and 5 patients by node id.  These could be more or less.  The link data is multi point series by link id.   The links also have a "Response" value representing the number of visits represented visually as the line thickness.

  1. I use a BUBBLE plot to draw the nodes, placed explicitly as provided by the (x, y) coordinates provided in the data set.
  2. I used a SERIES plot to draw the connections between the nodes by LinkId.  While I did this using hard-coded values in the data, these could be computed based on connectivity using a hash object.
  3. The thickness of each link is determined by the THICKRESP option.
  4. X2Axis is used to place the labels (tick values) at the top.

proc sgplot data=Diagram noautolegend nowall noborder;
  series x=xl y=yl / group=link thickresp=ls thickmaxresp=5 thickmax=5 lineattrs=graphdatadefault x2axis;
  bubble x=xn y=yn size=ns / bradiusmin=15 bradiusmax=16 datalabel=node datalabelpos=center
               x2axis dataskin=gloss;
  x2axis display=(nolabel noticks noline) offsetmin=0.2 offsetmax=0.2;
  yaxis display=none;
run;

SAS 9.40M3 also provides a new SPLINE statement.  Here, to keep the example simple, I have added another vertex between "Provider" and "Patient" called "Mid", and then suppressed its display on the X2Axis.  Using the SPLINE statement instead of SERIES draws the curved links.  See the full code in the link below.

SAS 9.40M3 Code:  Diagrams


Broken Axis Redux

Often when the data includes some extreme difference in measures or some outliers, the plot of the data points can get skewed due to the need to accommodate the extreme outliers.  The bulk of the observations get squeezed into a smaller region of the plot.  While this may be useful in some cases, often we want to allow the data to fill the plot region while still drawing the outliers.

In such cases, a broken axis allows us to do just this.  The previous article described how to use the Broken Axis feature first released with SAS 9.40M1.

On the right is a contrived example where the response value for one of the categories (E) is very large compared to the other values.  The large value causes the values for categories "A", "B", "C", and "D" to be scaled down to where it is hard to make good comparisons between them.


The graph on the right uses the Broken Axis feature first released with SAS 9.4M1, using the code shown below.  The RANGES option on the YAXIS statement allows us to specify the data ranges we want to keep in the graph.

proc sgplot data=tallbar;
  vbar x / response=y nostatlabel;
  yaxis ranges=(min-44 384-max) values=(0 to 400 by 10);
run;

In this case, we have requested to keep the ranges from minimum to 44 and from 384 to maximum.  Note, the value "44" is selected to place the break between the ticks.  Also, the axis VALUES need to be provided to give the best results.  The same tick interval is preserved in all the axis ranges.  There can be more than one break, but on only one axis at a time.

With SAS 9.40M1, the break was represented in the graph as the "Full Break" shown in the graph above.  This was useful when breaking plot elements like bars or needles that have a continuous nature and span large parts of the graph width or height.  However, this type of break was considered too distracting, and was not preferred when breaking an axis for a scatter plot.

Based on such feedback from users, the SAS 9.40M3 release includes the ability to request a lighter-weight "Axis Break", as shown in the graph on the right.  The code is shown below.  Note the use of the AXISBREAK option in the STYLEATTRS statement.  Here we have selected "Bracket", which breaks only the axis, using two short bars.

proc sgplot data=outOfRange;
  styleattrs axisbreak=bracket;
  reg x=x y=y / clm markerattrs=(size=5);
  yaxis ranges=(min-1.5 8.9-max) values=(0 to 10 by 0.2) valueshint;
run;

Many different types of break symbols can be used, such as the "Spark" shown in the graph on the right.  If the axis line is absent, the symbol is not shown.

Note another interesting feature in the graph on the right: the X and Y axis lines do not extend all the way to touch each other.  This provides another look and feel for the graph, preferred by some users.  It is controlled by the AXISEXTENT option on the STYLEATTRS statement, as shown below.  Note that the axis line extends over the data range only.  So, in this case, the data range goes beyond the extreme tick marks on the y-axis.

styleattrs axisbreak=spark axisextent=data ;

Full SAS9.40M3 Code: BrokenAxis 


A Macro for Polygon Area and Center

A few weeks back I saw a couple of posts on the Communities page from users wanting to find ways to compute the area of a general polygon and also the center of the area.  I felt such features likely existed somewhere in the SAS/GRAPH set of procedures, so I asked our resident expert(s).  Initially, there was some miscommunication due to the requirement to compute areas.  The %Centroid macro is available among the Annotate macros, but it does not report areas.  Also, see another clarification at the bottom of this article.

In the meantime, this piqued my interest, so I took a stab at it and wrote a macro to compute the area and the centroid of the polygon, as described below.

First, I needed a few random polygons, so I wrote a small routine to generate the data in the format on the right.  Changing the seed values can generate different shapes, some concave.  I generated 3 polygons with a random number of nodes, and added one custom polygon with specific shapes, like a right triangle and a concave "L" shape, for verification.

Then, I used the POLYGON statement in the SAS 9.4 SGPLOT procedure to plot the polygons to see what I had, as shown on the right.  See the code in the link below.

The macro is included in the code.  It takes the name of the input data set "DS", which contains the polygon data with the polygon "ID" and the X and Y coordinates as columns (as shown above).  It also takes the global minimum X and Y values of the entire data, which are easily computed beforehand and placed in the macro variables XMin and YMin (just to save the macro a step).  The result is placed in the output data set "OUT".

%macro polyarea (ds=, xmin=, ymin=, out=, Id=, X=, Y=);
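A sketch of that preparation step. The data set name PolyDemo, its columns Id, X, and Y, and the triangle coordinates are hypothetical; PROC SQL computes the global minimums into XMin and YMin, and the commented-out call shows how they would be passed to %polyarea using the parameters in its signature (the macro body itself is in the linked program).

```sas
/*--Hypothetical polygon data: one right triangle--*/
data PolyDemo;
  input Id $ X Y;
  datalines;
Tri 1 2
Tri 5 2
Tri 1 5
;
run;

/*--Global minimums that %polyarea expects as XMIN= and YMIN=--*/
proc sql noprint;
  select min(X), min(Y) into :XMin trimmed, :YMin trimmed
    from PolyDemo;
quit;
%put &=XMin &=YMin;

/*--Hypothetical call, following the macro signature:
  %polyarea(ds=PolyDemo, xmin=&XMin, ymin=&YMin, out=PolyOut, Id=Id, X=X, Y=Y);
--*/
```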

The macro steps around the polygon nodes and incrementally adds the area of each segment's trapezoid against both the X and Y axes.  The sums taken around X and around Y should be (and are) equal.  Then it takes the first moment of the area of each trapezoid segment, using its center, about the XMin and YMin axes to compute the center of the area in each direction.  The "X" areas are used for computing the x coordinate, and the "Y" areas for the y coordinate.
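To cross-check the macro, the same area and centroid can be obtained with the shoelace (surveyor's) formula, which sums the signed cross products c_i = x_i*y_(i+1) - x_(i+1)*y_i of consecutive vertices: A = (1/2)*sum(c_i), Cx = sum((x_i + x_(i+1))*c_i) / (6A), Cy = sum((y_i + y_(i+1))*c_i) / (6A). This is a different formulation from the macro's strip-by-strip sums, not the author's code. The sketch assumes a data set sorted by a polygon ID, with hypothetical columns Id, X, and Y, vertices in boundary order, and no holes; the test shapes mirror the right triangle and concave "L" used for verification above.

```sas
/*--Hypothetical test polygons: a concave L shape and a right triangle--*/
data Poly;
  input Id $ X Y;
  datalines;
L   0 0
L   2 0
L   2 1
L   1 1
L   1 2
L   0 2
Tri 0 0
Tri 4 0
Tri 0 3
;
run;

/*--Shoelace area and centroid per polygon (not the %polyarea macro)--*/
data PolyShoelace(keep=Id Area Cx Cy);
  set Poly;
  by Id;
  retain x0 y0 xp yp a sx sy;
  if first.Id then do;
    x0 = X; y0 = Y;               /*--first vertex, used to close the ring--*/
    a = 0; sx = 0; sy = 0;
  end;
  else do;
    cross = xp*Y - X*yp;          /*--edge from the previous vertex to this one--*/
    a  = a  + cross;
    sx = sx + (xp + X)*cross;
    sy = sy + (yp + Y)*cross;
  end;
  xp = X; yp = Y;
  if last.Id then do;
    cross = X*y0 - x0*Y;          /*--closing edge back to the first vertex--*/
    a  = a  + cross;
    sx = sx + (X + x0)*cross;
    sy = sy + (Y + y0)*cross;
    Area = a / 2;                 /*--signed: negative for clockwise order--*/
    Cx = sx / (6*Area);           /*--sign cancels, so orientation does not matter--*/
    Cy = sy / (6*Area);
    Area = abs(Area);
    output;
  end;
run;

proc print data=PolyShoelace noobs;
run;
```

For the L shape this gives an area of 3 and a centroid of about (0.833, 0.833), which matches splitting the L into a 2 x 1 rectangle and a 1 x 1 square; the triangle gives an area of 6 and a centroid of (4/3, 1).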

The result is plotted with the SGPLOT POLYGON statement with a TEXT overlay to display the area and the location (x, y) of the center of area, as shown on the right.  The results "look" about right to the eye.

I did not attempt to handle polygons with holes.  There is a good chance the algorithm itself will work for a polygon with holes.

I did not attempt to handle polygons with multiple segments, as in map data sets.  This macro could be extended to get the area and CG of each segment by giving each segment a unique ID.  But getting the area of the composite multi-segment polygon would need some more thought.  That exercise is left to the motivated reader (meaning I am ducking that exercise).  :-)

If the polygon is highly concave, it is possible the CG will not be within the polygon boundary.  That would be another good exercise to think about.

I mentioned the %Centroid macro earlier.  Another difference between this macro and the %Centroid Macro is that this macro computes the mathematical centroid of each polygon, while the %Centroid macro computes a "good" location for labeling of a polygon, and not the mathematical centroid.  If you want to label a polygon (like state or country name), the %Centroid may be the preferred tool.

Area and Centroid Macro:  AreaCentroidMacro 



Annotating multiple panels

In the past few weeks, I have written two blogs on SG annotation and on saving and then modifying the graphs that analytical procedures produce:
  Modifying dynamic variables in ODS Graphics
  Annotating graphs from analytical PROCs

Today, I finish this series with one more blog. This one shows how you can annotate graphs with multiple panels. If you want to fully understand today's blog, you will need to understand my previous two blogs. Those blogs show you how to run an analytical procedure, output the data object that underlies a graph, save the dynamic variables in an ODS document, process either the template or the PROC SGRENDER call to incorporate the dynamic variables, and then modify the graph. In this example, you will learn how to use a macro to add ANNOTATE statements to each LAYOUT OVERLAY code block in a template that an analytical procedure uses. This enables you to send annotations to each panel and use panel-specific drawing spaces. In contrast, my previous blog showed you how to add a single ANNOTATE statement to the template, which enables annotation but does not provide the ability to specify panel-specific drawing spaces. This example also modifies the data object and the graph template.

I hope that everyone cringed after reading the last sentence. Modifies the data object? Really? Yes, there are times when it makes sense. However, you should never change the data that underlie a graph. This example changes the data object in order to change which parts of the graph are labeled; no numbers are changed.

This example works with the standardized coefficients progression plot in PROC GLMSELECT. The following step creates the plot:

ods graphics on;
proc glmselect data=sashelp.baseball plots=coefficients;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor|yrMajor crAtBat|crAtBat crHits|crHits
                     crHome|crHome crRuns|crRuns crRbi|crRbi
                     crBB|crBB league division nOuts nAssts nError /
                     selection=forward(stop=AICC CHOOSE=SBC);
run;



The graph shows how the coefficients change as new terms enter the model. PROC GLMSELECT labels some of the series plots. It is common in this graph for several coefficients to have similar values in the final model. PROC GLMSELECT tries to thin labels to avoid conflicts. For example, the first term that enters the model after the intercept is CrRuns. Its label is not displayed since it would conflict with the label for CrHits. In this example, you will learn how to select a different set of labels to display. In particular, you will display labels for the standardized coefficients in the selected model that are outside the range -1 to 1. This requires you to change the data object to change which series plots are labeled. Then you can add annotation to highlight the selected model. In PROC GLMSELECT, the final model does not usually correspond to the end of the progression of the coefficients. In this case, it corresponds to the model that is displayed at the reference line at step 9.

The following graph previews the results after annotation.


You begin by creating a data object and storing the graph along with the dynamic variables in an ODS document:

ods document name=MyDoc (write);
proc glmselect data=sashelp.baseball plots=coefficients;
   ods select CoefficientPanel;
   ods output CoefficientPanel=cp;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor|yrMajor crAtBat|crAtBat crHits|crHits
                     crHome|crHome crRuns|crRuns crRbi|crRbi
                     crBB|crBB league division nOuts nAssts nError /
                     selection=forward(stop=AICC CHOOSE=SBC);
run;
ods document close;

The next step reads the data object, extracts the parameter labels for the coefficients in the selected model that are greater than 1 or less than -1, and stores the number of the last step in a macro variable:

data labelthese(keep=parameter rename=(parameter=par));
   set cp end=eof;
   if eof then call symputx('_step', step);
   if step eq 9 and (StandardizedEst gt 1 or StandardizedEst lt -1);
run;

This step relies on knowing that the selected model was found in step 9. If you are writing a general-purpose program to do this modification, you can process the __outdynam data set that the %ProcAnnoAdv macro (shown below) creates, extract the value of the dynamic variable _ChosenValue, and then run the preceding step.

The next step processes the data set that was created from the data object:

data cp2;
   set cp;  
   match = 0;
   if step ne &_step then return;
   do i = 1 to ntolable;
      set labelthese point=i nobs=ntolable;
      match + (par = parameter);
   end;
   if not match then parameter = ' ';
   if nmiss(rhslabelYvalue) then rhslabelYvalue = StandardizedEst;
run;

This data object is typical of the data objects that are used to make graphs. It has several components of different sizes, with missing values wherever a component has no data. The last part of the data set contains the coordinates and strings that are needed to label each profile. The preceding step sets the parameter value to blank in the last step (the one that corresponds to the labels) for all but the terms with the most extreme coefficients. When the Y coordinate for a label is missing (because PROC GLMSELECT suppressed it due to collisions), the Y coordinate value is restored.
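As an optional check that is not part of the original workflow, you can list the rows of cp2 that still carry a label. This sketch assumes that the cp2 data set and the _step macro variable from the preceding steps exist; if the edit worked, only the terms whose standardized coefficients in the selected model fall outside -1 to 1 should appear.

```sas
/* Show the rows in the last step that still have a label */
proc print data=cp2 noobs;
   where step = &_step and parameter ne ' ';
   var step parameter StandardizedEst rhslabelYvalue;
run;
```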

The next step defines the macro that modifies the graph template:

%macro tweak;
   if index(_infile_, 'datalabel=PARAMETER') then 
      _infile_ = tranwrd(_infile_, 'datalabel', 
                          'markercharacterposition=right markercharacter');
   if index(_infile_, 'curvelabel="Selected Step"') then 
      _infile_ = tranwrd(_infile_, 'curvelabel="Selected Step"', ' ');
%mend;

The macro makes two changes. By default, labels are positioned by using the DATALABEL= option in a SCATTERPLOT statement. The first IF statement replaces that option with the MARKERCHARACTER= option, which positions labels precisely at a point. In contrast, the DATALABEL= option moves labels that conflict. The first IF statement also adds the option MARKERCHARACTERPOSITION=RIGHT so that labels are positioned to the right of the coordinates. This change is based on the idea that sometimes it is better for labels to be precisely positioned, even if they collide. You can additionally modify the label coordinates if minimizing collisions is important. The TRANWRD (translate word) function performs the change, substituting a longer string for a shorter string. The second IF statement removes the curve label. You will later add it back through SG annotation.
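The following sketch applies the first TRANWRD substitution to a single sample line so that you can see the before-and-after text in the SAS log. The sample SCATTERPLOT statement is hypothetical and much shorter than the statement in the actual GLMSELECT template.

```sas
data _null_;
   length line $ 200;
   /* Hypothetical template line that uses DATALABEL= */
   line = 'scatterplot x=STEP y=RHSLABELYVALUE / datalabel=PARAMETER;';
   /* Same substitution that the %Tweak macro performs */
   line = tranwrd(line, 'datalabel',
                  'markercharacterposition=right markercharacter');
   put line=;
run;
```

In the log, markercharacterposition=right markercharacter=PARAMETER should appear where datalabel=PARAMETER was. Because the option name is followed directly by =PARAMETER, replacing only the word datalabel is enough to turn it into MARKERCHARACTER=PARAMETER.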

The next step creates the SG annotation data set:

data anno;
   length ID $ 3 Function $ 9 Label $ 40;
   retain X1Space Y1Space X2Space Y2Space 'DataPercent' Direction 'In';
   length Anchor $ 10 xc1 xc2 $ 20;
   retain Scale 1e-12 Width 100 WidthUnit 'Data' CornerRadius 0.8 
          TextSize 7 TextWeight 'Bold'
          LineThickness 0.7 DiscreteOffset -0.3 LineColor 'Green';
   ID       = 'lo1';            Function  = 'Text';           
   Anchor   = 'Right';          TextColor = 'Green';        
   x1       = 55;               y1        = 94; 
   Label    = 'Coefficients for the Selected Model';
   output;
   Function = 'Line';           x1        = .;     
   X1Space  = 'DataValue';      X2Space   = X1Space;
   xc1      = '9+CrBB';         xc2       = '8+CrRuns*CrRuns'; 
   y1       = 94;               y2        = 94;          
   output;
   Function = 'Rectangle';      Y1Space   = 'WallPercent';
   Anchor   = 'BottomLeft';     y1        = 10;
   Height   = 80;               Width     = 0.6;
   output;
   ID       = 'lo3';            Width     = 100;              
   Function = 'Text ';          Label     = 'Selected Value';
   X1Space  = 'DataPercent';    Y1Space   = X1Space;
   Anchor   = 'Left';           TextColor = 'Blue';
   x1       = 86;               y1        = 84; 
   output;
   Function = 'Arrow';          LineColor = 'Blue';
   X1Space  = 'DataValue';      X2Space   = X1Space;
   xc1      = '9+CrBB';         xc2       = '12+CrHits*CrHits'; 
   y1       = 4;                y2        = 83;
   DiscreteOffset = .1;         x1        = .;     
   output;
run;

This step creates a data set with five observations:
   1) the text 'Coefficients for the Selected Model'
   2) a line from the text to the rectangle
   3) a rectangle with rounded corners that surrounds the coefficients for the selected model
   4) the text 'Selected Value'
   5) an arrow pointing from the text to the selected value

This SG annotation data set has many variables and options. More will be said about the SG annotation data set after the graph is displayed. Fully explaining SG annotation is beyond the scope of this blog.
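To look at the annotation data set yourself, a simple PROC PRINT of the key positioning variables is enough. This is a minimal sketch that assumes the anno data set from the preceding step; the variable list is a selection, not every variable in the data set.

```sas
/* List the positioning variables for each of the five annotations */
proc print data=anno noobs;
   var ID Function X1Space Y1Space xc1 xc2 x1 y1 y2 Anchor;
run;
```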

The template processing macro, %ProcAnnoAdv, is next:

%macro procannoadv(data=, template=, anno=anno, document=mydoc, adjust=,
                   overallanno=1);
   /* Find the path to the graph in the ODS document */
   proc document name=&document;
      ods exclude properties;
      ods output properties=__p(where=(type='Graph'));
      list / levels=all;
   quit;

   /* Save the dynamic variables for the graph in __outdynam */
   data _null_;
      set __p;
      call execute("proc document name=&document;");
      call execute("ods exclude dynamics;");
      call execute("ods output dynamics=__outdynam;");
      call execute(catx(' ', "obdynam", path, ';'));
      call execute("quit;");
   run;

   /* Write the template source to a file */
   proc template;
      source &template / file='temp.tmp';
   run;

   /* Resubmit the template, optionally edited, with ANNOTATE statements */
   data _null_;
      infile 'temp.tmp';
      input;
      if _n_ = 1 then call execute('proc template;');
      %if &adjust ne %then %do; %&adjust %end;
      call execute(_infile_);
      if &overallanno and _infile_ =: '   BeginGraph' then bg + 1;
      else if not &overallanno and index(_infile_, '   layout overlay')
         then lo + 1;
      if bg and index(_infile_, ';') then do;
         bg = 0;
         call execute('annotate;');
      end;
      if lo and index(_infile_, ';') then do;
         lo = 0;
         lonum + 1;
         call execute(catt('annotate / id="lo', lonum, '";'));
      end;
   run;

   /* Render the graph with the saved dynamic variables */
   data _null_;
      set __outdynam(where=(label1 ne '___NOBS___')) end=eof;
      if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
      if _n_ = 1 then do;
         call execute("proc sgrender data=&data");
         if symget('anno') ne ' ' then call execute("sganno=&anno");
         call execute("template=&template;");
         call execute('dynamic');
      end;
      if cvalue1 ne ' ' then
         call execute(catx(' ', label1, '=',
                      ifc(n(nvalue1), cvalue1, quote(trim(cvalue1)))));
      if eof then call execute('; run;');
   run;

   /* Delete the modified copy of the template */
   proc template;
      delete &template;
   run;
%mend;

This macro is similar to the %ProcAnno macro that I provided and explained in my previous blog. The macro adds the ANNOTATE statements to the template and calls PROC SGRENDER with the appropriate dynamic variables specified. You can specify a macro name in the ADJUST= argument to insert template-editing code into the DATA step that reads the template source. In this case, you specify the %Tweak macro. You can set ANNO= to blank to prevent the macro from specifying the SGANNO= option in PROC SGRENDER. By default (OVERALLANNO=1), a single ANNOTATE statement is added to the template, as in my previous blog. In this example, OVERALLANNO=0, so an ANNOTATE statement is added to each LAYOUT OVERLAY statement. The following statements are added to the template:

   annotate / id="lo1";
   annotate / id="lo2";
   annotate / id="lo3";

You can use the three IDs in your annotation data set to direct annotations to each of the three overlays. In this template, the first layout is always used, and either the second or the third layout is used conditionally. In this example, the first and third layouts are used.
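To see how the macro numbers these ANNOTATE statements, the following sketch reuses the same CATT expression with a counter that runs from 1 to 3. The loop is only an illustration; the macro itself increments LONUM once for each LAYOUT OVERLAY statement that it finds in the template source.

```sas
data _null_;
   length stmt $ 40;
   do lonum = 1 to 3;
      /* Same expression that %ProcAnnoAdv passes to CALL EXECUTE */
      stmt = catt('annotate / id="lo', lonum, '";');
      put stmt;                      /* writes the statement to the log */
   end;
run;
```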

The following step runs the macro and creates the modified graph:

%procannoadv(data=cp2, template=Stat.GLMSELECT.Graphics.CoefficientPanel,
             adjust=tweak, overallanno=0)


This SG annotation data set is large. There are many variables, and a different subset is used for each annotation. The output shown in the links below lists the relevant subsets.

Click to see a subset of observation 1

Observation 1 positions text in the LAYOUT OVERLAY labeled 'lo1'. It specifies coordinates based on the percentage of the data area. The string is anchored on the right, next to the line.

Click to see a subset of observation 2

Observation 2 draws a line in the LAYOUT OVERLAY labeled 'lo1'. The X coordinates are in the space 'DataValue'. Since the X axis variable is a character variable, the variables xc1 and xc2 are used. Because the variable x1 is used by other observations, it must be set to missing for this observation. The Y coordinates are in the space 'DataPercent', and the variables y1 and y2 provide coordinates. Each pair of X and Y coordinates specifies one end of the line. The discrete offset of -0.3 moves the line 0.3 data units to the left from the coordinates specified in (xc1, y1) and (xc2, y2).

Click to see a subset of observation 3

Observation 3 draws a rounded rectangle in the LAYOUT OVERLAY labeled 'lo1'. There is one X and one Y coordinate. The X coordinate is in the data space and the Y coordinate is in the wall percentage space. The rectangle is anchored at the bottom left (where drawing starts), and it is drawn with a height of 80% of the wall and a width of 0.6 times the width of a discrete cell. The discrete offset of -0.3 moves the rectangle 0.3 data units to the left from the coordinates specified in (xc1, y1). The CornerRadius variable controls the degree of rounding. The result is a rounded rectangle centered around the reference line for the selected step.

Click to see a subset of observation 4

Observation 4 positions text in the LAYOUT OVERLAY labeled 'lo3'. Notice that the layout has changed with this observation. The text is anchored on the left, next to the arrow.

Click to see a subset of observation 5

Observation 5 draws an arrow in the LAYOUT OVERLAY labeled 'lo3'. The X coordinates are in the space 'DataValue'. Since the X axis variable is a character variable, the variables xc1 and xc2 are used. Because the variable x1 is used by other observations, it must be set to missing for this observation. The Y coordinates are in the space 'DataPercent', and the variables y1 and y2 provide coordinates. Each pair of X and Y coordinates specifies one end of the arrow. The discrete offset of 0.1 moves the arrow 0.1 data units to the right from the coordinates specified in (xc1, y1) and (xc2, y2). The Scale variable scales the size of the arrowhead. The Direction variable ('In') points the arrow inward, toward (xc1, y1).

In summary, this example builds on examples in my previous blogs to show you a small part of the flexibility of ODS Graphics. You can modify graph templates and dynamic variables, and you can use SG annotation to customize the graphs that analytical procedures produce. While not shown here, you can also change styles. You can even (cringe!) modify the data object.


Bar Chart on Interval Axis - SAS 9.40M3

When we first released GTL and the SG procedures with SAS 9.2, box plots and bar charts always treated the category axis as discrete.  We soon realized that we needed to support box plots on scaled interval axes for many clinical applications, and this was added in SAS 9.3.

The same is now true for bar charts.  With SAS 9.40M3, a bar chart can display data on a scaled interval axis such as linear, time, or log.  For this article, I created a simple data set of simulated revenues by region and date.  The dates are 01Jan2014, 07Jan2014, 15Jan2014, 01Feb2014, 01Mar2014 and 01Apr2014.  A few observations from the data set are shown on the right.
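The article does not list the code that builds the Sales data, so here is a hypothetical stand-in that you can use to run the examples. Only the six dates come from the article; the four region names, the value ranges, and the Target variable (used later for the bar-line and target-marker graphs) are assumptions.

```sas
/* Hypothetical stand-in for the Sales data: 6 dates x 4 regions */
data Sales;
   length Region $ 5;
   format Date date9.;
   call streaminit(2014);                       /* reproducible values */
   do Date = '01Jan2014'd, '07Jan2014'd, '15Jan2014'd,
             '01Feb2014'd, '01Mar2014'd, '01Apr2014'd;
      do Region = 'North', 'South', 'East', 'West';
         Sales  = round(80 + 40*rand('uniform'), 0.1);  /* simulated revenue */
         Target = round(90 + 20*rand('uniform'), 0.1);  /* simulated target  */
         output;
      end;
   end;
run;
```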

title 'Revenues by Date';
proc sgplot data=Sales noborder cycleattrs;
  vbar date / response=sales nostatlabel dataskin=pressed;
  xaxis type=time display=(nolabel noline);
  yaxis grid;
run;

The graph on the right shows revenues by date, with the x-axis TYPE set to TIME.  Each bar is drawn at its correct scaled position along the x-axis and displays the summarized revenue for that date.

One might ask what the benefit of this is over a needle plot, which can do something similar.

  • A needle plot does not summarize the data by the category value.
  • The needles are plotted as lines with a thickness of 1 pixel, or as set by the user.
  • The bar chart does a good job of setting the bar width, using the default 85% of the "effective" midpoint spacing, which is determined by the minimum spacing between the values on the x-axis.  This behavior is very similar to that of the box plot on an interval axis and works well for cluster groups.

Other bar features that just work in this case are the default stacked and clustered group displays.  For the graph on the right, the data is summarized by date and region and plotted as stacked bars on the scaled time axis.

By default (without setting the TYPE option), the behavior remains the same as before, and the bar chart will still force a discrete axis.  There will be no change for your existing programs.

Note the legend at the bottom.  I have used the new SAS 9.40M3 legend options FILLHEIGHT and FILLASPECT to get larger skinned color swatches.
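The article does not show the code for the stacked graph, so here is a minimal sketch of how it might look. It assumes a Sales data set with Date, Region, and Sales variables; the swatch size values are illustrative guesses, not the ones used for the graph on the right.

```sas
/* Stacked bars (the default GROUPDISPLAY) on a scaled time axis */
proc sgplot data=Sales noborder;
   vbar date / response=sales group=region dataskin=pressed;
   xaxis type=time display=(nolabel);
   yaxis grid;
   /* SAS 9.40M3 swatch options; the values here are assumptions */
   keylegend / location=outside position=bottom
               fillheight=10pt fillaspect=golden;
run;
```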

For cluster groups, the cluster width is determined by the minimum distance between the values on the x-axis.  The bar widths are sized to fit all four regions in the "effective" midpoint spacing.  Since the smallest interval, 6 days between 01Jan2014 and 07Jan2014, is on the left side of the graph, that interval determines the available spacing for each cluster.  To improve the clustering effect, I have set CLUSTERWIDTH=0.75.

Also, I have customized the legend swatches to a thinner and longer shape to match the thinner bars in the graph.  The new options allow you to do such customization.
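A minimal sketch of the clustered version follows, again assuming a Sales data set with Date, Region, and Sales variables. CLUSTERWIDTH=0.75 is the value quoted above; the FILLHEIGHT= and FILLASPECT= values that produce thinner, longer swatches are assumptions.

```sas
proc sgplot data=Sales noborder;
   /* Regions side by side at each date, clusters narrowed to 75% */
   vbar date / response=sales group=region groupdisplay=cluster
               clusterwidth=0.75 dataskin=pressed;
   xaxis type=time display=(nolabel);
   yaxis grid;
   /* Shorter height, wider aspect: illustrative values only */
   keylegend / fillheight=5pt fillaspect=3;
run;
```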

The interval axis VBAR can also be used in conjunction with VLINE to create a bar-line graph on a time axis.  Here, the bars show sales by date for two regions, and the line shows the target by date for the same two regions.  Both statements use GROUPDISPLAY=CLUSTER so that each colored line joins the bars of the same color.
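Here is a minimal sketch of such a bar-line graph. It assumes that the Sales data set also contains a Target variable and that the two regions are named North and South; both are assumptions, since the article does not show this code.

```sas
/* Bar-line chart on a time axis for two regions (names assumed) */
proc sgplot data=Sales(where=(region in ('North', 'South'))) noborder;
   vbar date / response=sales group=region groupdisplay=cluster;
   /* Same GROUP= and GROUPDISPLAY= so each line runs through its own bars */
   vline date / response=target group=region groupdisplay=cluster
                lineattrs=(thickness=2);
   xaxis type=time display=(nolabel);
   yaxis grid;
run;
```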

Recently there was a question on the communities web site from a user who wanted to plot a bar chart with target values.  The graph above displays the target values using a VLINE plot.  However, a full line overlay may not be desirable, and we may want individual markers over each bar to indicate the target for just that bar.  We can extend the above technique to do something like the graph shown below.  Click on the graph for a detailed view.  I would agree the target markers may need more emphasis.

Normally, one cannot overlay a "basic" plot like a scatter plot on a VBAR statement to display the target values.  You would have to first summarize the data using PROC MEANS, and then use a VBARPARM statement overlaid with a scatter or other basic plot.

For this graph, we have overlaid the VBAR with a VLINE, turned on markers, and turned off the display of the line by setting the following options.

vline date / response=target group=region  markers lineattrs=(thickness=0);

This draws the default marker on each bar, and we can change the marker shape to one of the supported shapes.  However, since there is no "Bar" shape available, we have used the SYMBOLCHAR statement to define a custom marker called "Line" from a Unicode character ('2012'x, the figure dash).

symbolchar name=line char='2012'x / voffset=0.08;

Note the use of VOFFSET=0.08.  If you overplot a regular marker on the line marker defined using SYMBOLCHAR, you will see that they do not line up exactly.  This is because, in the glyph for '2012'x, the line is not exactly in the middle of the glyph bounding box.  This can convey the wrong value to the graph consumer.  So, I have used VOFFSET to shift the line up a bit so that it lines up exactly with a regular marker.  The VOFFSET option was added for just such cases.
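Putting the pieces together, a sketch of the target-marker graph might look like the following. The SYMBOLCHAR and VLINE options are the ones quoted above; the data set, the two region names, and the marker size of 30 are assumptions.

```sas
proc sgplot data=Sales(where=(region in ('North', 'South'))) noborder;
   /* Custom "line" marker from U+2012, nudged up to align with markers */
   symbolchar name=line char='2012'x / voffset=0.08;
   vbar date / response=sales group=region groupdisplay=cluster;
   /* Markers only: THICKNESS=0 hides the connecting line */
   vline date / response=target group=region groupdisplay=cluster
                markers markerattrs=(symbol=line size=30)
                lineattrs=(thickness=0);
   xaxis type=time display=(nolabel);
   yaxis grid;
run;
```

Using the same GROUP= and GROUPDISPLAY= options on both statements is what places each target marker over its own bar.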

Now, you have one more tool in your tool box to create effective graphs.  The interval bar chart should fill a gap and make it easier to create graphs.

Full SAS 9.40M3 Code:  Interval_Bar


