Custom Labels

Over the Christmas holidays I saw a graph of agricultural exports to Russia in 2013.  The part that caught my eye was the upper part of the graph, showing the breakdown of the trade with Russia as a horizontal stacked bar with custom labels.

The value for each region or country is labeled individually along the top and bottom of the bar for each segment, as shown on the right.  Each label is at a custom location along the bar, with some on top and some at the bottom.  Most labels include the name of the region and the amount, but others have the name in the label and the amount in the bar (European Union).

Making this graph as a regular stacked horizontal bar with a legend is very simple and also scalable and extensible to other data.  I used the colors from the graph above, but then added a few other colors to distinguish the segments so they can be identified in the legend.  Click on the graph for a more detailed view.

proc sgplot data=russia noborder nocycleattrs;
  styleattrs datacolors=(%rgbhex(207, 49, 36) %rgbhex(225, 100, 50) 
             gold yellow lightgreen);
  hbarparm  category=cat response=value / group=label groupdisplay=stack 
            outlineattrs=(color=lightgray) 
            baselineattrs=(thickness=0) barwidth=0.5 grouporder=data;
  keylegend / title='' noborder location=inside position=top;
  yaxis display=none  colorbands=odd offsetmin=0.3;
  xaxis display=none;
run;

The main reason the original graph is interesting is the attempt to "move" the legend entries closer to the bar itself.  The benefit of this is that the values can be read directly and easily, and the graph is easier to decode.  With a legend, one has to move the eye between the legend and the graph: first identify the color of the segment in the bar, then find its value in the legend.  Also, the small green segment for Australia could be missed.

Direct labeling is often useful for decoding a graph, especially where the graph is not too complicated.  But, direct labeling in this case also requires custom code, either annotation or something else.  So, there is a balance to be achieved between the two.

Since I try to avoid annotation as much as possible, I first tried to create this graph using other means with SAS 9.4M2.  Here is what I was able to do with some coding.  My goal is to break up the legend and move each individual value closer to the bar segment itself.  I kept the color swatches to avoid the need for a call-out line to each bar segment.

Clearly, the coding is more elaborate, as I have to place each color marker and the text close to where it needs to go, switching between above and below the bar as shown in the code below.  Some appearance options are trimmed to fit.  See full code in the link below.

proc sgplot data=russia_labels noborder noautolegend nocycleattrs;
  styleattrs datacolors=(%rgbhex(207, 49, 36) %rgbhex(225, 100, 50) 
             gold yellow lightgreen)
             datacontrastcolors=(%rgbhex(207, 49, 36) %rgbhex(225, 100, 50) 
             gold yellow lightgreen)
             datasymbols=(squarefilled);
  hbarparm  category=cat response=value / group=label groupdisplay=stack 
            baselineattrs=(thickness=0) barwidth=0.5 grouporder=data;
 
  scatter x=xlbl1 y=cat / discreteoffset=-0.35 group=label;
  text x=tlbl1 y=cat text=label / discreteoffset=-0.35 position=left 
       contributeoffsets=none splitpolicy=splitalways splitchar='=';
 
  scatter x=xlbl2 y=cat / discreteoffset= 0.35 group=label;
  text x=tlbl2 y=cat text=label / discreteoffset= 0.35 position=left 
       contributeoffsets=none splitpolicy=splitalways splitchar='=';
 
  yaxis display=none  colorbands=odd;
  xaxis display=none;
run;

Note, the code is longer because there are 2 pairs of scatter and text plot statements, one for the labels along the top and one for those at the bottom, because of the different values of DiscreteOffset.  The positions for the markers and the text are computed for each value in the code.  Now, each label and value are effectively moved close to the segment, making the graph easier to decode.

In this exercise, I have used the new TEXT plot statement added with SAS 9.4M2.  This statement is customized to draw text strings in the graph, and has many features for handling text.  We did not want to overload the scatter plot (with MarkerChar).  Going forward, you would be better off using the TEXT plot in place of cases where you used MarkerChar.  For earlier releases, you could use the scatter with MarkerChar or DataLabel to do something similar. This exercise is left to the motivated reader.

Alternatively, one could exactly duplicate the original graph by using SG Annotate to do the labeling, including the call out lines from the text to the segment.  In both cases, the code is heavily customized, and not easily scalable to other data.

I have presented my opinion on the pros and cons of each method.   I would love to hear your opinion too.

SAS 9.4M2 Code:  Russia_3

 


Scatter Plot with Stacked Histograms

Last week a user expressed the need to create a graph like the one shown on the right using SAS.  This seems eminently doable using GTL and I thought I would undertake making this graph using SAS 9.3.

The source data required to create this graph is only the X-Y information in the scatter plot.  Not having access to the original data in this graph, I simulated some data using random functions in three DO loops, one for each of the three groups, in a DATA step.  The groups are 'A', 'B' and 'C', in place of the values like 'Center = 0.29' and so on.  See the full program in the link at the bottom.
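A minimal sketch of such a simulation is shown here.  The data set and variable names (SCATTER, X, Y, GRP) match the ones used in the steps that follow, but the seed, the counts, and the cluster centers and spreads are placeholder values of my own choosing, not the ones from the full program.

data scatter;
  call streaminit(12345);            /* fixed seed so the run is repeatable */
  length grp $1;
  /* One DO loop per group; centers and spreads are assumed values */
  grp='A';
  do i=1 to 200;
    x=rand('normal', 1.0, 0.3);
    y=rand('normal', 1.0, 0.3);
    output;
  end;
  grp='B';
  do i=1 to 200;
    x=rand('normal', 2.0, 0.5);
    y=rand('normal', 1.5, 0.4);
    output;
  end;
  grp='C';
  do i=1 to 200;
    x=rand('normal', 3.0, 0.6);
    y=rand('normal', 2.5, 0.5);
    output;
  end;
  drop i;
run;

Because the observations are written group by group, the data set is already in GRP order, which the BY statement in the binning step needs.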

The graph on the right can be constructed as a LATTICE of four cells with the following contents.

  • The cell on the bottom left is a regular X-Y scatter plot by group.
  • The cell at the top left is a stacked vertical histogram of counts for the x-bins by group.
  • The cell at the bottom right is a stacked horizontal histogram of counts for the y-bins by group.
  • The cell at the top right contains the legend.

SAS 9.3 SGPLOT or GTL does not have a statement to draw a stacked histogram by group.  So, we have to find another way to do this.  We will use the HighLowPlot statement, which shows the group segments where we place them, and also supports a numeric x axis.  We now have to build the data set appropriate for the plot.

The good news is that we can leverage the SGPLOT Histogram statement to generate the bins and counts we need for X and BY=group as follows:

ods _all_ close;
ods output sgplot=xa;
proc sgplot data=scatter(where=(x le 5));
  by grp;
  histogram x / scale=count binstart=0 binwidth=0.25; 
  run;

This program will bin the data by X, with BinStart and BinWidth set as needed.  The output is written to the 'XA' data set.  The SGPLOT procedure generates the required bin and count columns using variable names that are based on the original variables.  You can turn off all destinations, so no graph is actually created but the data set is still written out.  You can view the data set to find these new variables.

After this step I cleaned up this data set to create a data set of the xBins and the Counts by Group.  A snippet of the data set is shown on the right.

data xBins;
  set xa(where=(Bin_X_Scale_count_Binstart_0___Y ne .));
  drop x;
  rename Bin_X_Scale_count_Binstart_0___Y=count
         Bin_X_Scale_count_Binstart_0___X=xBin;
run;
proc sort data=xBins out=xBinsByBin;
  by xBin;
run;

Now we have the bins and the counts by group.  We need to stack the values so we can use the HighLowPlot to draw the stacked bins.  The data step shown below does just that, by creating the Low and High values for each group in a bin as stacked on the previous value.

The final data set is shown on the right.  We can plot it using the HighLow plot statement in SGPLOT to create just the horizontal stacked Histogram, to see if we have the right data.  I will save that step for later.

data HighLowX;
  drop count;
  retain Low High;
  set xBinsbyBin;
  by xBin;
  if first.xBin then Low=0;
  High=Low+Count; output;
  Low=High;
run;

We go through the same steps above to create the binned data for the Y axis.  Then, I merge the original X-Y data with the X and Y bin data sets to get the final data set ready for plotting.  I can plot each graph separately from this merged data set to ensure everything is working correctly.  The xBin, Low and High values are in a block of the data where other columns are missing, and so on.  Here is the graph for just the horizontal stacked histogram.

HighLow_X
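A check plot of this kind could be produced along the following lines.  This is only a sketch: it plots the HighLowX data set directly rather than the merged data set, the axis labels are my own, and it reopens the LISTING destination because all destinations were closed in the binning step.

ods listing;                         /* a destination must be open to get a graph */
proc sgplot data=HighLowX;
  /* Each observation draws one bar segment from Low to High at its bin */
  highlow x=xBin low=Low high=High / group=grp type=bar;
  xaxis label='X Bin';
  yaxis offsetmin=0 label='Count';
run;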

The next step is to create a GTL template with a 2x2 layout of cells and common uniform axes.  See the program link at the bottom for the full code.  Here is the layout of the template.

proc template;
  define statgraph Scatter_Layout;
    begingraph;
      entrytitle 'Distribution by Group';
      /*--Outermost Lattice Container--*/
      layout lattice / rows=2 columns=2 rowweights=(0.3 0.7) columnweights=(0.7 0.3)
                       columndatarange=union rowdatarange=union
                       rowgutter=5 columngutter=5;
	/*--Common Row axes--*/
        rowaxes;
	  rowaxis / offsetmin=0 display=(ticks tickvalues) griddisplay=on;
	  rowaxis / label='Mean of Full Rho' griddisplay=on 
                    linearopts=(tickvaluesequence=(start=0 increment=0.5 end=3.5));
	endrowaxes;
	/*--Common Column axes--*/
        columnaxes;
	  columnaxis / label='Ratio of Full Rho' griddisplay=on;
	  columnaxis / offsetmin=0 display=(ticks tickvalues) griddisplay=on;
	endcolumnaxes;
 
	/*--Upper Left cell with Stacked X Bins counts by group--*/
        layout overlay;
          highlowplot x=xBin low=low high=high / group=grp type=bar;
	endlayout;
	/*--Upper Right cell with Legend--*/
        layout overlay;
          discretelegend 'a';
	endlayout;
	/*--Lower Left cell with X-Y Scatter Plot--*/
        layout overlay;
          scatterplot x=x y=y / group=grp markerattrs=(symbol=circlefilled size=5) 
                      name='a';
	endlayout;
	/*--Lower Right cell with Stacked Y Bins counts by group--*/
        layout overlay;
          highlowplot y=yBin low=low high=high / group=grp type=bar;
	endlayout;
      endlayout;
    endgraph;
  end;
run;
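To render the template, a PROC SGRENDER step along these lines could be used.  The data set name ScatterHist is a hypothetical stand-in for the merged data set described above, and the graph size is an arbitrary choice.

ods graphics / reset width=6in height=6in;   /* assumed size, adjust as needed */
proc sgrender data=ScatterHist template=Scatter_Layout;
run;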

Here is the Graph.  You can adjust the font sizing for the axes if needed.  Click on graph for a high resolution image.  Note, we are using common external Row and Column axes since these are uniform and should not be repeated.

Scatter_Layout

Full SAS 9.3 code:  Scatter_Layout


Dual Response Axis Graphs

Often we need graphs that display two or more responses by the same category values.  In many cases it is useful to plot both responses on the same response (Y) axis.  This can be helpful to understand the data and compare the magnitudes side by side.  This works when the scales of both the response values are comparable and consistent.

However, the scales for the two responses may not be similar or consistent.  One common use case is when we are visualizing the actual and % changes for some categories as shown in the graph on the right.

For this example, I ran the MEANS procedure to compute the revenues by year for all the customers, and selected only the "Residential" customer for the graph.  I also computed the change in the values for subsequent years relative to the first year (1994).
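The data preparation could look roughly like the sketch below.  It assumes an input data set shaped like SASHELP.ELECTRIC, with Customer, Year and Revenue columns, and it assumes that "change" is stored as the ratio of each year's revenue to the first year's revenue.  Both are my assumptions; the attached program may differ.

proc means data=sashelp.electric nway noprint;
  class customer year;
  var revenue;
  output out=ElecRev(drop=_type_ _freq_) sum=revenue;
run;

data ElecRevChange;
  set ElecRev;                       /* already ordered by customer and year */
  by customer;
  retain base;
  if first.customer then base=revenue;   /* revenue in the first year */
  zero=0;                            /* baseline for the HIGHLOW bars */
  change=revenue/base;               /* assumed definition: ratio to first year */
  format change percent8.0;
  drop base;
run;

The ZERO column is there only because a HIGHLOW bar needs an explicit low value.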

In the graph above, I have plotted the actual revenues in billions of $ for Residential customers as a bar chart on the default Y (left) axis.  The "Change" values with a PERCENT format are plotted as a Series plot on the Y2 (right) axis.  I have colored the Y axis ticks and label using a color consistent with the bars, and the Y2 axis ticks and label using a color consistent with the line.  This graph displays all the data correctly, in a way that is easy to comprehend.  Note:  I am actually using HIGHLOW instead of VBAR as it allows me to use a linear axis.

title 'Revenues and Growth over Time for Residential Customer';
proc sgplot data=ElecRevChange(where=(customer='Residential'));
  styleattrs datacolors=(orange orange) datacontrastcolors=(cx8f3f00 darkgreen);
  highlow x=year low=zero high=revenue / name='a' legendlabel='Revenue' type=bar 
          nooutline fillattrs=graphdata1 dataskin=pressed;
  series x=year y=change /  name='b' lineattrs=graphdata2(thickness=5) y2axis;
  xaxis integer display=(nolabel);
  yaxis offsetmin=0 min=0 valueattrs=graphdata1 labelattrs=graphdata1 grid;
  y2axis offsetmin=0 min=0 values=(0 .30 .60 .90 1.20 1.50) valueattrs=graphdata2 
         labelattrs=graphdata2;
  keylegend / linelength=20px;
run;

As we can see in the data table on the right, while the "Change" values are shown with a % format, the values themselves are fractional numbers between 1.0 and 2.0.  The Percent format converts the fractional values into a % number.  So, mixing values with Percent and non-Percent formats on the same axis can result in a bad graph.

The axis format is determined by the "Primary" plot, usually the first plot in the list.  In this case, the revenues are plotted first using bars on the default Y axis.  So, the default format for the Y axis comes from the bar.  If the series plot is also plotted on the same axis, those fractional values will be displayed with a non-percent format, and will barely be visible in comparison with the revenue values, as shown in the graph below on the right.

In the graph on the right, the green line showing change is way down near the baseline.  This is because the response values are all fractional numbers between 1 and 2, and are plotted on the same axis as the revenues with an axis range of 100.

Things get even worse if the plot with the % format is primary, causing the axis format to be %.  Plotting data having a non-percent format on the same axis will cause those values to be scaled by 100.

proc sgplot data=ElecRevChange(where=(customer='Residential'));
  styleattrs datacolors=(orange orange) datacontrastcolors=(cx8f3f00 darkgreen);
  highlow x=year low=zero high=revenue / name='a' legendlabel='Revenue' type=bar 
          nooutline fillattrs=graphdata1 dataskin=pressed;
  series x=year y=change /  name='b' lineattrs=graphdata2(thickness=5);
  xaxis integer display=(nolabel);
  yaxis offsetmin=0 min=0 valueattrs=graphdata1 labelattrs=graphdata1 grid;
  keylegend / linelength=20px;
run;

In such cases, it is best to use a graph with two independent response axes, as shown in the graph at the top of this article.  Now, each axis has data with consistent formats, and life is good.  Note, each axis has its own data range.  In order to have nice grid lines, one has to ensure each axis has an equal number of ticks so the grid lines from one axis can work for both.  Otherwise, you will have two sets of grid lines.

So far so good.  But now let us take the next step.  We want to plot the graph for all customers, Commercial, Industrial and Residential, in a panel.  We still want to see both revenues and change, as in the panel shown on the right.

One would think this would be a simple matter of changing from SGPLOT to SGPANEL, using "customer" as the panel variable.  In general, you would be right, except here we have crossed the 80-20 feature balance between SG and GTL.  Supporting dual response axes for SGPANEL is a much harder task, and something not frequently requested by users.  So, what do we do, and how did I make the graph on the right?

Well, here is where we have to step out of the comfort zone of SG Procedures and move into the domain of GTL.  All of the SG features are implemented using GTL programs behind the scenes.  SGPANEL uses the GTL LAYOUT DATAPANEL and LAYOUT DATALATTICE to create the panels.  GTL does support dual response (and category) axes for panels.  So, now I have used the Layout DataPanel container in GTL, along with the BarChart and SeriesPlot statements.  The relevant part of the code is shown below, stripping all the options.  As you can see, it is not so hard to follow.  Full code is included in the attached program.

layout datapanel classvars=(customer) / rows=1 headerlabeldisplay=value;
  layout prototype / cycleattrs=true;
    highlowplot x=year low=zero high=revenue / name='a' legendlabel='Revenue' type=bar;
    seriesplot x=year y=change /  name='b' lineattrs=graphdata2(thickness=5) yaxis=y2;
  endlayout;
endlayout;

Dual Axis Graphs:   DualAxis

 

 


Graph Table

A common scenario is where we have a table of multiple measures over time.  Here we have a simple example of Frequency and Response by Day.  The Response is a linear function of the Frequency, as shown in the table on the left below.

The  shape of the data is not easily seen in the table alone.  Here is where we can benefit from a visual display of the same data as shown in the graph below on the right.

 

The shape of the data is clearly visible in the graph of Frequency by Day.  The Frequency values are also displayed as bar labels.  The Response values will have the same shape, but displaying more than one bar value will add clutter to the graph.

Here is where a "Graph Table" comes in very handy.  Instead of displaying a separate graph, I can display all the data columns and also show the shape of the data, all in one display.

With SAS 9.4, the YAxisTable can be used to easily create such a graph.  The code is shown below:

proc sgplot data=data nowall noborder;
  hbar day / response=Frequency filltype=gradient
       fillattrs=graphdata2 nostatlabel
       baselineattrs=(thickness=0);
  yaxistable Day Frequency Response / location=inside
       position=left nostatlabel;
  yaxis display=none;
  xaxis display=none grid offsetmin=0.05;
  run;

The Graph Table display is shown on the right.  Note, all the columns from the table are included, and an HBar is added to display the shape of the response columns.  In this case, since the shape of both the columns is the same, I have left out the x axis information.

An additional benefit of this display is that it is scalable.  As the data set gets longer, using a traditional VBAR chart can get cumbersome.  The x axis will need to get longer till the graph will no longer fit a traditional report.  A HBAR however, can grow vertically with the data to as much height as you may want.

For the graph below on the right, I have tied the height of the graph to the number of observations in the data set.  So the graph grows with the table.  See the full program attached at the bottom.
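One way to tie the graph height to the number of observations is sketched here.  The macro variable name GHEIGHT and the sizing constants (0.5 inch of padding plus 0.25 inch per row) are my own placeholder choices, not values from the attached program.

data _null_;
  if 0 then set data nobs=n;         /* NOBS is available without reading rows */
  call symputx('gheight', cats(0.5 + 0.25*n, 'in'));
  stop;
run;

ods graphics / height=&gheight;      /* taller graph for a longer table */

Running the SGPLOT step shown earlier after this ODS GRAPHICS statement then produces a graph whose height follows the table length.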

If the data set gets too long to fit on a page of a document, you can split the graph into smaller sections to fit one on each page.  This can be done by adding a classifier column with page values '1', '2' and so on for every N observations, and then using the BY statement to produce graphs with a fixed number of observations per page.  The graph axis is automatically scaled uniformly across all pages.  This extension is left to the reader.
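One possible starting point for this extension is sketched below.  The data set name PAGED, the numeric PAGE column (used instead of character values for brevity) and the page size of 20 observations are assumptions for illustration.

data paged;
  set data;
  page=ceil(_n_/20);                 /* 20 observations per page, an arbitrary choice */
run;

proc sgplot data=paged nowall noborder;
  by page;                           /* one graph per page value */
  hbar day / response=Frequency filltype=gradient
       fillattrs=graphdata2 nostatlabel
       baselineattrs=(thickness=0);
  yaxistable Day Frequency Response / location=inside
       position=left nostatlabel;
  yaxis display=none;
  xaxis display=none grid offsetmin=0.05;
run;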

A popular use case of the Graph Table is the Forest plot.  Here, you have multiple observations, one per study, with multiple columns of data and an odds ratio graph.  The YAxisTable or its GTL sibling - the AxisTable makes creating such graphs very easy.

SAS 9.4 Code for Graph Tables:  GraphTable


Fun with Bar Charts

As Sheldon Cooper would say, this is the first episode of "Fun with Charts".  I did not find a cool term like "Vexillology", and "Cartography" is taken by map making, so let us go with "Chartology".

Yesterday, I saw a couple of interesting bar charts as shown on the right.  I thought this may provide for some fun creating different appearances with Bar Charts using SAS 9.4M2.

The first graph uses a color gradient from a lighter color at the top to a darker shade at the bottom, though the one in the middle seems to have a reddish tinge.

The second graph on the right uses an interesting way to label the categories.  Let us see what we can do with the new features added to SGPLOT procedure with SAS 9.4M3.

The SGPLOT procedure supports new features including the ability to have a gradient fill for bars and histogram bins.  We will use this feature to create these charts.  The VBar statement can be overlaid with itself as long as the category and group classifications are the same.  We can use this feature to create the first chart.

The second graph needs a little more creative construction, using a mixture of plots.  The VBarParm statement can be freely combined with any other basic plot to create more interesting combinations.  We will use that to create the second graph.

The Bar and Histogram statements now support a FillType=Gradient option that fills the bar with the bar color that is fully opaque (or at the specified fill transparency) at the top, and fades to transparent at the bottom.  Some of the background (wall) color shows through, including anything behind the bar, such as grid lines.

For the graph on the right, I have used a VBAR with groups set to the same as the category to get the bars colored by the category.  I have set FillType=Gradient which results in the nice gradient from saturated color at the top to the transparent at the bottom.  The code is shown below.  Click on the graph for a higher resolution image.

SAS 9.4M3  SGPLOT Code:

proc sgplot data=Law_Salaries nowall noborder noautolegend;
  styleattrs datacolors=(red gold green) datacontrastcolors=(black);
  vbar profession / response=salary group=profession groupdisplay=cluster 
                    filltype=gradient baselineattrs=(thickness=0) 
                    datalabel datalabelattrs=(size=14);
  xaxis discreteorder=data display=(noticks nolabel noline);
  yaxis display=(noticks nolabel noline) grid;
  run;

Note in the program above, we have set a few wall, border and axis options to get this appearance.  GroupDisplay is set to "Cluster" to get the data labels on each bar.  Also note the grid lines are visible through the bar towards the bottom as the transparency increases.  If this is undesirable (as in this case), we can address this by placing a VBar behind it with opaque white bars, as shown in the graph on the right.

The only thing remaining to do now is to change the gradient to transition to black instead of white at the bottom.  The way to do this is probably obvious to you.  Instead of an opaque white backing VBar, use black.

Here is the final result, with bars fading to black and using the "Matte" skin.  Except for the slightly reddish tinge at the bottom of the 2nd bar, this is pretty close, I think.

Now, let us turn to the second example.  This graph is relatively straightforward, except for the interesting way the category values are labeled along the side of each cluster.  Also, the bar values are inside the bar at the top end, rotated vertically.

We will start by creating a basic cluster grouped bar chart using the VBarParm statement as shown on the right.  We used VBarParm because we know we will need to use other statements to do the special effects, and VBarParm allows us to layer it with other plots.

The graph on the right really gets the information across just fine, with the nice split tick values on the x axis.  This part is all done by the single VBarParm.  We have overlaid on each bar a new TEXT statement to display the bar values at the top of the bar, but inside the bar.  The values are rotated vertically, and aligned such that the right edge of the text is along the top of the bar.  Note the use of "Backlighting" to ensure the text is visible on any background color.

To add the unique category labeling, we use a second text plot.  We offset the bars to the right by using DiscreteOffset=0.2, and ClusterWidth=0.6.  This places the three bars offset to the right.  Then, we overlay a TEXT statement with the following settings:

text x=profession y=zero text=profession /  rotate=90 position=right 
     textattrs=(size=12 color=darkblue) contributeoffsets=none     
     discreteoffset=-0.2;

The resulting graph is shown on the right.  In the syntax for the TEXT plot, note the three required parameters, X, Y and Text.  The text from the column is placed at the (x, y) location in the graph.  The plot is offset to the left by 20%, and the text is rotated 90 degrees with a position of right (meaning the text is on the right of the location prior to rotation).  The text color is set to dark blue just to add some interest.

The overlaid bar labels use black text, again with backlight.  Backlight works by default, and darkens or lightens the background based on the text color so that the text is clearly visible.

Over the years we have found ourselves using the Scatter with MarkerChar to insert textual information in a graph.  So, it is about time text gets its own statement with options to customize the text.

Note.  Everything is done using plot statements.  No annotation is required.

Full SAS 9.4M3 SGPLOT code:  BarCharts


Axis Break Appearance Macro

Often, we have data where most of the observations are clustered within a narrow range, with a few outliers positioned far away.  When all the data is plotted, the axis is scaled to accommodate all the data, thus skewing the scale.  Techniques to handle such data have been addressed earlier in the articles Broken Y Axis and Using Log Axes.

Users have previously voiced the need to support axis breaks in the procedures.  This feature can get complicated very quickly, so our plan was to start with the simpler case, and then build based on your feedback.

Support for axis ranges on one axis at a time is included in SAS 9.4M1.  You can specify one or more breaks by providing the data range(s) that are to be retained.  In the graph on the right, most of my data is between 2 and 3 on the x axis, with one outlier at x > 10.  I use the RANGES option on the x axis to retain the data ranges 1-3.5 and 9.75-11.

SAS 9.4 SGPLOT code:

proc sgplot data=break noautolegend;
  highlow y=y low=zero high=x / group=y lineattrs=(thickness=3);
  xaxis ranges=(1 - 3.5 9.75-11) integer;
  yaxis min=0 max=4;
run;

As you can see in the graph above, only the ranges specified in the RANGES option are displayed on the x axis.  An attempt is made to keep the tick value increments the same in the displayed regions.  A full height break indicator is displayed across the entire height of the data area.  Such breaks are useful when using plots like bars, needles or series.  If the full break was not shown, it would not be obvious at first glance that the blue needle is broken.

However, in many cases, such a full height break indicator is not desirable.  When using scatter plots, users have expressed the need for axis break symbols on the axis only, without the display of the full break indicator.  Such an axis break is shown in the graph on the right.  One has to look carefully to see the "Bracket" break indicator shown on the x axis between "3" and "10".  Click on the graph for a higher resolution image.

I created the above graph using the SGPLOT procedure, so how did I get this appearance?  Well, the good news is that the procedure does all the hard work needed to draw only the necessary ranges and position the data correctly.  Now, all we have to do is replace the full axis break indicator with the axis break symbol.  This task can be done using annotation.  Since I know the exact data extent of the break, as provided by me in the RANGES option, I can use this information to erase the break indicator and draw my own symbol on the axis.

The idea is simple.  Use the POLYGON function to erase the full break using the same color as the wall or background.  I go from the upper edge of the lower range to the lower edge of the upper range.  Each coordinate is correctly transformed to the right location by the procedure.  Then make the polygon the full height of the graph data area.  Using a RECTANGLE function will not work, as we do not know the pixel width of the break.  Note in the attached program, I adjusted the values a bit to allow for the curvy line.  Then, I draw the axis break myself between the Low and High values of the ranges.
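The erase step could be sketched with an SG annotation data set like the one below.  This is an illustration only: the data set name ERASE is hypothetical, the fill color is assumed to be white (it has to match the wall or background color of the active style), and the x values are the raw range edges 3.5 and 9.75 without the small adjustments made in the attached program.

data erase;
  length function $8 x1space y1space $12 display $8 fillcolor $10;
  retain x1space 'datavalue'         /* x in data units, so the procedure places it */
         y1space 'wallpercent'       /* y as a percentage of the wall height */
         display 'fill'
         fillcolor 'white';          /* assumed wall color */
  function='polygon';  x1=3.5;  y1=0;   output;   /* start at upper edge of lower range */
  function='polycont'; x1=9.75; y1=0;   output;   /* across to lower edge of upper range */
  function='polycont'; x1=9.75; y1=100; output;   /* up the full height of the wall */
  function='polycont'; x1=3.5;  y1=100; output;
run;

A data set like this is passed to the procedure with the SGANNO= option on the PROC SGPLOT statement.  Drawing the break symbol itself needs additional annotation observations, which are not shown here.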

I converted the code into a macro to erase the full break and draw axis break symbol for the case of one break:

%AxisBreak (Axis=, Low=, High=, DataOut=, Type=, Back=, Aspect=);

Axis is X or Y; Low and High are the data values for the break region.  So, for the example above, Low=3.5 and High=9.75.  DataOut is the name of the annotation data set generated, and Type is the break type.  Back indicates whether or not you include the wall in the display, and Aspect is the aspect of the graph.

The macro generates the necessary annotation data set that erases the full break, and replaces it with a simple axis break of type Bracket or Z.  The graph on the right uses a "Z" break symbol on the Y axis.  Note, the data range of the axis covers negative and positive values.

SGPLOT code with Macro:

%AxisBreak (Axis=X, low=3.5, high=9.75, dataout=anno, back=Wall, type=Bracket);
proc sgplot data=break noborder sganno=anno;
  scatter x=x y=y;
  xaxis ranges=(1 - 3.5 9.75 - 11) integer;
  yaxis min=0 max=4;
run;

Due to the way axis breaks are implemented in the code, only break symbols of type Bracket and Z can be drawn reliably using this technique.  But at least you now have a way to display simple axis break symbols, instead of the full length or width break indicator.  We plan to include simple axis break symbols in the next release as requested by you.  So, keep your ideas coming.  Till then, you can use the ideas used in this macro.

I have tested the macro for a few different cases with different styles, with or without wall, different dpi, different graph sizes and data ranges.  It seems to handle most cases of one break on one axis, but I have not tested for presence of required variables, etc. or bad data.  It is provided just as a tool.  I am sure the idea can be extended to multiple breaks on one axis if you have such a case.  I'll leave that exercise to the reader.

SAS Code:  Axis_Break_Poly_Macro


Clinical Graphs

This week I had the opportunity to present a 1/2 day seminar on creating clinical graphs using the SG procedures during an In-House SAS Users' group meeting.  I have presented this seminar quite a few times now, and I always learn something.

The audience was very receptive, with some people familiar with SAS/GRAPH, and others having some knowledge of SG procedures and GTL.  The seminar focused on SG procedures. Often for such seminars, I like to get an idea in advance on the type of graphs the users need to make on a regular basis.  The list of graphs that  were of interest included Kaplan-Meier curves and Forest Plots.

A couple of specific plots mentioned were the Waterfall graph for "Change in Tumor Size by Treatment" and a graph for "Incidence of Injection Site Reaction".

The Waterfall graph displays the range of change in tumor size in the study by treatment.  The (simulated) data consists of the change in tumor size for subjects in a study, displayed in order of increasing reduction grouped by treatment.  Reference lines are drawn at RECIST threshold of -30% and at 20%.

Alternative appearances can be seen on the Web, including grouping by other criteria.  SGPLOT supports skins to alter the visual appearance of the graphs as shown on the right.  Click on the graph for a higher resolution image.

For more information on graphs for Oncology research, see "Plotting Against Cancer: Creating Oncology Graphs using SAS" by Debpriya Sarkar.

SGPLOT code:

title 'Change in Tumor Size';
title2 'ITT Population';
proc sgplot data=TumorSize nowall noborder;
  styleattrs datacolors=(cxbf0000 cx4f4f4f) datacontrastcolors=(black);
  vbar cid / response=change group=group categoryorder=respdesc datalabel=label
           datalabelattrs=(size=5 weight=bold) groupdisplay=cluster clusterwidth=1;
  refline 20 -30 / lineattrs=(pattern=shortdash);
  xaxis display=none;
  yaxis values=(60 to -100 by -20);
  inset ("C="="CR" "R="="PR" "S="="SD" "P="="PD" "N="="NE") / title='BCR' 
        position=bottomleft border textattrs=(size=6 weight=bold);
  keylegend / title='' location=inside position=topright across=1 border;
run;

Note:  GroupDisplay=Cluster is used so that the bar label can be displayed on top of each bar.  Since each cluster contains only a single bar, the ClusterWidth option is used to control the width of that bar.

I have used style colors in the code to set group colors.  However, one can also use the Discrete Attributes Map to ensure the consistent assignment of colors by groups as shown in the paper mentioned above.
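As a rough sketch of the Discrete Attributes Map alternative, the colors from the StyleAttrs statement above can be tied to specific group values instead of to the order in which the groups appear in the data.  The group values 'Treatment A' and 'Treatment B' and the map name 'Trt' are placeholders I have assumed for illustration; substitute the actual values of the GROUP variable in your data.

```sas
/*--Attribute map: one row per group value (values here are assumed)--*/
data attrmap;
  length id $8 value $20 fillcolor $10 linecolor $10;
  id='Trt'; value='Treatment A'; fillcolor='cxbf0000'; linecolor='black'; output;
  id='Trt'; value='Treatment B'; fillcolor='cx4f4f4f'; linecolor='black'; output;
run;

/*--Reference the map with DATTRMAP= and ATTRID= instead of StyleAttrs--*/
proc sgplot data=TumorSize dattrmap=attrmap nowall noborder;
  vbar cid / response=change group=group attrid=Trt categoryorder=respdesc
             groupdisplay=cluster clusterwidth=1;
  refline 20 -30 / lineattrs=(pattern=shortdash);
  xaxis display=none;
run;
```

With the map in place, a treatment keeps its color even if the data is subset or the group order changes.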

The other graph of interest was the "Incidence of Injection-Site Reaction by Time and Cohort", as shown in the graph on the right.  The (simulated) data shows the incidence of reaction by time and cohort using a cluster grouped bar chart.

In this case, the user wanted both bar fill colors and fill patterns.  Fill patterns can be useful when displaying the graph in a gray scale medium.  SGPLOT supports fill patterns for bars, which are enabled through the display option.  The easiest way to do this is to set this option in the style.  The Journal3 style shipped with SAS is designed to display both fill colors and fill patterns.  So, I just used that style, and changed the colors to the ones I wanted using the StyleAttrs statement.  You can also do this by deriving a custom style.

SGPLOT Code:

ods listing style=journal3;
proc sgplot data=Incidence nowall noborder;
  styleattrs datacolors=(gray pink lightgreen lightblue) datacontrastcolors=(black);
  vbar time / response=incidence group=group groupdisplay=cluster;
  xaxis discreteorder=data;
  yaxis offsetmax=0.2;
  keylegend / title='' location=inside position=top across=2 border;
run;

Full SAS code:  Clinical_Graphs


Axis Thresholds

Have you ever wondered why sometimes an SGPLOT or GTL graph has markers drawn beyond the extreme tick and value on an axis and sometimes not?  And, if you prefer your graphs to always have tick values on the axis that cover the whole range of data, how can you do that?

Let us look under the covers a bit to see what is going on and why.  First of all, the above behavior is intentional and referred to as "Thresholding".  It has a specific purpose as displayed in the graph on the right.  Here I have generated some data where x is between 0.9 and 4.1 and made this graph using the SAS/GRAPH GPLOT procedure with default axis settings.

Note, GPLOT has used 6 "nice", round number tick values of 0-5 on the x axis to include the entire data range on the axis.  Since the data are only between 0.9 and 4.1,  the plot region to the left and right of the data is not utilized.  In this case, almost 40% of the horizontal space is not used.

The graph on the right plots the same data using the SGPLOT procedure which uses the full available width of the graph, thus using the space efficiently.  This is the result of the default thresholding heuristics used by SGPLOT and GTL.  SGPLOT also starts out wanting to use 0-5 ticks, but the "0" and "5" ticks are deemed to be unnecessary and dropped, displaying only values 1-4.  Some of the observations are drawn outside the ticks, but the axis range itself covers the full range of the data.
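To try this comparison yourself, here is a minimal sketch that simulates data with x between 0.9 and 4.1 and plots it with both procedures using default axis settings.  The data set name, variable names, seed and distributions are my own assumptions for illustration, not the data behind the graphs shown here.

```sas
/*--Simulated data with x in [0.9, 4.1] (names and seed are assumed)--*/
data thresh;
  call streaminit(1234);
  do i = 1 to 50;
    x = 0.9 + 3.2 * rand('uniform');
    y = rand('normal');
    output;
  end;
run;

/*--SAS/GRAPH: default axis with round tick values covering the data--*/
proc gplot data=thresh;
  plot y*x;
run;
quit;

/*--SGPLOT: default thresholding may drop the outer ticks--*/
proc sgplot data=thresh;
  scatter x=x y=y;
run;
```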

Whether or not to display the outermost ticks and values is determined by the axis threshold on each side independently.  The threshold value can be between 0.0 and 1.0, with a default of 0.3.  This means that if the outermost data value on one side of the axis is more than 30% of the midpoint spacing away from a possible outer tick, then the outer tick is dropped.

The graph on the right displays Diastolic x Cholesterol for all subjects with an AgeAtStart > 60.  The extreme values are labeled showing the Diastolic values in blue and the Cholesterol values in red as indicated in the legend.

Note, on the x-axis, the midpoint spacing is 20.  The extreme right marker has a value of 313.  This is 100*(1-13/20)=35% of the midpoint spacing away from a potential outer tick at '320'.  Since this is > 30%, the outer tick is dropped.  So, with the default threshold of 30%, at most 30% of the midpoint spacing is left unused on each side.  For Diastolic on the y axis, the upper extreme observation has a value of 115, which is 100*(1-15/20)=25% away from the outer tick of '120', so that tick is retained.
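The arithmetic above can be checked with a small DATA step.  This is only a sketch of the rule as described in this article, not the actual algorithm used inside SGPLOT; the variable names are made up for the illustration.

```sas
/*--Sketch of the threshold rule described above (not the internal algorithm)--*/
data _null_;
  spacing   = 20;    /* midpoint (tick) spacing on the axis */
  threshold = 0.3;   /* default axis threshold              */
  do datamax = 313, 115;
    outertick = ceil(datamax / spacing) * spacing;   /* candidate outer tick     */
    gap       = (outertick - datamax) / spacing;     /* fraction of spacing unused */
    keep_tick = (gap <= threshold);                  /* 1 = retained, 0 = dropped */
    put datamax= outertick= gap= percent8.0 keep_tick=;
  end;
run;
```

For 313 the gap is 7/20 of the spacing, which exceeds the threshold, so the tick at 320 is dropped; for 115 the gap is 5/20, so the tick at 120 is kept.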

So, what can you do if you always want to see the outer ticks?  The answer is simple - set the ThresholdMin or ThresholdMax options on the axis.  Setting ThresholdMax=1 will ensure that the outer tick is always shown on the maximum side of the axis, as shown in the graph on the right.  Now, the outer tick of "320" is displayed on the x-axis.  The code snippet is shown below.  See the attached file for the full code.

SGPLOT code:

proc sgplot data=heart nocycleattrs noautolegend;
  scatter x=cholesterol y=diastolic / datalabel=clabel datalabelattrs=graphdata2;
  scatter x=cholesterol y=diastolic / datalabel=dlabel datalabelattrs=graphdata1;
  xaxis grid thresholdmax=1;
  yaxis grid;
run;

On the other hand, setting ThresholdMin or ThresholdMax to 0 forces the outer tick on that side to always be dropped.  For the graph on the right, I have set ThresholdMax=0 on the y axis along with ThresholdMax=1 on the x axis.  Now, the x axis always has the outer tick on the maximum side, and the y axis never does.
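A sketch of the combination just described is shown here.  It reuses the SGPLOT step from the earlier snippet and assumes the same HEART data set with the CLABEL and DLABEL label variables; only the YAXIS statement changes.

```sas
proc sgplot data=heart nocycleattrs noautolegend;
  scatter x=cholesterol y=diastolic / datalabel=clabel datalabelattrs=graphdata2;
  scatter x=cholesterol y=diastolic / datalabel=dlabel datalabelattrs=graphdata1;
  xaxis grid thresholdmax=1;   /* always keep the outer tick on the max side */
  yaxis grid thresholdmax=0;   /* always drop the outer tick on the max side */
run;
```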

Full Program: Threshold


Bar with Statistics

One of the key benefits of using a horizontal bar chart is the ability to display statistics for each bar.  This is a popular feature for the HBAR statement with the SAS/GRAPH GCHART procedure.  So, let us review the options available to us to create such graphs using SGPLOT.

The simplest case is to display the frequency of each bar on the right hand side as shown in the graph on the right.  Here we have used the SGPLOT HBAR statement with the DataLabel option and DataLabelPos=right.

I have also used the NoWall, NoBorder options and suppressed axis lines and baseline to get this popular view.  Note, the stat values are not colored by group.  Click on the graph for a higher resolution image.

proc sgplot data=cars nowall noborder;
  hbar type / group=origin groupdisplay=cluster dataskin=pressed
       baselineattrs=(thickness=0) datalabel datalabelpos=right;
  yaxis display=(nolabel noline noticks);
  xaxis display=(noline noticks) grid;
run;

With SAS 9.4, you have the option to include any statistics with a HBAR plot using the YAxisTable statement.  We can use this statement to display other statistics as shown on the right.

In this example, I have included the Mean City and Highway mileage along with the frequency counts.  Note, the frequency count values are now color coded by group.  All values are displayed right justified in the column by default.

proc sgplot data=cars nowall noborder;
label mpg_city='Mean City Mileage' mpg_highway='Mean Highway Mileage' n='Count';
format mpg_city mpg_highway 4.1;
hbar type / group=origin groupdisplay=cluster stat=pct dataskin=pressed 
     baselineattrs=(thickness=0);
yaxistable n / stat=sum classdisplay=cluster colorgroup=origin 
     valueattrs=(size=6 weight=bold) nostatlabel;
yaxistable mpg_city mpg_highway/ stat=mean classdisplay=cluster colorgroup=origin 
     valueattrs=(size=6 weight=bold);
yaxis display=(nolabel noline noticks);
xaxis display=(noline noticks) grid;
run;

In the graph and code above, I have used one YAxisTable to display the frequency values by using an additional variable called "N" with Stat=Sum.  This variable contains only "1" for each observation, so the sum gives the count for each bar in this column.  You can also use any other numeric variable with Stat=Freq, and set the variable label appropriately.
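A minimal sketch of how such a data set could be prepared is shown here.  I am assuming the CARS data set is derived from SASHELP.CARS; the full program may subset or otherwise modify the data before plotting.

```sas
/*--Add a constant N=1 so that Stat=Sum yields the count per bar--*/
data cars;
  set sashelp.cars;   /* assumed source of the CARS data set */
  n = 1;
run;
```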

Using the YAxisTable instead of the DataLabel option as in the first graph allows us to color each observation by group.  Then, I have used a second YAxisTable with mpg_city and mpg_highway as the variables with Stat=Mean to display the mean mileage values also colored by group.

For the graph on the right, I have used ValueHAlign=center to display each value in the center of the column using a 4.1 format.  I have set the labels for the variables to indicate the statistic used for each label.  I have also used faint horizontal bands for each category to help the eye across the graph.

Statistics can be displayed "Inside" or "Outside" the graph area, which is more apparent if graph borders are used.  Additional statistics can be displayed by adding more variables to the YAxisTable statement, or using another YAxisTable statement to display values on the left of the bars.
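The options described above might be combined as in the sketch here.  It starts from the earlier YAxisTable example; the transparency value for the bands and the choice to move the count column to the left are my own assumptions for illustration, so the exact settings in the full program may differ.

```sas
proc sgplot data=cars nowall noborder;
  label mpg_city='Mean City Mileage' mpg_highway='Mean Highway Mileage' n='Count';
  format mpg_city mpg_highway 4.1;
  hbar type / group=origin groupdisplay=cluster dataskin=pressed
       baselineattrs=(thickness=0);
  /* counts on the left of the bars, outside the data area */
  yaxistable n / stat=sum classdisplay=cluster colorgroup=origin
       valuehalign=center location=outside position=left nostatlabel;
  /* means on the right, centered in each column */
  yaxistable mpg_city mpg_highway / stat=mean classdisplay=cluster colorgroup=origin
       valuehalign=center location=inside position=right;
  /* faint alternating bands per category */
  yaxis display=(nolabel noline noticks) colorbands=odd
        colorbandsattrs=(transparency=0.8);
  xaxis display=(noline noticks) grid;
run;
```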

Full Program:  BarStats_SG_94


Likert Graphs

Just this morning I received a request for a brief survey from Apple on my feedback about the new iPhone6+.  Yes, I finally got one, dead last in the family.  The survey followed the usual format, with a number of questions on what I like or dislike about it, with a 5 level scale for my response - Strongly Agree, Agree, Neutral, Disagree and Strongly Disagree.

Coincidentally, I recently also received an article from a co-worker on making Likert Graphs using R.  So, my curiosity was stirred, and I proceeded to dig into it.  Turns out these graphs are frequently used to evaluate the response data for such surveys and I was curious to see how far I could get using SAS 9.3 SG procedures.

I proceeded to make up some survey data based on the sample I saw in the article, which was a survey on books in 3 countries with 10 questions or statements.  The answers are summarized into 4 or 5 groups.  Here I have used 4 groups.  Later I will show an example with 5 groups.

The data looks like the table on the right, each question has a QID for our convenience.  The question itself is also in the data but I did not include it here to keep the table relatively narrow.  The statement is like "Reading is one of my favorite hobbies".

My data is sorted by the Qid, Country and Group.  I can process the data and compute Low and High values for each group, starting with zero.  You can see the data step in the full program attached below.
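The idea behind the Low and High computation can be sketched as a running total within each question and country.  This is not the data step from the full program; the input data set name SURVEY and the percent variable PCT are names I have assumed for illustration.

```sas
/*--Running Low/High per strip; SURVEY and PCT are assumed names--*/
data Likert_4;
  set survey;                        /* sorted by qid, country, group     */
  by qid country;
  retain high;
  if first.country then high = 0;    /* each strip starts at zero         */
  low  = high;                       /* segment starts where the last ended */
  high = low + pct;                  /* and extends by this group's percent */
run;
```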

First, I want to use SAS 9.3 features to create the graph using the SGPanel procedure.  I used the PANEL layout, with the Question as the class variable.  Each question is displayed in the cell header, with a HighLow plot in each cell showing the summarized percent values for each response group by country.

SAS 9.3 SGPANEL procedure syntax:

ods listing style=styles.likert;
title 'Survey Responses to Questions by Country';
proc sgpanel data=Likert_4 ;
  panelby question / layout=panel columns=1 onepanel novarname noborder nowall;
  highlow y=country low=low high=high / group=group type=bar nooutline   
          lowlabel=sumdisagree highlabel=sumagree;
  rowaxis display=(nolabel noticks) fitpolicy=none;
  colaxis display=(nolabel noticks novalues);
  keylegend / noborder;
run;

As you can see, the basic graph is very easy to create using the SGPANEL procedure syntax shown above.  Note, I have used the MODSTYLE macro to derive a new style with the colors I want to use and specified it on the ODS Listing statement.
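For reference, a call to the MODSTYLE macro for this purpose might look like the sketch here.  The parent style and the four fill colors are placeholders I have chosen for illustration, not the colors used in the graph above.

```sas
/*--Derive a LIKERT style with custom group fill colors (colors are assumed)--*/
%modstyle(name=likert, parent=listing, type=CLM,
          fillcolors=cxd7191c cxfdae61 cxabd9e9 cx2c7bb6);
```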

Also note in the graph and code above, I have used the LowLabel and HighLabel options of the HighLow plot to display the cumulative % of the disagree and agree values at each end.  The SAS 9.3 SGPANEL procedure does not provide an easy way to turn off the cell and header borders.  So, I have derived a style from the Likert style to turn off borders and axis lines.

/*--Create style to suppress border and axis lines--*/
proc template;
  define style styles.noborder;
    parent = styles.likert;
    class GraphBorderLines / linethickness=0px;
    class GraphAxisLines / linethickness=0px;
  end;
run;

Now, I have used this new NOBORDER style, along with some SAS 9.4  display options to create the graph shown on the right.  I have suppressed the panel HEADER, displayed the question using a new INSET option and used a skin for the HighLow plot as shown above.

In the graph on the right, I have positioned the strip such that the zero value is at the center of the x axis.  This alternate view may provide a better feel for the trend, whether negative or positive.  Of course, this data is simulated using random numbers, so any trend is accidental.

The x axis is now set to span from -100% to 100%.  Each strip no longer spans the entire x axis, so I added an inset background to allow the "Question" to stand out a bit from the rest of the text information in the graph.
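One way to center each strip at zero is to start the running Low/High values at the negative of the total "disagree" percent for that strip.  The sketch here uses two passes over each question and country; the data set name SURVEY, the variable PCT and the group values are all assumptions for illustration, and it assumes the disagree groups sort before the agree groups.

```sas
/*--Center each strip at zero; SURVEY, PCT and group values are assumed--*/
data Likert_4_centered;
  /* pass 1: total percent on the disagree side for this strip */
  do until (last.country);
    set survey;
    by qid country;
    if first.country then negtotal = 0;
    if group in ('Strongly Disagree', 'Disagree') then negtotal + pct;
  end;
  high = -negtotal;                  /* strip starts left of zero */
  /* pass 2: running Low/High as before, shifted by the start value */
  do until (last.country);
    set survey;
    by qid country;
    low  = high;
    high = low + pct;
    output;
  end;
run;
```

The boundary between the disagree and agree segments then falls at zero, and the column axis range can be fixed with something like colaxis min=-100 max=100.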

 

The same technique can easily be extended to the 5 group level case as shown below.  The graph on the near right shows the full spanning strips, with a "Neutral" group in the middle.  The graph on the far right centers the middle of the neutral segment at zero on the x axis.  Also, I moved the inset labels to the left side.

 

Finally, the graph on the right shows the 4 group graph with segment labels.  Here I have used the new SAS 9.4 VBarParm statement to draw the strips with stacked groups instead of the HighLow bar.  I have used the SEGLABEL option to automatically label each segment.  I did not include the High and Low labels from the HighLow plot, but if needed, that can be done.

As usual, this exercise flushed out some deficiencies in the code, mostly related to the lack of a way to turn off the header borders.  We will be sure to address such issues.

Full SAS code:   Likert_SGPanel2

 
