Axis Break Appearance Macro

Often, we have data where most of the observations are clustered within a narrow range, with a few outliers positioned far away.  When all the data is plotted, the axis is scaled to accommodate the outliers, which compresses the region where most of the data lies.  Techniques to handle such data were addressed earlier in the articles Broken Y Axis and Using Log Axes.

Users have previously voiced the need to support axis breaks in the procedures.  This feature can get complicated very quickly, so our plan was to start with the simpler case, and then build based on your feedback.

Support for axis ranges on one axis at a time is included in SAS 9.4M1.  You can specify one or more breaks by providing the data range(s) that are to be retained.  In the graph on the right, most of my data is between 2 and 3 on the x axis, with one outlier at x > 10.  I use the RANGES option on the x axis to retain the data ranges 1-3.5 and 9.75-11.

SAS 9.4 SGPLOT code:

proc sgplot data=break noautolegend;
  highlow y=y low=zero high=x / group=y lineattrs=(thickness=3);
  xaxis ranges=(1 - 3.5 9.75 - 11) integer;
  yaxis min=0 max=4;
run;
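The data set itself is not listed in this article.  As a rough sketch, assuming only the variable names used in the code above (Y, X and Zero) and made-up values that match the description, a data set like the following would produce a comparable graph:

```sas
/* Hypothetical data: the values are invented for illustration only. */
data break;
  input y x;
  zero = 0;          /* common starting point for every needle */
  datalines;
1 2.2
2 10.6
3 2.8
;
run;
```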

As you can see in the graph above, only the ranges specified in the RANGES option are displayed on the x axis.  An attempt is made to keep the tick value increments the same in the displayed regions.  A full height break indicator is displayed across the entire height of the data area.  Such breaks are useful when using plots like bars, needles or series.  If the full break was not shown, it would not be obvious at first glance that the blue needle is broken.

However, in many cases, such a full height break indicator is not desirable.  When using scatter plots, users have expressed the need for axis break symbols on the axis only, without the display of the full break indicator.  Such an axis break is shown in the graph on the right.  One has to look carefully to see the "Bracket" break indicator shown on the x axis between "3" and "10".

I created the above graph using the SGPLOT procedure, so how did I get this appearance?  Well, the good news is that the procedure does all the hard work needed to draw only the necessary ranges and position the data correctly.  Now, all we have to do is replace the full axis break indicator with the axis break symbol.  This task can be done using annotation.  Since I know the exact data extent of the break, as provided by me in the RANGES option, I can use this information to erase the break indicator and draw my own symbol on the axis.

The idea is simple.  Use the POLYGON function to erase the full break using the same color as the wall or background.  The polygon goes from the upper edge of the lower range to the lower edge of the upper range, and spans the full height of the graph data area.  Each coordinate is correctly transformed to the right location by the procedure.  Using a RECTANGLE function will not work, as we do not know the pixel width of the break.  Note that in the attached program, I adjusted the values a bit to allow for the curvy line.  Then, I draw the axis break symbol myself between the Low and High values of the ranges.
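As a minimal sketch of the erase step only, assuming a white wall and the break region of 3.5 to 9.75 from the example, an SG annotation data set could draw a filled polygon with X in data space and Y in wall percent.  The data set name anno_erase and the exact values are assumptions; the attached macro may well differ (for instance, it nudges the values to allow for the curvy line) and it also draws the break symbol, which is not shown here.

```sas
/* Sketch only: erase the full-height break with a wall-colored polygon. */
/* Assumes a white wall; change FILLCOLOR to match your style.           */
data anno_erase;
  length function $8 x1space y1space $12 display $8 fillcolor $8;
  retain x1space 'datavalue'      /* X coordinates are axis data values  */
         y1space 'wallpercent'    /* Y coordinates are percent of wall   */
         display 'fill' fillcolor 'white';
  function='polygon';  x1=3.5;  y1=0;   output;  /* lower-left corner    */
  function='polycont'; x1=9.75; y1=0;   output;  /* lower-right corner   */
  function='polycont'; x1=9.75; y1=100; output;  /* upper-right corner   */
  function='polycont'; x1=3.5;  y1=100; output;  /* upper-left corner    */
run;
```

Such a data set would be passed to the procedure with the SGANNO= option, for example sganno=anno_erase.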

I converted the code into a macro that erases the full break and draws the axis break symbol for the case of one break:

%AxisBreak (Axis=, Low=, High=, DataOut=, Type=, Back=, Aspect=);

Axis is X or Y.  Low and High are the data values for the break region, so for the example above, Low=3.5 and High=9.75.  DataOut is the name of the annotation data set generated, and Type is the break type.  Back indicates whether or not you include the wall in the display, and Aspect is the aspect of the graph.

The macro generates the necessary annotation data set that erases the full break, and replaces it with a simple axis break of type Bracket or Z.  The graph on the right uses a "Z" break symbol on the Y axis.  Note, the data range of the axis covers negative and positive values.

SGPLOT code with Macro:

%AxisBreak (Axis=X, low=3.5, high=9.75, dataout=anno, back=Wall, type=Bracket);
proc sgplot data=break noborder sganno=anno;
  scatter x=x y=y;
  xaxis ranges=(1 - 3.5 9.75 - 11) integer;
  yaxis min=0 max=4;
run;

Due to the way axis breaks are implemented in the code, only break symbols of type Bracket and Z can be drawn reliably using this technique.  But at least you now have a way to display simple axis break symbols, instead of the full length or width break indicator.  We plan to include simple axis break symbols in the next release as requested by you.  So, keep your ideas coming.  Till then, you can use the ideas used in this macro.

I have tested the macro for a few different cases with different styles, with or without wall, different dpi, different graph sizes and data ranges.  It seems to handle most cases of one break on one axis, but I have not tested for presence of required variables, etc. or bad data.  It is provided just as a tool.  I am sure the idea can be extended to multiple breaks on one axis if you have such a case.  I'll leave that exercise to the reader.

SAS Code:  Axis_Break_Poly_Macro


Clinical Graphs

This week I had the opportunity to present a 1/2 day seminar on creating clinical graphs using the SG procedures during an In-House SAS Users' group meeting.  I have presented this seminar quite a few times now, and I always learn something.

The audience was very receptive, with some people familiar with SAS/GRAPH, and others having some knowledge of SG procedures and GTL.  The seminar focused on SG procedures. Often for such seminars, I like to get an idea in advance on the type of graphs the users need to make on a regular basis.  The list of graphs that  were of interest included Kaplan-Meier curves and Forest Plots.

A couple of specific plots mentioned were the Waterfall graph for "Change in Tumor Size by Treatment" and a graph for "Incidence of Injection Site Reaction".

The Waterfall graph displays the range of change in tumor size in the study by treatment.  The (simulated) data consists of the change in tumor size for subjects in a study, displayed in order of increasing reduction grouped by treatment.  Reference lines are drawn at RECIST threshold of -30% and at 20%.

Alternative appearances can be seen on the Web, including grouping by other criteria.  SGPLOT supports skins to alter the visual appearance of the graphs as shown on the right.

For more information on graphs for Oncology research, see "Plotting Against Cancer: Creating Oncology Graphs using SAS" by Debpriya Sarkar.

SGPLOT code:

title 'Change in Tumor Size';
title2 'ITT Population';
proc sgplot data=TumorSize nowall noborder;
  styleattrs datacolors=(cxbf0000 cx4f4f4f) datacontrastcolors=(black);
  vbar cid / response=change group=group categoryorder=respdesc datalabel=label
           datalabelattrs=(size=5 weight=bold) groupdisplay=cluster clusterwidth=1;
  refline 20 -30 / lineattrs=(pattern=shortdash);
  xaxis display=none;
  yaxis values=(60 to -100 by -20);
  inset ("C="="CR" "R="="PR" "S="="SD" "P="="PD" "N="="NE") / title='BCR' 
        position=bottomleft border textattrs=(size=6 weight=bold);
  keylegend / title='' location=inside position=topright across=1 border;
run;

Note:  GroupDisplay=Cluster is used to be able to display the bar label on top of each bar. So the ClusterWidth option is used to modify the width of the single bar in the cluster.

I have used style colors in the code to set group colors.  However, one can also use the Discrete Attributes Map to ensure the consistent assignment of colors by groups as shown in the paper mentioned above.

The other graph of interest was the "Incidence of Injection-Site Reaction by Time and Cohort", as shown in the graph on the right.  The (simulated) data shows the incidence of reaction by time and cohort using a cluster grouped bar chart.

In this case, the user wanted both bar fill colors and fill patterns.  Fill patterns can be useful when displaying the graph in a gray scale medium.  SGPLOT supports fill patterns for bars, which are enabled by setting the display option.  The easiest way to do this is to set this option in the style.  The Journal3 style shipped with SAS is designed to display both fill colors and fill patterns.  So, I just used that style, and changed the colors to the ones I wanted using the StyleAttrs statement.  You can also do this by deriving a custom style.


ods listing style=journal3;
proc sgplot data=Incidence nowall noborder;
  styleattrs datacolors=(gray pink lightgreen lightblue) datacontrastcolors=(black);
  vbar time / response=incidence group=group groupdisplay=cluster;
  xaxis discreteorder=data;
  yaxis offsetmax=0.2;
  keylegend / title='' location=inside position=top across=2 border;
run;

Full SAS code:  Clinical_Graphs


Axis Thresholds

Have you ever wondered why an SGPLOT or GTL graph sometimes has markers drawn beyond the extreme tick and value on an axis, and sometimes not?  And, if you prefer your graphs to always have tick values on the axis that cover the whole range of data, how can you do that?

Let us look under the covers a bit to see what is going on and why.  First of all, the above behavior is intentional and referred to as "Thresholding".  It has a specific purpose as displayed in the graph on the right.  Here I have generated some data where x is between 0.9 and 4.1 and made this graph using the SAS/GRAPH GPLOT procedure with default axis settings.

Note, GPLOT has used 6 "nice", round number tick values of 0-5 on the x axis to include the entire data range on the axis.  Since the data are only between 0.9 and 4.1,  the plot region to the left and right of the data is not utilized.  In this case, almost 40% of the horizontal space is not used.

The graph on the right plots the same data using the SGPLOT procedure which uses the full available width of the graph, thus using the space efficiently.  This is the result of the default thresholding heuristics used by SGPLOT and GTL.  SGPLOT also starts out wanting to use 0-5 ticks, but the "0" and "5" ticks are deemed to be unnecessary and dropped, displaying only values 1-4.  Some of the observations are drawn outside the ticks, but the axis range itself covers the full range of the data.

Whether or not to display the outermost ticks and values is determined by the axis threshold on each side independently.  The threshold value can be between 0.0 and 1.0, with a default of 0.3.  This means that if the outermost data value on one side of the axis is more than 30% of the midpoint spacing away from a possible outer tick, then the outer tick is dropped.

The graph on the right displays Diastolic x Cholesterol for all subjects with an AgeAtStart > 60.  The extreme values are labeled showing the Diastolic values in blue and the Cholesterol values in red as indicated in the legend.

Note, on the x-axis, the midpoint spacing is 20.  The extreme right marker has a value of 313.  This is 100*(1-13/20)=35% away from a potential outer tick at '320'.  Since this is > 30%, the outer tick is dropped.  So, with the default threshold of 30%, at most 30% of the midpoint spacing will be unused on each side.  For Diastolic on the y axis, the upper extreme observation has a value of 115, which is 100*(1-15/20)=25% away from the outer tick of '120', so that tick is retained.
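The arithmetic can be checked with a small DATA step.  This is only a simplified sketch of the rule as described here, not the actual algorithm used by the procedures, and the data set and variable names are made up for illustration:

```sas
/* Sketch of the threshold rule for the maximum side of an axis. */
data threshold_check;
  input axis $ datamax spacing threshold;
  length decision $8;
  outertick = ceil(datamax / spacing) * spacing;  /* candidate outer tick        */
  gap       = (outertick - datamax) / spacing;    /* fraction of one tick step   */
  if gap > threshold then decision = 'dropped';
  else decision = 'retained';
  datalines;
X 313 20 0.3
Y 115 20 0.3
;
run;

proc print data=threshold_check noobs;
run;
```

For the two cases in the text, the gap is 0.35 for the x axis (tick dropped) and 0.25 for the y axis (tick retained).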

So, what can you do if you always want to see the outer ticks?  The answer is simple - set the ThresholdMin or ThresholdMax options on the axis.  Setting ThresholdMax=1 will ensure that the outer tick will always be shown on the maximum side of the axis as shown in the graph on the right.  Now, the outer tick of "320" is displayed on the x-axis.  The code snippet is shown below.  See the attached file for the full code.

SGPLOT code:

proc sgplot data=heart nocycleattrs noautolegend;
scatter x=cholesterol y=diastolic / datalabel=clabel datalabelattrs=graphdata2;
scatter x=cholesterol y=diastolic / datalabel=dlabel datalabelattrs=graphdata1;
xaxis grid thresholdmax=1;
yaxis grid;
run;

On the other hand, setting ThresholdMin or ThresholdMax to 0 will force the outer ticks to always be dropped.  For the graph on the right, I have set ThresholdMax=0 on the y axis along with ThresholdMax=1 on the x axis.  Now, the x axis always has the outer tick, and the y axis never does.

Full Program: Threshold


Bar with Statistics

One of the key benefits of using a horizontal bar chart is the ability to display statistics for each bar.  This is a popular feature for the HBAR statement with the SAS/GRAPH GCHART procedure.  So, let us review the options available to us to create such graphs using SGPLOT.

The simplest case is to display the frequency of each bar on the right hand side as shown in the graph on the right.  Here we have used the SGPLOT HBAR statement with the DataLabel option and DataLabelPos=Right.

I have also used the NoWall and NoBorder options and suppressed the axis lines and baseline to get this popular view.  Note, the stat values are not colored by group.

proc sgplot data=cars nowall noborder;
hbar type / group=origin groupdisplay=cluster dataskin=pressed
baselineattrs=(thickness=0) datalabel datalabelpos=right;
yaxis display=(nolabel noline noticks);
xaxis display=(noline noticks) grid;
run;

With SAS 9.4, you have the option to include any statistics with an HBAR plot using the YAxisTable statement.  We can use this statement to display other statistics as shown on the right.

In this example, I have included the Mean City and Highway mileage along with the frequency counts.  Note, the frequency count values are now color coded by group.  All values are displayed right justified in the column by default.

proc sgplot data=cars nowall noborder;
label mpg_city='Mean City Mileage' mpg_highway='Mean Highway Mileage' n='Count';
format mpg_city mpg_highway 4.1;
hbar type / group=origin groupdisplay=cluster stat=pct dataskin=pressed;
yaxistable n / stat=sum classdisplay=cluster colorgroup=origin
     valueattrs=(size=6 weight=bold) nostatlabel;
yaxistable mpg_city mpg_highway / stat=mean classdisplay=cluster colorgroup=origin
     valueattrs=(size=6 weight=bold);
yaxis display=(nolabel noline noticks);
xaxis display=(noline noticks) grid;
run;

In the graph and code above, I have used one YAxisTable to display the frequency values by using an additional variable called "N" with Stat=Sum.  This variable contains only "1" for each observation, so we get the sum of the counts in this column.  You can also use any other numeric variable with Stat=Freq, and set the variable label appropriately.
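The step that creates the "N" variable is in the attached program rather than in this article.  A minimal sketch, assuming the cars data set is derived from SASHELP.CARS (whose Type, Origin, MPG_City and MPG_Highway variables match the names used above), could look like this:

```sas
/* Assumption: CARS is a copy of SASHELP.CARS with a constant counter added. */
data cars;
  set sashelp.cars;
  n = 1;   /* summing this constant with Stat=Sum yields the frequency count */
run;
```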

Using the YAxisTable instead of the DataLabel option as in the first graph allows us to color each observation by group.  Then, I have used a second YAxisTable with mpg_city and mpg_highway as the variables with Stat=Mean to display the mean mileage values also colored by group.

For the graph on the right, I have used ValueHAlign=center to display each value in the center of the column using a 4.1 format.  I have set the labels for the variables to indicate the statistic used for each label.  I have also used faint horizontal bands for each category to help the eye across the graph.

Statistics can be displayed "Inside" or "Outside" the graph area, which is more apparent if graph borders are used.  Additional statistics can be displayed by adding more variables to the YAxisTable statement, or using another YAxisTable statement to display values on the left of the bars.

Full Program:  BarStats_SG_94


Likert Graphs

Just this morning I received a request for a brief survey from Apple on my feedback about the new iPhone6+.  Yes, I finally got one, dead last in the family.  The survey followed the usual format, with a number of questions on what I like or dislike about it, with a 5 level scale for my response - Strongly Agree, Agree, Neutral, Disagree and Strongly Disagree.

Coincidentally, I recently also received an article from a co-worker on making Likert Graphs using R.  So, my curiosity was stirred, and I proceeded to dig into it.  Turns out these graphs are frequently used to evaluate the response data for such surveys and I was curious to see how far I could get using SAS 9.3 SG procedures.

I proceeded to make up some survey data based on the sample I saw in the article, which was a survey on books in 3 countries with 10 questions or statements.  The answers are summarized into 4 or 5 groups.  Here I have used 4 groups.  Later I will show an example with 5 groups.

The data looks like the table on the right.  Each question has a QID for our convenience.  The question itself is also in the data, but I did not include it here to keep the table relatively narrow.  A typical statement is "Reading is one of my favorite hobbies".

My data is sorted by the Qid, Country and Group.  I can process the data and compute Low and High values for each group, starting with zero.  You can see the data step in the full program attached below.
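The idea of stacking each group from zero can be sketched as follows.  This is not the data step from the attached program; the input data set survey, its values and the variable name percent are all invented for illustration:

```sas
/* Hypothetical summarized input: one row per question, country and group. */
data survey;
  input qid country $ group $ percent;
  datalines;
1 UK  1_SD 15
1 UK  2_D  25
1 UK  3_A  40
1 UK  4_SA 20
1 USA 1_SD 10
1 USA 2_D  20
1 USA 3_A  45
1 USA 4_SA 25
;
run;

/* Each segment starts where the previous one ended within a strip. */
data likert_ranges;
  set survey;
  by qid country;
  retain high;
  if first.country then low = 0;   /* each strip starts at zero          */
  else low = high;                 /* continue from the previous segment */
  high = low + percent;
run;
```

The resulting Low and High columns are the kind of values a HighLow plot with Type=Bar and Group=Group can draw as stacked segments.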

First, I want to use SAS 9.3 features to create the graph using the SGPanel procedure.  I used the PANEL layout, with the Question as the class variable.  Each question is displayed in the cell header, with a HighLow plot in each cell showing the summarized percent values for each response group by country.

SAS 9.3 SGPANEL procedure syntax:

ods listing style=styles.likert;
title 'Survey Responses to Questions by Country';
proc sgpanel data=Likert_4 ;
  panelby question / layout=panel columns=1 onepanel novarname noborder nowall;
  highlow y=country low=low high=high / group=group type=bar nooutline   
          lowlabel=sumdisagree highlabel=sumagree;
  rowaxis display=(nolabel noticks) fitpolicy=none;
  colaxis display=(nolabel noticks novalues);
  keylegend / noborder;
run;

As you can see, the basic graph is very easy to create using the SGPANEL procedure syntax shown above.  Note, I have used the MODSTYLE macro to derive a new style with the colors I want to use and specified it on the ODS Listing statement.

Also note in the graph and code above, I have used the LowLabel and HighLabel options of the HighLow plot to display the cumulative % of the disagree and agree values at each end.  The SAS 9.3 SGPANEL procedure does not provide an easy way to turn off the cell and header borders.  So, I have derived a style from the Likert style to turn off borders and axis lines.

/*--Create style to suppress border and axis lines--*/
proc template;
  define style styles.noborder;
    parent = styles.likert;
    class GraphBorderLines / linethickness=0px;
    class GraphAxisLines / linethickness=0px;
  end;
run;

Now, I have used this new NOBORDER style, along with some SAS 9.4  display options to create the graph shown on the right.  I have suppressed the panel HEADER, displayed the question using a new INSET option and used a skin for the HighLow plot as shown above.

In the graph on the right, I have positioned the strip such that the zero value is at the center of the x axis.  This alternate view may provide a better feel for the trend, whether negative or positive.  Of course, this data is simulated using random numbers, so any trend is accidental.

The x axis is now set to span from -100% to 100%.  Each strip no longer spans the entire x axis, so I added an inset background to allow the "Question" to stand out a bit from the rest of the text information in the graph.


The same technique can easily be extended to the 5 group level case as shown below.  The graph on the near right shows the full spanning strips, with a "Neutral" group in the middle.  The graph on the far right centers the middle of the neutral segment at zero on the x axis.  Also, I moved the inset labels to the left side.


Finally, the graph on the right shows the 4 group graph with segment labels.  Here I have used the new SAS 9.4 VBarParm statement to draw the strips with stacked groups instead of the HighLow bar.  I have used the SEGLABEL option to automatically label each segment.  I did not include the High and Low labels from the HighLow plot, but if needed, that can be done.

As usual, this exercise flushed out some deficiencies in the code, mostly the lack of a way to turn off the header borders.  We will be sure to address such issues.

Full SAS code:   Likert_SGPanel2



HeatMap with Numeric and Discrete Variables

Heat maps are a great way to visualize the bi-variate distribution of data.  Traditionally, a heat map has two numeric variables, placed along the X and Y dimensions.

Each variable range is subdivided into equal size bins to create a rectangular grid of bins.  The number of observations that fall into each bin is computed, and the grid is displayed by coloring each bin with a shade of color computed from a color gradient as shown on the right.

GTL supports a HeatMapParm statement, which can draw a heat map if provided the X-Y grid of bins, along with a count of observations in each bin.  Actually, the value can be count, or anything else.  So, it comes down to computing the values in each bin.

For the above graph, I used the KDE procedure to compute the frequency of observations in each bin using the "BIVAR" statement for two interval variables.  The binned data is written out to the KDEData data set using the ODS Output statement.

ods output bivariatehistogram=KDEData;
proc kde data=sashelp.heart; 
bivar systolic ageatstart / plots=all ng=100;
run;

Once the data is extracted, I keep the non-missing observations and feed the X, Y and Count data to the HeatMapParm statement using the GTL code shown below.

proc template;
  define statgraph HeatMapNumNum;
    dynamic _x _y _n;
    begingraph;
      entrytitle 'Distribution of Age by Systolic Blood Pressure';
      layout overlay;
        heatmapparm x=_x y=_y colorresponse=_n / colormodel=(white yellow red)
            display=(fill outline) outlineattrs=(color=cxf7f7f7)
            xbinaxis=false ybinaxis=false name='h';
        continuouslegend 'h';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=KDEData template=HeatMapNumNum;
  dynamic _x='binx' _y='biny' _n='bincount';
run;

Each bin is drawn using a fill color whose shade is computed from the three color map I have specified in the GTL code and also a light gray outline.  It can be seen from the outlines that all bins are drawn and the KDE procedure computes bins with zero frequencies.

Another way to compute the bins is to use the SURVEYREG procedure, as shown in the code below for two interval variables.  This procedure can plot heat maps directly, but for our purposes, we will get the data to draw our own heat map.

ods output fitplot=SurveyRegData;
proc surveyreg data=sashelp.heart plot=fit(shape=rec nbins=30);
   model AgeAtStart = Systolic;
run;

We can use the data written out by this procedure to draw our heat map just as before.  Note, the SurveyReg procedure allows us to set the number of bins in each direction.  So, here we have used 30 bins in each direction to get a fine grained heat map.

If you look closely at the graph on the right, you will notice that the map does not have all bins drawn.  This means that the SurveyReg procedure only defines bins that contain non zero counts.  Bins with zero counts are not generated at all, resulting in the empty bins (no outline).

In many cases, we may want to create a heat map for a combination of one discrete variable and one interval variable.  The HeatMapParm GTL statement can take either discrete or interval variables, but how can we compute the bins in this case?

One easy way is to use the new GROUP option on the GTL or SGPLOT Histogram statement, released with SAS 9.4.  Using the GROUP option, the Histogram statement computes a set number of bins for the interval variable for each unique value of the discrete variable.  The histogram does the work to make the interval bins the same for all the discrete levels, giving us exactly what we want.

Now, we can take this data, and use the HeatMapParm GTL statement with one discrete and one interval variable as shown on the right.  I used a four color ramp just for some variety.  The code is shown below.

proc template;
  define statgraph HeatMapCatNum;
    dynamic _title _x _y _n;
    begingraph;
      entrytitle _title;
      layout overlay / yaxisopts=(display=(ticks tickvalues));
        heatmapparm x=_x y=_y colorresponse=_n / colormodel=(white green yellow red)
            display=(fill outline) outlineattrs=(color=cxf7f7f7) name='h';
        continuouslegend 'h';
      endlayout;
    endgraph;
  end;
run;

One can also draw a Heatmap with two discrete variables.  The data is easily computed using the MEANS or FREQ procedures.  The value for each bin can be a response value as shown in this article.
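As a small sketch of the two discrete variable case, the FREQ procedure can write the cell counts to a data set.  The choice of SASHELP.CARS, the variables Origin and Type, and the output name CatCat are assumptions made for illustration:

```sas
/* Count observations for every Origin x Type combination.               */
/* SPARSE keeps combinations with a zero count in the OUT= data set, so  */
/* every cell of the grid is present.                                    */
proc freq data=sashelp.cars noprint;
  tables origin*type / out=CatCat(keep=origin type count) sparse;
run;
```

The Origin, Type and Count columns could then be supplied as the X, Y and ColorResponse roles of a HeatMapParm statement.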

Full SAS 9.4 GTL Code:  HeatMap


Consistent Group Colors by Value

Getting consistent group colors across different data sets for a graph is a common topic of interest.   Recently a user wrote in to ask how to ensure that specific groups "values" for a bar chart get specific colors.  The group values may arrive in different order, or some may be missing entirely in the data from day to day.

This is an important issue, and the SAS 9.3 Discrete Attributes Map feature was specifically created to address it.  On the right are two data sets.  Data Set #1 on the far right has 3 observations for Locations A, B and C with response values and group values based on the response.  Data Set #2 has 2 observations for Locations C and B with response and group.  Notice the locations and group values are in different order, and the group "<50" is missing entirely in Data Set #2.

By default, when colors are assigned by group values, the colors from the GraphData1-GraphData12 elements of the active style are used to color the bars.  The style elements are sequentially assigned to each group in the order they occur in the data.

In the first graph on the right, group value "50-80" is read first, and hence gets the color from GraphData1, which is blue.  The Location values on the X axis are shown in Data order.

In the second graph on the right, the first Location in the data is "C" with a group value of ">80", so ">80" gets the blue color as shown in the graph and the legend.  In such cases, where the data order and content can change from day to day for the same graph, it is necessary to retain the same color assignments across the graphs.

This is solved by using the Attributes Maps as previously described in my article on Discrete Attribute Maps.

First, we create a discrete attributes map data set.  This is like a format, and the data set is like the SGAnnotate data set, with specific column names.  "ID" specifies a name for the attr map, and a data set can have multiple ids for multiple maps.  This id is used to specify the map to be used in the VBar statement.  For each formatted "Value" in the data, we can specify the specific attributes to be used.

Here we have specified the FillColor and the LineColor. The value "<50" gets the fill color of red, and linecolor of black and so on.  Additional attributes like line pattern or symbols can also be specified.  The "Value" in the attr map should contain the formatted value.
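A minimal sketch of such a data set is shown here.  The id "X" and the red fill with black outline for "<50" come from this article; the fill colors for "50-80" and ">80" are assumptions made for illustration:

```sas
/* Discrete attribute map: one row per formatted group value.   */
/* Colors for 50-80 and >80 are assumed, not taken from the     */
/* original program.                                            */
data attrmap;
  length id $1 value $5 fillcolor $10 linecolor $10;
  input id $ value $ fillcolor $ linecolor $;
  datalines;
X <50 red black
X 50-80 yellow black
X >80 green black
;
run;
```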

Now, we run Data Set #1 with the modified program shown below, with the discrete attribute map data set provided in the DATTRMAP option on the procedure statement.  We also provide the map id in the VBAR statement using the ATTRID option.  Note, each bar is now colored by the fill color specified in the attr map for each group value.

SAS 9.3 SGPLOT code:

title 'Value by Location';
proc sgplot data=bar3 dattrmap=attrmap;
  vbar loc / response=value group=grp datalabel nostatlabel attrid=X;
  refline 50 / lineattrs=(color=darkred) label='Action Limit' labelloc=inside labelpos=min;
  refline 80 / lineattrs=(color=darkgreen) label='Goal' labelloc=inside labelpos=min;
  xaxis display=(nolabel) discreteorder=data;
run;

The same program can be run with Data Set #2 to create the graph shown on the right.  Note, in the legend of the two graphs, the colors assigned for each group are exactly the same, regardless of the order of the data or the presence or absence of any group value.  The values in the legend are in the order the group values are encountered in the data.  So, the values are not in the same order.  The legend values can be sorted if needed.

Often it is necessary to include all values in the legend, even if some values may be missing in today's data.  In the graph on the right, I have included all possible group values in the data in the right order to ensure we can get all the values in the legend.

Having one observation for every group at the top of the data, in the order we want, ensures that all group values appear in the legend in that order.  We know these values as they are in the Attr Map already, so we can prepend these additional observations to the data set as shown on the right.

Here is the final graph.  Note, the colors are consistent across all graphs and the legend contains all three expected group values even though Data Set #2 does not contain  the "<50" group.

Such graphs are common across all domains including financial and clinical, where we always want the same treatments to be represented in the graph with the same color or symbol across different data sets.

Full SAS 9.3 SGPLOT code:  GroupColors_93_Fmt



Report from MWSUG 2014

The Mid-West SAS Users' Group conference in Chicago was a great success, with over 400 attendees and great weather.  The conference hotel was downtown, with a nice view of the river and an easy stroll down the "Magnificent Mile".  The city does a great job with the flower beds down Michigan Ave., along the sides and in the median.  I suppose this time, the theme was Thanksgiving.

From a graphics perspective, the conference was loaded with excellent presentations, two of which won the "Best Paper" in their tracks.

  • Kaplan-Meier Survival Plotting Macro %NEWSURV - Jeffrey Meyers, Mayo Clinic, Rochester, Minnesota.  In this paper, Jeffrey presented the techniques he used to create the Survival Plot using GTL.
  • Categorical AND Continuous – The Best of Both Worlds - Kathryn Schurr, Ruth Kurtycz, Spectrum Health-Healthier Communities, Grand Rapids, MI.  In this paper, the authors examined the ways in which data can be visualized using discrete and interval displays by banding the interval data space into logical zones.

Many other papers were presented using SG Procedures, GTL and SAS/GRAPH techniques.

Here is a link to the Conference Proceedings.


CandleStick Chart

A HighLow plot is very popular in the financial industry, often used to track the periodic movement of a stock or some instrument or commodity.  The CandleStick Chart is one specific type of high low plot, purportedly originating in Japan for tracking of financial instruments in the rice trade.

Creating a Candlestick Chart using the SGPLOT procedure is very straightforward using the HighLow plot statement.  In the data shown below, I have extracted the data for one stock, and set the group based on whether the stock closed higher or lower than the open.  Also based on the same criteria, I have set the Highcap or Lowcap variables to FilledArrow or missing.  For example, if the stock closed higher than open, Gain='Up', Highcap='FilledArrow' and Lowcap is missing.  V1 and V2 contain the low and high values for the Open and Close variables.
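The data preparation described above can be sketched as follows.  The use of SASHELP.STOCKS, the choice of IBM and the date cutoff are assumptions made for illustration; the variable names Gain, Highcap, Lowcap, V1 and V2 follow the description:

```sas
/* Assumption: one stock extracted from SASHELP.STOCKS (monthly data). */
data stock;
  set sashelp.stocks(where=(stock='IBM' and date >= '01jan2004'd));
  length gain $4 highcap lowcap $12;
  v1 = min(open, close);     /* lower edge of the Open-Close bar */
  v2 = max(open, close);     /* upper edge of the Open-Close bar */
  if close > open then do;
    gain = 'Up';   highcap = 'FilledArrow'; lowcap = ' ';
  end;
  else do;
    gain = 'Down'; highcap = ' ';           lowcap = 'FilledArrow';
  end;
run;
```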


The graph on the right shows the classical Candlestick Chart.  The Open and Close interval is displayed using a filled region.  Line segments (shadows) are drawn to the High and Low values for each day.  The fill color is white when Close > Open, and gray otherwise.  Click on the graph for a higher resolution image.

I have created this graph with the SGPLOT procedure using the HighLow plot statement.  This statement comes in two Orientations and two Types.  The syntax for the statement is as follows:

     highlow X=var High=var Low=var / Type=LINE | BAR Highcap=var Lowcap=var;
     highlow Y=var High=var Low=var / Type=LINE | BAR Highcap=var Lowcap=var;

Here we have used the "Vertical" orientation by setting X=Date.  The High and Low variables determine the vertical extents of the bar.  Using Y=var sets the orientation to horizontal, and the High and Low variables then determine the horizontal extents of the bar.  This is useful for an Adverse Event Timeline graph or my take on the Swimmer Plot.  For more details on the plot statement and its uses, see the previous article HighLow Plot.  In this graph I have used the following features:

  1. A HighLow plot of Type=LINE to draw the High-Low interval.
  2. A HighLow plot of Type=BAR to draw the filled region displaying the Open-Close interval.
  3. The second HighLow plot uses Low=V1, High=V2 and Group=Gain to color the bar appropriately.
  4. A discrete attribute map to define the colors for the "Up" and "Down" values of Gain.

SAS 9.4 Code for CandleStick Chart:

title 'Monthly Stock Price';
proc sgplot data=stock dattrmap=attrmap;
  highlow x=date low=low high=high / type=line;
  highlow x=date low=v1 high=v2 / type=bar group=gain 
          lineattrs=(color=black) name='a' attrid=Mono;
  yaxis label='Price' grid;
  xaxis display=(nolabel);
  keylegend 'a' / location=inside position=bottom;
run;

The graph on the right is a color version of the same graph, using a green shade for Gain='Up' and a red shade for Gain='Down'.  The discrete attribute map is defined with two IDs, one for the monochrome graph, and one for the color graph.  All I have to do is flip the attrid in the HighLow Bar statement.  The legend is moved inside the graph area at the bottom, since empty space exists there.
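A discrete attribute map with two IDs might look like the sketch below.  The ID name Mono comes from the program above; the second ID name (Color) and the specific green and red color values are assumptions made for illustration, since the original map is only available in the linked program.

```sas
/* Sketch of a discrete attribute map with two IDs.                  */
/* 'Color' as the second ID and the cx color values are assumptions. */
data attrmap;
  length id $8 value $8 fillcolor $10;
  input id $ value $ fillcolor $;
  datalines;
Mono  Up   white
Mono  Down gray
Color Up   cx66c266
Color Down cxd9534f
;
run;
```

With a map like this, switching attrid=Mono to attrid=Color in the HighLow Bar statement changes the fill colors without touching the data.  The Value entries must match the values of the Gain variable, including case.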

Note that the data set also contains the HighCap and LowCap variables.  Each has a value of "FilledArrow" or missing based on the gain.  We have used these columns to give each bar an arrow pointing up or down as appropriate.  Note that some intervals are too small to draw an arrow.  In such cases the arrow is dropped.  This is where we need your feedback.  Would you like to see an option where an arrow is always drawn to indicate the direction regardless of the size of the interval?

The traditional graph with directional arrows is shown to the right.
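As a rough sketch of the arrow version, the Highcap= and Lowcap= options from the syntax shown earlier can be pointed at the cap variables in the data.  The axis and legend statements are copied from the program above; the complete program is in the linked file, so treat this as an approximation rather than the exact code used for the graphs.

```sas
title 'Monthly Stock Price';
proc sgplot data=stock dattrmap=attrmap;
  /* Shadows: line from Low to High */
  highlow x=date low=low high=high / type=line;
  /* Body: bar over the Open-Close interval, with arrow caps from the data */
  highlow x=date low=v1 high=v2 / type=bar group=gain
          highcap=highcap lowcap=lowcap
          lineattrs=(color=black) name='a' attrid=Mono;
  yaxis label='Price' grid;
  xaxis display=(nolabel);
  keylegend 'a' / location=inside position=bottom;
run;
```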

Full SAS Code:  CandleStick2



Many users of SGPLOT and GTL know how to mix and match various plot statements to create graphs, sometimes in ways not originally intended.  You are also aware that you can go a step beyond, and use these systems to create completely non-standard graphs such as the Spiral Plot, the Polar Graph, the Euler Diagram and more.

The other day I was asked to create a diagram.  I created a simple one with the SGPLOT procedure, with four nodes and three links.  The four nodes A, B, C and D have the (x, y) positions shown in columns Xn and Yn.  The three links have ids of 1, 2 and 3.  These are drawn using the Series plot, each having 4 points, with the first point at the right of the "From" node and the last point at the left of the "To" node.  Two additional intermediate points are provided.  The "Node" and "Link" data is merged into the table shown on the right.


This data is plotted using the SGPLOT procedure, using a series plot to draw the links, and a scatter plot to draw the nodes and the node ids.  The graph is shown on the right.  The SGPLOT program is shown below, with some options trimmed to fit.  Please see the linked file at the bottom for the full program.

proc sgplot data=diagram dattrmap=attrmap;
  series x=xl y=yl / group=id name='b';
  scatter x=xn y=yn / group=node datalabel=node;
  keylegend 'b' / linelength=20;
run;

We can use the SmoothConnect option to avoid the sharp angles, as shown on the right.  Note that this result is less than satisfactory, as the curves are required to pass through each of the points in the data.  This causes the curves to overshoot and bend away from the intended direction, as can be seen at the start of each link near node A.  The three links are not co-linear at the start.  Also, at the penultimate point of each link, the curve bends the other way, as can be seen in the blue link to the left of the node.
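For reference, the smoothed version only needs the SMOOTHCONNECT option on the series statement.  The sketch below reuses the variable names from the trimmed program above and leaves out the attribute map and other trimmed options, so it will not match the original appearance exactly.

```sas
proc sgplot data=diagram;
  /* SMOOTHCONNECT draws a smooth curve that passes through every point */
  series x=xl y=yl / group=id smoothconnect name='b';
  scatter x=xn y=yn / group=node datalabel=node;
  keylegend 'b' / linelength=20;
run;
```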

Now, for a diagram, it is not really necessary that the link pass through each of the intermediate points.  Those are merely there to set a path for the links.  Only the start and end of the path must be on the first and last points.

In the graph on the right we get the desired effect.  Here, each link starts and ends at the right point, but the curve does not necessarily pass through the intermediate points.  The points are used as "control points" to compute a quadratic Bezier Spline.  Then we use the series plot to draw the spline.

The graph on the right shows the spline curve and the control points.  The original series plot points are used as the control points for the spline.  The spline starts out as a straight line segment from the 1st vertex halfway to the 2nd vertex.  From this point, a quadratic curve is calculated to the halfway point of the next line segment.  This continues for all the remaining segments, until we reach the halfway point of the last segment.  Then, the last piece is again a straight segment to the final vertex.

The benefit of this computation is that the curve is always tangent to the first and last segments, thus preserving the slope of those segments.  Here, we want them to be horizontal.  The portion of the curve in between goes smoothly from one segment to the next.  The program includes the BezierMacro() that computes the points for the quadratic Bezier Spline given the original control points.  For more details, see this Wikipedia page on Quadratic Splines.
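The construction described above can be sketched in a DATA step as follows.  This is not the BezierMacro itself (see the linked file for that); it is a minimal illustration that assumes a hypothetical data set LINKS holding only the link points, with variables ID, XL and YL, no more than 50 points per link, and 10 curve steps per segment.

```sas
/* Sketch only: LINKS, XB, YB and the step count are assumptions */
proc sort data=links; by id; run;

data bezier(keep=id xb yb);
  array px[50] _temporary_;
  array py[50] _temporary_;

  /* Read all control points of one link; n ends up as the point count */
  do n = 1 by 1 until (last.id);
    set links;
    by id;
    px[n] = xl;
    py[n] = yl;
  end;

  /* The first vertex; the series plot draws the straight start segment */
  xb = px[1]; yb = py[1]; output;

  /* For each interior vertex, curve from the midpoint of the previous */
  /* segment to the midpoint of the next, using the vertex as control  */
  do i = 2 to n - 1;
    x0 = (px[i-1] + px[i]) / 2;  y0 = (py[i-1] + py[i]) / 2;
    x2 = (px[i] + px[i+1]) / 2;  y2 = (py[i] + py[i+1]) / 2;
    do k = 0 to 10;
      t = k / 10;
      /* Quadratic Bezier: (1-t)^2*P0 + 2(1-t)t*P1 + t^2*P2 */
      xb = (1-t)**2 * x0 + 2*(1-t)*t * px[i] + t**2 * x2;
      yb = (1-t)**2 * y0 + 2*(1-t)*t * py[i] + t**2 * y2;
      output;
    end;
  end;

  /* The final vertex; gives the straight end segment */
  xb = px[n]; yb = py[n]; output;
run;

proc sgplot data=bezier;
  series x=xb y=yb / group=id;
run;
```

With only two points in a link, the inner loop does not run and the link is drawn as a straight line.  Increasing the number of steps per segment gives a smoother curve at the cost of more observations.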

While I was in Beijing, the Chinese terms we learned the quickest were "Mien" for noodles, and derivatives like "Jiruo Mien" for Chicken Noodles.  It was essential to know this at a minimum to order food at Mr. Lee's, the local fast food place.  Here are my versions of the graphs for the "Dry Noodles" and the "Wet Noodles", given the original data.  Click on the graph for a higher resolution image.

The programs below were written using some SAS 9.4 graph features, but these are not essential for this use case.  You can run them with SAS 9.3 by removing the offending options.

SAS 9.4 Programs:

Macro:  BezierMacro

Diagram:  Diagram

Noodle:  NoodleGraph

At the right is another use case with longer series plots to draw the response curves by treatment.

Note:  Bezier curves may NOT be appropriate where the curve needs to pass through each point, but can be useful where the points for the series plot are control points to draw a smooth curve.

