Bar with Statistics

One of the key benefits of using a horizontal bar chart is the ability to display statistics for each bar.  This is a popular feature for the HBAR statement with the SAS/GRAPH GCHART procedure.  So, let us review the options available to us to create such graphs using SGPLOT.

The simplest case is to display the frequency of each bar on the right hand side as shown in the graph on the right.  Here we have used the SGPLOT HBAR statement with the DataLabel option and DataLabelPos=Right.

I have also used the NoWall, NoBorder options and suppressed axis lines and baseline to get this popular view.  Note, the stat values are not colored by group.  Click on the graph for a higher resolution image.

proc sgplot data=cars nowall noborder;
  hbar type / group=origin groupdisplay=cluster dataskin=pressed
       baselineattrs=(thickness=0) datalabel datalabelpos=right;
  yaxis display=(nolabel noline noticks);
  xaxis display=(noline noticks) grid;
run;

With SAS 9.4, you have the option to include any statistics with an HBAR plot using the YAxisTable statement.  We can use this statement to display other statistics as shown on the right.

In this example, I have included the Mean City and Highway mileage along with the frequency counts.  Note, the frequency count values are now color coded by group.  All values are displayed right justified in the column by default.

proc sgplot data=cars nowall noborder;
  label mpg_city='Mean City Mileage' mpg_highway='Mean Highway Mileage' n='Count';
  format mpg_city mpg_highway 4.1;
  hbar type / group=origin groupdisplay=cluster stat=pct dataskin=pressed;
  yaxistable n / stat=sum classdisplay=cluster colorgroup=origin
       valueattrs=(size=6 weight=bold) nostatlabel;
  yaxistable mpg_city mpg_highway / stat=mean classdisplay=cluster colorgroup=origin
       valueattrs=(size=6 weight=bold);
  yaxis display=(nolabel noline noticks);
  xaxis display=(noline noticks) grid;
run;

In the graph and code above, I have used one YAxisTable to display the frequency values by using an additional variable called "N" with Stat=Sum.  This variable contains only "1" for each observation, so the sum in this column is the count.  You can also use any other numeric variable with Stat=Freq, and set the variable label appropriately.
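A minimal sketch of how the "N" variable could be added.  It assumes the "cars" data set is derived from SASHELP.CARS, which the text above does not state explicitly.

```sas
/* Sketch: add a constant N=1 so that Stat=Sum on N yields the frequency. */
/* Assumption: "cars" is a copy of SASHELP.CARS.                          */
data cars;
  set sashelp.cars;
  n = 1;
run;
```

With every observation carrying N=1, the sum of N within each Type and Origin cell equals the number of observations in that cell.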

Using the YAxisTable instead of the DataLabel option as in the first graph allows us to color each observation by group.  Then, I have used a second YAxisTable with mpg_city and mpg_highway as the variables with Stat=Mean to display the mean mileage values also colored by group.

For the graph on the right, I have used ValueHAlign=center to display each value in the center of the column using a 4.1 format.  I have set the labels for the variables to indicate the statistic used for each column.  I have also used faint horizontal bands for each category to guide the eye across the graph.

Statistics can be displayed "Inside" or "Outside" the graph area, which is more apparent if graph borders are used.  Additional statistics can be displayed by adding more variables to the YAxisTable statement, or using another YAxisTable statement to display values on the left of the bars.

Full Program:  BarStats_SG_94


Likert Graphs

Just this morning I received a request for a brief survey from Apple on my feedback about the new iPhone6+.  Yes, I finally got one, dead last in the family.  The survey followed the usual format, with a number of questions on what I like or dislike about it, with a 5 level scale for my response - Strongly Agree, Agree, Neutral, Disagree and Strongly Disagree.

Coincidentally, I recently also received an article from a co-worker on making Likert Graphs using R.  So, my curiosity was stirred, and I proceeded to dig into it.  Turns out these graphs are frequently used to evaluate the response data for such surveys and I was curious to see how far I could get using SAS 9.3 SG procedures.

I proceeded to make up some survey data based on the sample I saw in the article, which was a survey on books in 3 countries with 10 questions or statements.  The answers are summarized into 4 or 5 groups.  Here I have used 4 groups.  Later I will show an example with 5 groups.

The data looks like the table on the right, each question has a QID for our convenience.  The question itself is also in the data but I did not include it here to keep the table relatively narrow.  The statement is like "Reading is one of my favorite hobbies".

My data is sorted by the Qid, Country and Group.  I can process the data and compute Low and High values for each group, starting with zero.  You can see the data step in the full program attached below.
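The Low and High computation might look something like the sketch below.  This is not the data step from the attached program.  The input data set name Likert_4_Raw and the percent variable Pct are hypothetical, and the group values are assumed to sort from strongest disagreement to strongest agreement.

```sas
/* Sketch: stack the response segments end to end for each question and country. */
/* Assumptions: Likert_4_Raw is sorted by Qid, Country and Group; Pct holds the  */
/* percent of responses in each group.                                           */
data Likert_4;
  set Likert_4_Raw;
  by qid country;
  retain high;
  if first.country then low = 0;   /* each strip starts at zero             */
  else low = high;                 /* next segment starts where the last ended */
  high = low + pct;
run;
```

Each observation then describes one segment of the strip, running from Low to High, which is the shape the HighLow plot with Type=Bar expects.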

First, I want to use SAS 9.3 features to create the graph using the SGPanel procedure.  I used the PANEL layout, with the Question as the class variable.  Each question is displayed in the cell header, with a HighLow plot in each cell showing the summarized percent values for each response group by country.

SAS 9.3 SGPANEL procedure syntax:

ods listing style=styles.likert;
title 'Survey Responses to Questions by Country';
proc sgpanel data=Likert_4;
  panelby question / layout=panel columns=1 onepanel novarname noborder nowall;
  highlow y=country low=low high=high / group=group type=bar nooutline
          lowlabel=sumdisagree highlabel=sumagree;
  rowaxis display=(nolabel noticks) fitpolicy=none;
  colaxis display=(nolabel noticks novalues);
  keylegend / noborder;
run;

As you can see, the basic graph is very easy to create using the SGPANEL procedure syntax shown above.  Note, I have used the MODSTYLE macro to derive a new style with the colors I want to use and specified it on the ODS Listing statement.
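For reference, a call to the MODSTYLE macro could look like the sketch below.  The parent style and the four colors are assumptions chosen for illustration; the colors actually used for the graph are in the full program.

```sas
/* Sketch: derive a style named "likert" with custom group colors.   */
/* Assumptions: the LISTING parent style and the four color values.  */
%modstyle(name=likert, parent=listing, type=CLM,
          colors=cxd7191c cxfdae61 cxa6d96a cx1a9641,
          fillcolors=cxd7191c cxfdae61 cxa6d96a cx1a9641);

ods listing style=styles.likert;
```

The colors are assigned to the group values in order, so the list should follow the order of the response groups in the data.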

Also note in the graph and code above, I have used the LowLabel and HighLabel options of the HighLow plot to display the cumulative % of the disagree and agree values at each end.  The SAS 9.3 SGPANEL procedure does not provide an easy way to turn off the cell and header borders.  So, I have derived a style from the Likert style to turn off borders and axis lines.

/*--Create style to suppress border and axis lines--*/
proc template;
  define style styles.noborder;
    parent = styles.likert;
    class GraphBorderLines / linethickness=0px;
    class GraphAxisLines / linethickness=0px;
  end;
run;

Now, I have used this new NOBORDER style, along with some SAS 9.4  display options to create the graph shown on the right.  I have suppressed the panel HEADER, displayed the question using a new INSET option and used a skin for the HighLow plot as shown above.

In the graph on the right, I have positioned the strip such that the zero value is at the center of the x axis.  This alternate view may provide a better feel for the trend, whether negative or positive.  Of course, this data is simulated using random numbers, so any trend is accidental.

The x axis is now set to span from -100% to 100%.  Each strip no longer spans the entire x axis, so I added an inset background to allow the "Question" to stand out a bit from the rest of the text information in the graph.
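One way to center the strips is to shift each segment left by the total disagree percentage, as in the sketch below.  It assumes SumDisagree is a numeric variable holding the cumulative percent of the disagree groups for each question and country, and the output names are hypothetical.

```sas
/* Sketch: shift each strip so the disagree/agree boundary sits at zero.  */
/* Assumption: SumDisagree is numeric and constant within Qid and Country. */
data Likert_4_Center;
  set Likert_4;
  low_c  = low  - sumdisagree;
  high_c = high - sumdisagree;
run;
```

The HighLow plot would then use Low=low_c and High=high_c, with the x axis range set from -100 to 100.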


The same technique can easily be extended to the 5 group level case as shown below.  The graph on the near right shows the full spanning strips, with a "Neutral" group in the middle.  The graph on the far right centers the middle of the neutral segment at zero on the x axis.  Also, I moved the inset labels to the left side.


Finally, the graph on the right shows the 4 group graph with segment labels.  Here I have used the new SAS 9.4 VBarParm statement to draw the strips with stacked groups instead of the HighLow bar.  I have used the SEGLABEL option to automatically label each segment.  I did not include the High and Low labels from the HighLow plot, but if needed, that can be done.

As usual, this exercise flushed out some deficiencies in the code, mostly the lack of a way to turn off the header borders.  We will be sure to address such issues.

Full SAS code:   Likert_SGPanel2



HeatMap with Numeric and Discrete Variables

Heat maps are a great way to visualize the bi-variate distribution of data.  Traditionally, a heat map has two numeric variables, placed along the X and Y dimensions.

Each variable range is subdivided into equal size bins to create a rectangular grid of bins.  The number of observations that fall into each bin is computed, and the grid is displayed by coloring each bin with a shade of color computed from a color gradient as shown on the right.  Click on the graph to see a higher resolution image.

GTL supports a HeatMapParm statement, which can draw a heat map if provided the X-Y grid of bins, along with a count of observations in each bin.  Actually, the value can be count, or anything else.  So, it comes down to computing the values in each bin.

For the above graph, I used the KDE procedure to compute the frequency of observations in each bin using the "BIVAR" statement for two interval variables.  The binned data is written out to the KDEData data set using the ODS Output statement.

ods output bivariatehistogram=KDEData;
proc kde data=sashelp.heart;
  bivar systolic ageatstart / plots=all ng=100;
run;

Once the data is extracted, I keep the non-missing observations and feed the X, Y and Count data to the HeatMapParm statement using the GTL code shown below.
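The filtering step could be as simple as the sketch below.  The column names BinX, BinY and BinCount are taken from the DYNAMIC statement used with SGRENDER in this post; overwriting KDEData in place is a choice made here for brevity.

```sas
/* Sketch: keep only the bins where X, Y and the count are all present. */
data KDEData;
  set KDEData;
  where not missing(binx) and not missing(biny) and not missing(bincount);
run;
```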

proc template;
  define statgraph HeatMapNumNum;
    dynamic _x _y _n;
    begingraph;
      entrytitle 'Distribution of Age by Systolic Blood Pressure';
      layout overlay;
        heatmapparm x=_x y=_y colorresponse=_n / colormodel=(white yellow red)
            display=(fill outline) outlineattrs=(color=cxf7f7f7)
            xbinaxis=false ybinaxis=false name='h';
        continuouslegend 'h';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=KDEData template=HeatMapNumNum;
  dynamic _x='binx' _y='biny' _n='bincount';
run;

Each bin is drawn using a fill color whose shade is computed from the three color map I have specified in the GTL code and also a light gray outline.  It can be seen from the outlines that all bins are drawn and the KDE procedure computes bins with zero frequencies.

Another way to compute the bins is to use the SURVEYREG procedure, as shown in the code below for two interval variables.  This procedure can plot heat maps directly, but for our purposes, we will get the data to draw our own heat map.

ods output fitplot=SurveyRegData;
proc surveyreg data=sashelp.heart plot=fit(shape=rec nbins=30);
  model AgeAtStart = Systolic;
run;

We can use the data written out by this procedure to draw our heat map just as before.  Note, the SurveyReg procedure allows us to set the number of bins in each direction.  So, here we have used 30 bins in each direction to get a fine grained heat map.

If you click on the graph on the right, you will notice that the map does not have all bins drawn.  This means that the SurveyReg procedure only defines bins that contain non zero counts.  Bins with zero counts are not generated at all, resulting in the empty bins (no outline).

In many cases, we may want to create a Heatmap for a combination of one discrete variable and one interval variable.  The HeatmapParm GTL statement can take either discrete or interval variables, but how can we compute the bins in this case?

One easy way is using the new GTL or SGPLOT Histogram statement with the GROUP option released with SAS 9.4.  Using the GROUP option, the Histogram statement computes a set number of bins for the interval variable for each unique value of the discrete variable.  The histogram does the work to make the interval bins the same for all the discrete levels, giving us exactly what we want.

Now, we can take this data, and use the HeatMapParm GTL statement with one discrete and one interval variable as shown on the right.  I used a four color ramp just for some variety.  The code is shown below.

proc template;
  define statgraph HeatMapCatNum;
    dynamic _title _x _y _n;
    begingraph;
      entrytitle _title;
      layout overlay / yaxisopts=(display=(ticks tickvalues));
        heatmapparm x=_x y=_y colorresponse=_n / colormodel=(white green yellow red)
            display=(fill outline) outlineattrs=(color=cxf7f7f7) name='h';
        continuouslegend 'h';
      endlayout;
    endgraph;
  end;
run;

One can also draw a Heatmap with two discrete variables.  The data is easily computed using the MEANS or FREQ procedures.  The value for each bin can be a response value as shown in this article.
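As a rough sketch of the two discrete variable case, the counts can be computed with the FREQ procedure and passed to the HeatMapCatNum template defined above.  SASHELP.CARS, the Type and Origin variables, and the output data set name CarCounts are choices made here for illustration only.

```sas
/* Sketch: count observations for each combination of two discrete variables. */
/* SPARSE keeps combinations with a zero count, so every cell is drawn.       */
proc freq data=sashelp.cars noprint;
  tables type*origin / out=CarCounts(keep=type origin count) sparse;
run;

proc sgrender data=CarCounts template=HeatMapCatNum;
  dynamic _title='Vehicle Counts by Type and Origin'
          _x='type' _y='origin' _n='count';
run;
```

Without SPARSE, the output data set contains only the non-zero cells, similar to the SurveyReg case described earlier.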

Full SAS 9.4 GTL Code:  HeatMap


Consistent Group Colors by Value

Getting consistent group colors across different data sets for a graph is a common topic of interest.  Recently a user wrote in to ask how to ensure that specific group values for a bar chart get specific colors.  The group values may arrive in different order, or some may be missing entirely in the data from day to day.

This is an important issue, and the SAS 9.3 Discrete Attributes Map feature was created specifically to address it.  On the right are two data sets.  Data Set #1 on the far right has 3 observations for Locations A, B and C with response values and group values based on the response.  Data Set #2 has 2 observations for Locations C and B with response and group.  Notice the locations and group values are in different order, and the group "<50" is missing entirely in data set #2.

By default, when colors are assigned by group values, the colors from the GraphData1-GraphData12 elements of the active style are used to color the bars.  The style elements are sequentially assigned to each group in the order they occur in the data.

In the first graph on the right, group value "50-80" is read first, and hence gets the color from GraphData1, which is blue.  The Location values on the X axis are shown in Data order.

In the second graph on the right, the first Location in the data is "C" with a group value of ">80", so ">80" gets the blue color as shown in the graph and the legend.  In such cases, where the data order and content can change from day to day for the same graph, it is necessary to retain the same color assignments across the graphs.

This is solved by using Discrete Attribute Maps, as previously described in my article on the topic.

First, we create a discrete attributes map data set.  This is like a format and the data set is like the SGAnnotate data set, with specific column names.  "ID" specifies a name for the attr map, and a data set can have multiple ids for multiple maps.  This id is used to specify the map to be used in the VBar statement.  For each formatted "Value" in the data, we can specify the specific attributes to be used.

Here we have specified the FillColor and the LineColor. The value "<50" gets the fill color of red, and linecolor of black and so on.  Additional attributes like line pattern or symbols can also be specified.  The "Value" in the attr map should contain the formatted value.
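A minimal sketch of such an attribute map data set is shown below.  The id "X" matches the AttrId used in the SGPLOT code in this post, and red fill with black outline for "<50" comes from the text above.  The fill colors for the other two groups are assumptions made for illustration.

```sas
/* Sketch: discrete attribute map with one id and three group values.   */
/* Assumption: the fill colors for '50-80' and '>80' are illustrative.  */
data attrmap;
  length id $1 value $8 fillcolor $12 linecolor $12;
  id = 'X';
  value = '<50';   fillcolor = 'red';    linecolor = 'black'; output;
  value = '50-80'; fillcolor = 'yellow'; linecolor = 'black'; output;
  value = '>80';   fillcolor = 'green';  linecolor = 'black'; output;
run;
```

The Value strings must match the formatted group values exactly, including case.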

Now, we run data set #1 with the modified program shown below with the discrete attribute map data set provided in the DATTRMAP option on the procedure statement.  We also provide the map id in the ATTRID option on the VBAR statement.  Note, each bar is now colored by the fill color specified in the attr map for each group value.

SAS 9.3 SGPLOT code:

title 'Value by Location';
proc sgplot data=bar3 dattrmap=attrmap;
  vbar loc / response=value group=grp datalabel nostatlabel attrid=X;
  refline 50 / lineattrs=(color=darkred) label='Action Limit' labelloc=inside labelpos=min;
  refline 80 / lineattrs=(color=darkgreen) label='Goal' labelloc=inside labelpos=min;
  xaxis display=(nolabel) discreteorder=data;
run;

The same program can be run with Data Set #2 to create the graph shown on the right.  Note, in the legend of the two graphs, the colors assigned for each group are exactly the same, regardless of the order of the data or the presence or absence of any group value.  The values in the legend are in the order the group values are encountered in the data.  So, the values are not in the same order.  The legend values can be sorted if needed.

Often it is necessary to include all values in the legend, even if some values may be missing in today's data.  In the graph on the right, I have included all possible group values in the data in the right order to ensure we can get all the values in the legend.

Having all groups present in the data, in the correct order, ensures that all group values appear in the legend in the order we want.  We know these values as they are in the Attr Map already, so we can prepend these additional observations to the data set as shown on the right.
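A sketch of the prepend step is shown below.  It assumes the second data set is named bar2 and that Grp is a character variable; if the group is a formatted numeric variable in your data, the extra rows need numeric values that format to these strings instead.

```sas
/* Sketch: one extra row per expected group value, in legend order.  */
/* Loc and Value are left missing in these rows.                     */
/* Assumptions: data set name bar2; Grp is character.                */
data legend_rows;
  length grp $8;
  do grp = '<50', '50-80', '>80';
    output;
  end;
run;

/* Placing the extra rows first sets the order of the legend entries. */
data bar2_all;
  set legend_rows bar2;
run;
```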

Here is the final graph.  Note, the colors are consistent across all graphs and the legend contains all three expected group values even though Data Set #2 does not contain  the "<50" group.

Such graphs are common across all domains including financial and clinical, where we always want the same treatments to be represented in the graph with the same color or symbol across different data sets.

Full SAS 9.3 SGPLOT code:  GroupColors_93_Fmt



Report from MWSUG 2014

The Mid-West SAS Users' Group conference in Chicago was a great success, with over 400 attendees and great weather.  The conference hotel was downtown, with a nice view of the river and a stroll away from the "Magnificent Mile".  The city does a great job with the flower beds down Michigan Ave., along the sides and in the median.  I suppose this time, the theme was Thanksgiving.

From a graphics perspective, the conference was loaded with excellent presentations, two of which won the "Best Paper" in their tracks.

  • Kaplan-Meier Survival Plotting Macro %NEWSURV - Jeffrey Meyers, Mayo Clinic, Rochester, Minnesota.  In this paper Jeffrey presented the techniques he used to create the Survival Plot using GTL.
  • Categorical AND Continuous – The Best of Both Worlds - Kathryn Schurr, Ruth Kurtycz, Spectrum Health-Healthier Communities, Grand Rapids, MI.  In this paper, the authors examined the ways in which data can be visualized using discrete and interval displays by banding the interval data space into logical zones.

Many other papers were presented using SG Procedures, GTL and SAS/GRAPH techniques.

Here is a link to the Conference Proceedings.


CandleStick Chart

A HighLow plot is very popular in the financial industry, often used to track the periodic movement of a stock or some instrument or commodity.  The CandleStick Chart is one specific type of high low plot, purportedly originating in Japan for tracking of financial instruments in the rice trade.

Creating a Candlestick Chart using the SGPLOT procedure is very straightforward using the Highlow plot statement.  In the data shown below, I have extracted the data for one stock, and set the group based on whether the stock closed higher or lower than the open.  Also based on the same criteria, I have set the Highcap or Lowcap variables to FilledArrow or missing.  For example, if the stock closed higher than open, Gain='Up', Highcap='FilledArrow' and LowCap is missing.  V1 and V2 contain the low and high values for Open and Close variables.
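A data step along these lines could produce that data set.  This is a sketch only: the use of SASHELP.STOCKS, the choice of stock and the date range are assumptions, and a month with Close equal to Open is treated as "Down" here.

```sas
/* Sketch: derive Gain, HighCap, LowCap, V1 and V2 for one stock.       */
/* Assumptions: SASHELP.STOCKS as the source, IBM from January 2004 on. */
data stock;
  set sashelp.stocks(where=(stock='IBM' and date >= '01jan2004'd));
  length gain $4 highcap lowcap $12;
  if close > open then do;
    gain = 'Up';   highcap = 'FilledArrow'; lowcap = ' ';
  end;
  else do;
    gain = 'Down'; highcap = ' ';           lowcap = 'FilledArrow';
  end;
  v1 = min(open, close);   /* bottom of the Open-Close bar */
  v2 = max(open, close);   /* top of the Open-Close bar    */
run;
```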


The graph on the right shows the classical Candlestick Chart.  The Open and Close interval is displayed using a filled region.  Line segments (shadows) are drawn to the High and Low values for each day.  The fill color is white when Close > Open, and gray otherwise.  Click on the graph for a higher resolution image.

I have created this graph with the SGPLOT procedure using the HighLow plot statement.  This statement comes in two Orientations and two Types.  The syntax for the statement is as follows:

     highlow X=var High=var Low=var / Type=LINE | BAR Highcap=var Lowcap=var;
     highlow Y=var High=var Low=var / Type=LINE | BAR Highcap=var Lowcap=var;

Here we have used the "Vertical" orientation by setting X=Date.  The High and Low variables determine the vertical extents of the bar.  Using Y=var sets the orientation to horizontal, and the High and Low variables determine the horizontal extents of the bar.  This is useful for an Adverse Event Timeline graph or my take on the Swimmer Plot.  For more details on the plot statement and its uses, see the previous article HighLow Plot.  In this graph I have used the following features:

  1. A HighLow plot of Type=LINE to draw the High-Low interval.
  2. A HighLow plot of Type=BAR to draw the filled region displaying the Open-Close interval.
  3. The second HighLow uses Low=V1, High=V2 and Group=Gain to color the bar appropriately.
  4. I have used a discrete attributes map to define the colors for "Up" and "Down" values of gain.

SAS 9.4 Code for CandleStick Chart:

title 'Monthly Stock Price';
proc sgplot data=stock dattrmap=attrmap;
  highlow x=date low=low high=high / type=line;
  highlow x=date low=v1 high=v2 / type=bar group=gain
          lineattrs=(color=black) name='a' attrid=Mono;
  yaxis label='Price' grid;
  xaxis display=(nolabel);
  keylegend 'a' / location=inside position=bottom;
run;

The graph on the right is a color version of the same graph, using a green shade for Gain='Up' and a red shade for Gain='Down'.  The discrete attribute map is defined with two IDs, one for the monochrome graph, and one for the color graph.  All I have to do is flip the attrid in the HighLow Bar statement.  The legend is moved inside the graph area at the bottom since empty space exists.

Note the data set also contains the HighCap and LowCap variables.  Each has a value of "FilledArrow" or missing based on the gain.  We have used these columns to make each bar have an arrow pointing up or down appropriately.  Note, some intervals are too small to draw an arrow.  In such cases the arrow is dropped.  This is where we need your feedback.  Would you like to see an option where an arrow is always drawn to indicate the direction regardless of the size of the interval?

The traditional graph with directional arrows is shown to the right.
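A sketch of an attribute map with two ids is shown below.  The id "Mono" and its white and gray fills follow the code and description in this post.  The id name "Color" and the specific green and red shades are assumptions.

```sas
/* Sketch: one attribute map data set holding two maps, selected by AttrId. */
/* Assumptions: the id name 'Color' and the two cx color values.            */
data attrmap;
  length id $5 value $4 fillcolor $12 linecolor $12;
  id='Mono';  value='Up';   fillcolor='white';    linecolor='black'; output;
  id='Mono';  value='Down'; fillcolor='gray';     linecolor='black'; output;
  id='Color'; value='Up';   fillcolor='cx7fbf7f'; linecolor='black'; output;
  id='Color'; value='Down'; fillcolor='cxdf7f7f'; linecolor='black'; output;
run;
```

Switching the HighLow bar statement from attrid=Mono to attrid=Color then changes the colors without touching the data.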

Full SAS Code:  CandleStick2



Many users of SGPLOT and GTL know how to mix and match various plot statements to create graphs, sometimes in ways not originally intended.  You are also aware that you can go a step beyond, and use these systems to create completely non-standard graphs such as the Spiral Plot, the Polar Graph, the Euler Diagram and more.

The other day I was asked to create a diagram.  I created a simple one with the SGPLOT procedure, with four nodes, and three links.  The four nodes A, B, C and D have the (x, y) positions shown in columns Xn and Yn.  The three links have ids of 1, 2 and 3.  These are drawn using the Series plot, each having 4 points, with first starting at right of the "From" node and ending at the left of the "To" node.  Two additional intermediate points are provided.  The "Node" and "Link" data is merged into the table shown on the right.


This data is plotted using the SGPLOT procedure, using a series plot to draw the links, and a scatter plot to draw the nodes and the node ids.  The graph is shown on the right.  The SGPLOT program is shown below, with some options trimmed to fit.  Please see the linked file at the bottom for the full program.

proc sgplot data=diagram dattrmap=attrmap;
  series x=xl y=yl / group=id name='b';
  scatter x=xn y=yn / group=node datalabel=node;
  keylegend 'b' / linelength=20;
run;

We can use the SmoothConnect option to avoid the sharp angles as shown on the right.  Note, this result is less than satisfactory, as the curves are required to pass through each of the points in the data.  This causes the curves to bend in the opposite direction of the curve as can be seen at the start of each link near node A.  The three links are not co-linear at the start.  Also, at each penultimate node, the curve bends the other way, as can be seen in the blue link to the left of the node.

Now, for a diagram, it is not really necessary that the link pass through each of the intermediate nodes.  Those are merely there to set a path for the links.  Only the start and end of the path must be on the first and last point.

In the graph on the right we get the desired effect.  Here, each link starts and ends at the right point, but the curve does not necessarily pass through the intermediate points.  The points are used as "control points" to compute a quadratic Bezier Spline.  Then we use the series plot to draw the spline.

The graph on the right shows the spline curve and the control points.  The original series plot points are used as the control points for the spline.  The spline starts out as a straight line segment from the 1st vertex half way to the 2nd vertex.  Now, from this point, a quadratic curve is calculated to the half way point of the next line segment.  This continues for all the remaining segments, till we reach the half way point of the last segment.  Then, the last segment is again a straight segment to the final vertex.

The benefit of this computation is that the curve is always at a tangent to the first and last segments, thus ensuring the slope of those segments.  Here, we want them to be horizontal.  The portion of the curve in between goes smoothly from one segment to the next.  The program includes the BezierMacro() that computes the points for the quadratic Bezier Spline given the original control points.  For more details, see this Wikipedia page on Quadratic Splines.

While I was in Beijing, the Chinese terms we learned the quickest were "Mien" for noodles, and other derivatives like "Jiruo Mien" for Chicken Noodles.  It was essential to know this at a minimum to order food at Mr. Lee's, the local fast food place.  Here are my versions of the graphs for the "Dry Noodles" and the "Wet Noodles", given the original data.  Click on the graph for a higher resolution image.
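The computation described above can be sketched in a data step as follows.  This is not the BezierMacro() from the linked program, only an illustration of the same idea.  The input data set name Links, the output variable names, the limit of 50 control points per link and the number of points per curve are all assumptions.

```sas
%let nseg = 10;   /* points computed per curved section (assumption) */

/* Sketch: quadratic Bezier spline through midpoints of the control polygon. */
/* Assumptions: Links has Id, Xl and Yl, sorted by Id with points in drawing */
/* order, and at most 50 points per link.                                    */
data spline(keep=id xs ys);
  array px[50] _temporary_;
  array py[50] _temporary_;
  set links;
  by id;
  if first.id then n = 0;
  n + 1;
  px[n] = xl;  py[n] = yl;
  if last.id then do;
    /* start on the first vertex; a straight piece leads to the first midpoint */
    xs = px[1];  ys = py[1];  output;
    do i = 2 to n - 1;
      /* curve from the midpoint before vertex i to the midpoint after it */
      x0 = (px[i-1] + px[i]) / 2;  y0 = (py[i-1] + py[i]) / 2;
      x2 = (px[i] + px[i+1]) / 2;  y2 = (py[i] + py[i+1]) / 2;
      do j = 0 to &nseg;
        t = j / &nseg;
        xs = (1-t)**2 * x0 + 2*(1-t)*t * px[i] + t**2 * x2;
        ys = (1-t)**2 * y0 + 2*(1-t)*t * py[i] + t**2 * y2;
        output;
      end;
    end;
    /* a straight piece from the last midpoint ends on the final vertex */
    xs = px[n];  ys = py[n];  output;
  end;
run;
```

The resulting Xs and Ys values can be drawn with a series plot grouped by Id.  Each interior control point pulls the curve toward it without the curve passing through it, while the first and last sections stay straight along the original segments.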

The programs below were written using some SAS 9.4 graph features, but these are not essential for this use case.  You can run it at SAS 9.3, and just remove the offending options.

SAS 9.4 Programs:

Macro:  BezierMacro

Diagram:  Diagram

Noodle:  NoodleGraph

At the right is another use case with longer series plots to draw the response curves by treatment.

Note:  Bezier curves may NOT be appropriate where the curve needs to pass through each point, but can be useful where the points for the series plot are control points to draw a smooth curve.



PharmaSUG-China 2014

The Third PharmaSUG-China conference was held in Beijing last week, and I had the pleasure to attend this excellent conference along with a record number of attendees.

On Thursday, I presented two 1/2 day seminars on ODS Graphics.  One titled "Advanced Topics in GTL" and another titled "Complex Clinical Graphs using SAS".  The attendees were eager to learn and the sessions included much discussion, which is always a lot of fun.

The opening session included a presentation on using JMP Clinical for analysis of clinical data.  The presentation included a graph of Study Demographics.  Later in the afternoon, I thought it would be appropriate to create the same graph in my presentation on ODS Graphics Designer.  The graph is shown on the right.

Friday and Saturday were filled with many presentations on interesting topics in the Programming Techniques and Coder's Corner sections, especially from a graphics perspective.  Conference proceedings are now available.

The afternoon also included an excellent presentation on the Essentials of PDV by Arthur Li and Napoleon Plot by Kriss Harris.  Unfortunately, the papers are not available on the proceedings page at this time.

Rajesh Moorakonda, Singapore Clinical Research Institute presented a paper on Monitoring Child Growth that included graphs that plot the anthropometric parameters on a growth chart using the GPLOT procedure as shown on the right.

The Saturday session included the "Coder's Corner" section which included many interesting papers including a fair share of papers on graphics techniques.

In my presentation titled "Annotate Your SGPLOT Graphs" I presented the basic techniques for annotating an SGPLOT graph using the SGAnnotation data set.   I demonstrated how to add a table of subjects at risk by class to a survival plot.  The paper contains the details on how to make this graph.

Debpriya Sarker of SAS Institute Pune presented his paper on "Plotting Against Cancer:  Creating Oncology Plots using SAS".  This paper included the techniques for creating many graphs used in the analysis of data for Oncology, such as the HeatMap depicting correlations for Genes and Drugs.

Huashan Huo, PPD Beijing, presented the paper on "Using SAS SG Procedures to Create and Enhance Figures in Pharmaceutical Industry".  This paper included multiple graphs created using ODS Graphics Designer, GTL and SG Procedures, including the graph of Median of Lipid Profile over time, where the authors added alternate vertical bands to clearly indicate the results for specific days of the study.



Presentations were done on "I am Legend" by Kriss Harris showing ways to create a stand alone legend for cases where the legend can get too big to fit in a graph and "Programming Figures beyond SGPLOT and GTL" where the author showed ways to create graphs beyond what can be directly created using SGPLOT or GTL plot statements.  Unfortunately, the papers for these are not available on the web page.

Beijing afforded a great venue for the conference.  A bustling city of historical and modern elements, it provides numerous attractions, ranging from the 2000 year old Great Wall to the majestic Forbidden City to the ultra-modern National Center for the Performing Arts.




Binary Response Graph

Often we need to plot the response values for binary cases of a classifier.  The graph below was created to simulate one seen on a web site, showing the shock index for subjects with or without a pulmonary embolism.  In this case, the data is simulated for illustration purposes only.

There are two levels for the classifier for presence of pulmonary embolism, "Absent" and "Present".  The response values are plotted as a box plot.  I call this graph the "Binary Response Graph" as I could not find the common name for such a graph.  I would be happy if someone can provide the industry standard name for such a graph.

SAS 9.3 code for box plot:

proc sgplot data=Pulmonary;
  vbox shock / category=pulmonary boxwidth=0.2 fillattrs=(color=lightblue);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

Note in the graph, the two class values "Absent" and "Present" are placed on the x axis with an offset of 1/2 the midpoint spacing on each side on the axis.  This is the standard placement of category (aka midpoint) values along a discrete axis for plots like Bar Charts, Box Plots and so on.

Now, let us plot the mean, the 5th and the 95th percentile for the same data using the scatter plot.  I used the MEANS procedure to compute the mean, P5 and P95 values to create the data set for the graph shown on the right.  Note, something different happened here with the placement of the category values on the x axis.
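The summary step might look like the sketch below.  The output data set name PulmonaryStats is hypothetical; the scatter plot code in this post reads its statistics from a data set named Pulmonary, so the exact arrangement in the original program may differ.

```sas
/* Sketch: mean, 5th and 95th percentile of Shock for each classifier level. */
/* Assumption: the output data set name PulmonaryStats.                      */
proc means data=Pulmonary noprint nway;
  class pulmonary;
  var shock;
  output out=PulmonaryStats(drop=_type_ _freq_) mean=mean p5=p5 p95=p95;
run;
```

This yields one observation per level of Pulmonary with the Mean, P5 and P95 columns used by the scatter plot.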

Aside:  In this graph I have used two scatter plots just to simulate the filled and outlined mean marker. With SAS 9.4, this can be done with an option.  Click on the graph for a high resolution image.

SAS 9.3 code for scatter plot:

proc sgplot data=Pulmonary;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95
          markerattrs=(symbol=circlefilled color=black);
  scatter x=pulmonary y=mean /
          markerattrs=(symbol=circlefilled color=lightblue size=6);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

In the graph above, the category values are displayed at the ends of the axis, with an offset of half the size of the marker at each end of the axis.  This is the standard behavior of the scatter plot on any type of axis.  Setting x axis Type=Discrete does not make any difference.  While we noticed this behavior, we could not change it because the scatter plot is the most extensively used plot type and such a change would create too many problems for many graphs.

However, in such cases, it is often desirable to get the discrete axis behavior similar to the first graph shown above.  How can we get that?  Well, as usual, there are multiple (simple) ways to get the result we want.

First, recall that we can use (and are using) layers of plots to create the graph.  I can place a high low plot of the same data prior to the scatter plot.  The high low plot prefers a Bar Chart like category axis, and placing it first makes it the "Primary" plot, thus forcing the x axis to its liking and forcing other plots to follow its lead.

The high low plot also does not force a baseline of zero on the y axis, like the bar chart does.  So, it is the ideal choice in this case.  The low and high values of the high low plot are the same (mean), so a dot is drawn at this location that is overdrawn by the scatter marker. Note, the resulting graph is now the way we want as shown above.

SAS 9.3 code for scatter plot with high low:

proc sgplot data=Pulmonary;
  highlow x=pulmonary low=mean high=mean;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95
          markerattrs=(symbol=circlefilled color=black);
  scatter x=pulmonary y=mean /
          markerattrs=(symbol=circlefilled color=lightblue size=6);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

Another way to achieve a similar result is to use a "dummy" group variable on the scatter plot with GroupDisplay=Cluster.  This forces the axis to what we want as shown on the right.

SAS 9.3 code for scatter plot with cluster group:

proc sgplot data=Pulmonary;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95 group=pulmonary
          groupdisplay=cluster markerattrs=graphdatadefault;
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

Full SAS 9.3 code:  Pulmonary_93


New Graphics Features in SAS 9.4M2 - Part 2

For far too long we have been using the venerable Scatter Plot to do the work of placing text strings in the graph.  For far too long we have used the Scatter Plot or the Block Plot to place axis aligned text in the graphs.   It is time to move on.

When we started down the ODS Graphics path over 10 years ago, little did we know how often we would need to do the above.  Almost every clinical graph needs text placed judiciously in the graph.  With SAS 9.4, we released the Axis Table to simplify the task of placing axis aligned text.  Now with SAS 9.4M2, we release the TEXTPLOT for general purpose text placement in a graph.

The Text Plot renders text in the graph in a variety of ways.  Freed from the Scatter plot, we can specialize this plot to render text in ways that did not make sense with the scatter plot.  Here is the basic syntax:

textplot x=var y=var text=var / group=var colorresponse=var sizeresponse=var;

This new statement makes it possible to create graphs with text alone, or add text in different ways to your graph.  Here are some examples.

Simple text plot:  In this case, we use the basic options on the text plot to display the name of each person in the class data set positioned by Height and Age classified by the variable 'Sex'.
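A minimal sketch of such a simple text plot follows.  I am assuming the SGPLOT form of the statement, which is TEXT (TEXTPLOT being the GTL statement name), the sashelp.class data set, and Age on the x axis with Height on the y axis.  The post does not say which variable goes on which axis.

```sas
/* Assumes SAS 9.4M2 or later; the axis assignment is a guess */
proc sgplot data=sashelp.class;
  text x=age y=height text=name / group=sex;
run;
```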

Text Plot with Size and Color Response: In this example, the font size of the name of each person is proportional to the values in the variable used for the Size Response role.  The color of the text string is determined by the Color Response role.  In this case, both size and color are determined by the same variable "Weight".  In the higher resolution version of the graph, you will also notice that the text has a soft "backlight".  This helps in discerning text that has a color close to the background color, like the yellow text.
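The size and color response roles can be sketched in the same way.  Again this assumes the SGPLOT TEXT statement and sashelp.class.  The color ramp and font size range are left at their defaults, so the result will not match the graph in the post exactly.

```sas
/* Both the font size and the text color are driven by Weight */
proc sgplot data=sashelp.class;
  text x=age y=height text=name / colorresponse=weight sizeresponse=weight;
run;
```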

But the Text Plot goes beyond such features to support rotated text, aligned to the 9 compass directions as shown in the graph on the right.  In this case, we have displayed the standard BMI curves as bands, and want to label them along the top.  Using horizontal text can be a problem for narrow bands.  So, in this case I have specified an angle of rotation independently for each string in the column.

To render rotated text, you can specify an angle of rotation in degrees for each string separately.  This works quite well in most cases, but in this case it can be a problem as the slope of the curve can change based on the aspect ratio of the graph.  So, specifying the angle in data coordinates instead of screen coordinates may work better.  We will be sure to add an option to do that soon.
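As an illustration of per-string rotation, here is a small sketch with made-up label positions and angles.  It assumes the SGPLOT TEXT statement accepts a numeric column for its ROTATE= option.  The data set, variable names and values are all hypothetical and are not the BMI data used for the graph.

```sas
/* Hypothetical labels, each with its own rotation angle in degrees */
data labels;
  length txt $12;
  input x y angle txt $;
  datalines;
10 20 15 Underweight
10 30 25 Normal
10 40 35 Overweight
10 50 45 Obese
;
run;

proc sgplot data=labels;
  text x=x y=y text=txt / rotate=angle position=center;
  xaxis min=0 max=20;
  yaxis min=10 max=60;
run;
```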

Clearly, you can overlay markers on the BMI curves to display the values for each subject in a study.  One could use the scatter plot to display the value for each subject, but here we have used the Text Plot itself.

Finally, certain things are harder to do, where you need to know the exact dimensions of the text being rendered. For example, I was attempting to see how far I can get creating a "Word Cloud" graph using the Text plot.  I can size and color each string by a response value based on some statistic (say number of occurrences), and place a string where I want.  But, only the Java rendering code knows the exact string box sizes, which vary for each string.  I cannot know where a string ends for proportional fonts to exactly position the next string.

As an exploration of what could be possible, we created a feedback mechanism to allow the user to know the exact size of a text string (for any given font, weight, size or style).  The renderer can write this information to a file on disk, which can be read back by the user.  Now, using a two pass process, you can create a perfect word cloud yourself as shown on the right.

In the example on the right, I first rendered all the text strings with the correct size, font and style, but all at (x, y) = (0, 0).  We added a mechanism (still under development) to write the actual bounding box of each string in pixel and data space into a csv file.  I read back this information using proc import, and merged the text box information with the original data.  Then, I ran a data step to position each string in sequence, wrapping to the next line when I reached the end of the data space.
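The second pass can be sketched as a simple data step.  Since the feedback mechanism is still under development, the layout of its csv file is not known.  The sketch below assumes the box sizes have already been read into a data set with one row per word and hypothetical columns width and height in data units.  The data space width of 100 and the gap of 2 are also assumptions.

```sas
/* Hypothetical bounding boxes, as if read back from the renderer */
data boxes;
  length word $12;
  input word $ width height;
  datalines;
graph 30 10
plot 22 8
axis 25 12
text 20 6
marker 40 9
legend 38 7
;
run;

%let xmax = 100;  /* assumed width of the data space */
%let gap  = 2;    /* assumed spacing between strings and between rows */

data cloud;
  set boxes;
  retain xpos 0 ypos 0 rowheight 0;
  /* Wrap to the next row when the string would pass the right edge */
  if xpos > 0 and xpos + width > &xmax then do;
    xpos = 0;
    ypos = ypos - rowheight - &gap;
    rowheight = 0;
  end;
  x = xpos;   /* left edge of this string */
  y = ypos;   /* top edge of the current row */
  xpos = xpos + width + &gap;
  rowheight = max(rowheight, height);
run;
```

Each row of the cloud data set then carries the (x, y) position for its string, which can be fed to the text plot in place of the (0, 0) positions used in the first pass.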

The benefit here is you can implement your own specific algorithm to lay out the strings once you know their exact dimensions.  Instead of a linear word cloud, you could do a circular layout, starting from the middle.  Or, turn the text sideways to fit them closer like some of the examples on the web.  Here is the same data (with different size values) as a grouped word cloud.

So, we are looking for some feedback from you.  Do you see use cases in your work where you could use this information to lay out strings exactly where you need them?  Would knowing the exact pixel dimensions of something rendered in a graph help you control some aspects of the graph?  Please chime in with your opinions to help us determine if such a "feedback loop" could be useful and how you could leverage it.

SAS 9.4M2 Text Plot Code:  TextPlot
