'Unbox' Your Box Plots - part deux

There was a recent comment on the original 'Unbox Your Box Plots' post, in which a user wanted to see the original data for the box, but label only the outliers.

As noted in the comment, labeling all the scatter markers and turning on the outlier display is not ideal. But there is a way to do this.

The basic idea:

  • PROC MEANS (or PROC UNIVARIATE) to compute the Q1 and Q3 for the data
  • compute the upper and lower fences
  • blank out the label variable if that observation is not an outlier.
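
The steps above can be sketched as follows. This is only an illustration under stated assumptions: sashelp.cars stands in for your data, Type is the category, Horsepower is the analysis variable, and Model is the label. The names quartiles, cars_labeled and outlier_label are made up for the example, and the 1.5 x IQR fences are the usual box plot convention.

```sas
/* Sketch only: sashelp.cars, Type, Horsepower and Model are stand-ins */
proc means data=sashelp.cars noprint nway;
  class type;
  var horsepower;
  output out=quartiles q1=q1 q3=q3;   /* Q1 and Q3 per category */
run;

proc sort data=sashelp.cars out=cars;
  by type;
run;

data cars_labeled;
  merge cars quartiles(keep=type q1 q3);
  by type;
  length outlier_label $40;
  iqr = q3 - q1;
  lower_fence = q1 - 1.5*iqr;          /* assumed fence rule: 1.5 x IQR */
  upper_fence = q3 + 1.5*iqr;
  /* Keep the label only for observations outside the fences */
  if horsepower < lower_fence or horsepower > upper_fence then outlier_label = model;
  else outlier_label = ' ';
run;

proc sgplot data=cars_labeled noautolegend;
  vbox horsepower / category=type nooutliers nofill;
  scatter x=type y=horsepower / jitter datalabel=outlier_label;
run;
```

The box plot's own outlier markers are turned off with NOOUTLIERS, so each observation is drawn only once, by the scatter plot.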

With SAS 9.4, GTL scatter plots support jitter, so we can do away with the interval X axis workaround required in the original post. Here is the GTL output:

GTL Box plot with jittered data

You can also do this with SGPLOT procedure (as of SAS 9.4, 1st maintenance release), with the result as shown below:

SGPlot box plot with jittered data

The full code for both examples is here.


Stock Plots

This weekend I was reviewing my portfolio of stocks as usual.  Yes, I do have a small stock portfolio with a few stocks, and normally I use free stock charting software to review the stock plots.  These sites allow you to view the daily stock prices along with many technical indicators such as moving averages, Bollinger bands and more.

One technical indicator often talked about is the range bands.  The conventional wisdom being that the prices tend to stay within this range, until they don't.  I could not find a way to do this on the website I use, so I have to take a screen shot of the graph, and then lay straight lines at the upper and lower range of the stock prices using Microsoft Publisher.  The result is shown on the right.

The process above is a bit tedious, so I figured I could use the power of SAS to create the graph I need as shown below.  Click on the graph for a higher resolution version.

I created this graph by downloading the 2-year stock data for Facebook (FB) from the NASDAQ site.  For sure, there are many other sites available.  Then, I used the SGPLOT procedure to create the graph, plotting a time series of Close x Date using the SERIES plot with an overlay of a REG plot (nomarkers).  The default 95% confidence works quite well to bound the low and high values of the graph.  However, I adjusted the Alpha value for a tighter fit, at least for this graph, and settled on Alpha=0.1.


title "&name (&symbol) Daily Close (Alpha=&alpha) Degree=1 on &sysdate";
proc sgplot data=&symbol._2yr_data noautolegend subpixel;
  series x=date y=close / y2axis;
  reg x=date y=close / y2axis nomarkers cli alpha=&alpha;
  y2axis grid display=(nolabel);
  xaxis grid display=(nolabel);
run;

Note the use of the &name, &symbol and &alpha macro variables. These are used because I made this into a macro that will download the data, process it, and create multiple graphs given the stock name and symbol.  See the full code linked below.

Just to compare the results, I also tried a quadratic fit (DEGREE=2) and a cubic fit (DEGREE=3).  The results are interesting.  The different graphs indicate different potential for the stock, each indicating some room for the stock prices to go up before they become "over bought".  Note, the previous conclusion is purely speculation on my part, and not meant as "financial advice".  Alpha=0.1 is unlikely to suit every stock, since each has its own "beta", so it can be changed in the macro invocation.
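
For reference, here is a minimal sketch of the quadratic variant. It reuses the statements from the code above and only adds the DEGREE= option on the REG statement. The %LET values are assumptions for illustration; in the macro they are supplied by the invocation.

```sas
/* Assumed values; normally passed in through the macro invocation */
%let symbol=FB;
%let alpha=0.1;

proc sgplot data=&symbol._2yr_data noautolegend;
  series x=date y=close / y2axis;
  /* DEGREE=2 requests a quadratic fit; use DEGREE=3 for a cubic fit */
  reg x=date y=close / y2axis nomarkers cli alpha=&alpha degree=2;
  y2axis grid display=(nolabel);
  xaxis grid display=(nolabel);
run;
```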

The same can be done for other stock symbols as shown below.  Note, the 90% CLI bands are used for all the graphs.  The last graph uses a HighLow plot with 1-year data.

Full SAS code:  StockPlotMacro_2

Graph Table with Class

As is often the case, this article is prompted by a recent post on the SAS/GRAPH and ODS Graphics Communities page.  A user wanted to create a Graph Table showing a bar chart with tabular data for each of the category values along the x-axis.  The user was creatively using a VBAR overlaid with multiple VLINE statements using SAS 9.40M?.  The VLINE statements were used to display the statistics.

I applaud the creativity of the user, who has clearly taken to heart the lesson that multiple plot statements can often be used creatively to build the graph you may want.  Prior to SAS 9.4, this was one way to overlay additional textual data on a graph that contains a VBAR.  However, with SAS 9.4, there is an easier way - AxisTable.

While we have discussed AxisTables in earlier articles, it seems worthwhile to review the subject.  The graph above right shows how you can display multiple rows of data statistics aligned with the x-axis categories.  The group values are clustered as shown for the box plot and in the table below it.  Click on the graph for a higher resolution image.

Our goal is to create the graph above.  Let us start with a cluster grouped box plot along with textual display of data.   In the graph on the right, a box plot of Horsepower is displayed by Type with Group=Origin for the data set sashelp.cars.  The group values are clustered side-by-side.  An xAxisTable is used to display the associated values for Horsepower, also classified by Origin.

Note, since the CLASS option is used with the xAxisTable, the statistical values for the three levels of "Origin" are displayed stacked under each category on the x-axis.  Each class value is displayed on the left.

With SAS 9.40M3, the CLASSDISPLAY option was added to allow the display of the class values in the clustered arrangement as shown on the right.  Using CLASSDISPLAY=CLUSTER, values for each class are displayed side by side, and arranged in the same way as in the box plot.  Now, the name of the variable is displayed on the left of the values.  Note, we have used the COLORGROUP=Origin to color each value by the same variable to provide a visual that is easier to decode.

The benefit of this option is that multiple statistics can be displayed with such grouped plot statements.  The graph on the right shows the mean values for Horsepower, Mpg_City and Mpg_Highway.  More variables can be used if necessary.

SAS 9.40M3 Code for grouped Box Plot with Table.

title h=10pt 'Mean Auto Statistics by Type and Origin';
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid')) noborder;
  format mpg_city mpg_highway horsepower 3.0;
  styleattrs axisextent=data;
  vbox horsepower / category=type group=origin name='a'
       groupdisplay=cluster dataskin=gloss
       meanattrs=(size=6) outlierattrs=(size=5);
  xaxistable horsepower mpg_city mpg_highway / class=origin
       classdisplay=cluster stat=mean
       colorgroup=origin location=inside nostatlabel;
  xaxis display=(nolabel noticks noline);
  keylegend 'a' / location=inside position=topright across=1 title='';
  yaxis grid;
run;

Finally, the user wanted to add vertical divider lines (column border) to separate the column of values.  Unfortunately, the AxisTable statement does not currently support column or row borders.  However, the x-axis color bands could be used to create such a grouping as shown in the graph on the right.  Click on the graph to see this more clearly.  The banding intentionally uses a soft color, matching the color of the background.  However, that can be controlled in the syntax.
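
A minimal sketch of the banding idea, using the COLORBANDS and COLORBANDSATTRS options on the XAXIS statement. The color and transparency values here are illustrative choices, not the ones used in the linked program.

```sas
/* Sketch: alternate x-axis color bands act as column separators */
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid')) noborder;
  vbox horsepower / category=type group=origin groupdisplay=cluster;
  xaxistable horsepower / class=origin classdisplay=cluster
             stat=mean location=inside;
  /* Band every other category; color and transparency are assumed values */
  xaxis display=(nolabel noticks noline) colorbands=odd
        colorbandsattrs=(color=lightgray transparency=0.6);
  yaxis grid;
run;
```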

A Graph Table is very effective for display of results of an analysis. The AxisTable is ideally suited to help create such visuals.  Graph Tables such as the Survival Plot or the Forest Plot are popular examples of the usage of Axis Tables.

Full SAS 9.40M3 code for Graph Tables:  GraphTableWithClass


Polar Graph - Wind Rose

Last week I posted an article on displaying polar graph using SAS.  When the measured data (R, Theta) are in the polar coordinates as radius and angle, then this data can be easily transformed into the XY space using the simple transform shown below.

    x=r*cos(theta * PI / 180);
    y=r*sin(theta * PI / 180);

Then, we can plot the graph using a scatter plot statement.  Setting Aspect=1 ensures the graph retains its shape, and we add the radial grid lines using Vector plot statement.  With GTL, we can use the Ellipseparm statement to display the circular grids.   With SGPLOT, we can use either a Polygon plot or SGAnnotate to draw the circular grids.

In this article, we will discuss another popular polar graph called the Wind Rose Graph.  This graph was developed to depict the wind speed and direction, and can be useful to present any directional information, or information that is cyclical in nature.

The Wind Rose graph on the right is created using the SGPLOT procedure.  Here I have simulated wind data by direction and speed category.  The data was generated using some trigonometric and random functions and does not represent real or sampled data.

Note, this visualization does not use a scatter plot, as was the case with the polar graph in the previous article.  Here, we have used a "Bar" to represent the wind from each direction.  This needs a bit more work to create.

I start with generating the data in (R, Theta) coordinates, as shown on the right.  For 16 values around the circle, I generate the wind percentages by 4 "Knots" categories.  These can be seen in the legend in the graph above.  I have generated Low and High values for each segment in the table on the right.  This is done for ease of transformation later into the polar graph.  For simplicity, each group has an equal number of percentage points.  Values with total > 80 have 4 groups.

We can certainly plot this data directly as a HighLow plot in the XY space as shown below.  Click on the graph for a higher resolution image.

title h=10pt 'Wind data as stacked HighLow segments';
proc sgplot data=WindSpeed noborder;
  styleattrs datacolors=(forestgreen lightgreen gold cxD00000);
  highlow x=theta low=low high=high / group=knots type=bar;
  yaxis offsetmin=0 label='Percent' grid;
  xaxis values=(0 to 360 by 45);
run;

Note in the bar chart on the right, the bars are actually plotted using the High-Low plot, with Type=bar.  The overall wind values are easy to compare side by side.  However, since the data is really directional, let us plot the bars on the compass directions.

Using the equation shown above, we transform the (R, Theta) coordinates to the (x, y) coordinates.  A polygon is generated for each segment of the high-low in the original (R, Theta) space.  Then each vertex of the polygon is converted into the (x, y) space as shown in the table on the right.  When plotted, the "Polar Bars" (pun intended) are displayed.  The table includes the Knots variable to be used as color group and polygon Id.
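
One way to sketch the polygon generation is shown below. It assumes the WindSpeed data set has the Theta, Low, High and Knots columns used in the HighLow plot above. The bar half-width (9 degrees), the arc step (3 degrees) and the output name WindPolar_sketch are assumptions for illustration; the linked program may build the vertices differently.

```sas
/* Sketch: one polygon per high-low segment, traced in (R, Theta) and
   converted to (x, y) with the transform shown earlier */
data WindPolar_sketch;
  set WindSpeed;
  retain halfwidth 9 step 3;     /* assumed half-width and arc step, degrees */
  pi = constant('pi');
  id + 1;                        /* polygon id, one per segment */
  /* Outer arc at radius High */
  do a = theta - halfwidth to theta + halfwidth by step;
    x = high*cos(a*pi/180);
    y = high*sin(a*pi/180);
    output;
  end;
  /* Inner arc at radius Low, traversed in reverse to close the shape */
  do a = theta + halfwidth to theta - halfwidth by -step;
    x = low*cos(a*pi/180);
    y = low*sin(a*pi/180);
    output;
  end;
  keep id knots x y;
run;
```

The resulting x, y, id and knots columns match the roles used by the POLYGON statement in the program below.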

In the same data set, data is generated for the 16 radial grid lines, with values 0-315, the angle around the circle.  A Vector plot is used to display this on the graph as the 16 directions.  A Text plot is used to display the labels for each direction using a user defined format.  X and Y axes are turned off, and Aspect=1 is used.  The result is shown in the graph at the top of the article.

title h=10pt 'Wind Rose created using SAS SGPLOT Procedure';
proc sgplot data=WindPolar aspect=1 noborder nowall noautolegend sganno=anno subpixel;
  format label dir.;
  format knots knots.;
  styleattrs datacolors=(forestgreen lightgreen gold cxD00000);
  polygon x=x y=y id=id / dataskin=sheen fill nooutline group=knots name='a';
  vector x=x2 y=y2 / xorigin=x1 yorigin=y1 lineattrs=(color=lightGray) noarrowheads;
  text x=xl y=yl text=label / textattrs=(size=7);
  keylegend 'a' / position=right across=1;
  xaxis display=none;
  yaxis display=none;
run;

In the previous article for Polar Graph, I had used GTL to display the circular grid lines using the EllipseParm statement.  In this graph, I have used SGAnnotate to draw the circular grids using the OVAL function.  Click on the table on the right to see the annotate data in detail.  Note the use of "Layer=back".  This draws the circular grids behind the graph.  To see these annotations, we need to turn off the Wall display.
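
A rough sketch of such an annotation data set follows. The radii (5 to 20 by 5) and the line color are assumptions for illustration, and the column values follow the SG annotation conventions as I understand them; compare with the annotate table in the linked program before relying on it.

```sas
/* Sketch: one OVAL annotation per circular grid, centered at the origin */
data anno;
  length function $8 drawspace $12 widthunit heightunit $8
         display $8 linecolor $12 layer $8;
  retain function 'oval' drawspace 'datavalue' x1 0 y1 0
         widthunit 'data' heightunit 'data'
         display 'outline' linecolor 'lightgray' layer 'back';
  do r = 5 to 20 by 5;           /* assumed radii, in data units */
    width  = 2*r;                /* an oval is sized by its full width and height */
    height = 2*r;
    output;
  end;
  drop r;
run;
```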

A label for each circular grid can be added using the TEXT function.  That exercise is left to the motivated reader.

Full SAS 9.4 SGPLOT Code:  WindRoseGraph

Polar Graph

There are many situations where it is beneficial to display the data using a polar graph.  Often your data may contain directional information.  Or, the data may be cyclic in nature, with information over time by weeks, or years.  The simple solution is to display the directional or time data on the X axis of an XY plot as shown further down in the article.  But the information may not be very easy to understand in such a graph.  Such a graph was recently discussed on the SAS Communities page.

The same information can be better understood using a polar graph.  The graph on the right shows the (simulated) BC concentration by Wind Direction and Wind Speed.  The data for this graph was simulated by me using some random and trigonometric functions.  There is no real sampled or measured information in the data.

The data has the concentration of BC by wind direction and Wind speed.  In this graph I have transformed the data so that the direction (0-360 degrees) is mapped to a point around the circle, and the speed (0-25 mph) is mapped along the radius.  The BC concentration is displayed by a colored marker, and the gradient legend is displayed on the right to decode the values.

The directions are displayed using the N-S-E-W arrows, so understanding the information is easier.  Note, I have not yet added the indicators for the Wind Speed in this graph.

Here is the SAS 9.40 SGPLOT program for the graph.

title 'BC Concentration by Wind Speed and Direction';
proc sgplot data=wind aspect=1.0 noborder;
  scatter x=x y=y / colorresponse=bc markerattrs=(symbol=circlefilled)
          colormodel=(green yellow red) name='a';
  vector x=x2 y=y2 / xorigin=x1 yorigin=y1 arrowheadshape=barbed;
  text x=xl y=yl text=label / textattrs=(size=9);
  gradlegend 'a' / title='';
  xaxis display=none;
  yaxis display=none;
run;

In the data step for the graph, x and y are computed from R and Theta using the following formula.  You can see all the details in the program code linked below.

    x=r*cos(theta * PI / 180);
    y=r*sin(theta * PI / 180);
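
As a minimal, self-contained sketch of that data step: PI is not a built-in SAS variable, so it is set here with the CONSTANT function. The data set name wind_sketch, the seed and the uniform random values are assumptions for illustration and are not the simulation used for the graph.

```sas
/* Sketch: simulate (R, Theta) points and transform them to (x, y) */
data wind_sketch;
  call streaminit(2016);         /* assumed seed, for repeatability */
  pi = constant('pi');
  do i = 1 to 200;
    theta = 360*rand('uniform'); /* direction in degrees, 0-360 */
    r     = 25*rand('uniform');  /* speed, 0-25 */
    x = r*cos(theta*pi/180);
    y = r*sin(theta*pi/180);
    output;
  end;
  drop i pi;
run;
```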

A simpler method would be to just plot the Wind Speed and Wind Direction on the Y and X axes of a rectangular XY plot as suggested at the start of the article.  The result is shown on the right.  The data is exactly the same, but now we have plotted speed (R) on the Y axis and direction (Theta) on the X axis of the scatter plot.  See linked code below.

I would say the data is not as easy to understand in this presentation.  The feel for direction is lost, and also a discontinuity is created between 0 and 360.  A polar presentation is clearly more intuitive.

Note in the polar graph on top, I did not plot the indicators for the Wind Speed.  The SGPLOT procedure does not support a simple plot statement to draw circles to display the values for the speed along the radius.  Yes, we could plot the values along one axis, but that would not be so intuitive.  The circular grid lines can be drawn using  the SGANNOTATE "oval" function and that exercise is left to the motivated reader.

Instead, we can also make the same graph using GTL, which does support the ELLIPSEPARM statement that can be used to draw the circular grids.  Now, the circles provide the grid lines for the Wind Speed, and the values are displayed along one of the directional arrows.  Click on the graph for a higher resolution image.

Note, I have used the option ASPECT=1.0 to ensure that the display area is circular regardless of the size or aspect of the graph itself.

Full SAS 9.40 SGPLOT and GTL code:  Wind_Graph

Earlier, I had posted a similar method to Visualize the Temperature Data over Time.


Bar Chart with Descending Response

Recently, I needed to view the list of products with the highest number of defects.  I have a data set of defects reported against various products.  The data set has over 30 products, and each observation contains the product name, name of the primary support person, and other relevant details of the defect.  My goal is to produce an uncluttered graph showing only the most significant information, and to also insert additional information (such as "Support" name) in the graph.

Here is a bar chart of the defects by product.  The graph on the right shows the number of defects by all the products in the data set as a horizontal bar chart, showing also the primary support person for the product.  Click on the graph for a higher resolution image.  The entire detailed data set is provided to the procedure, and the computation of the frequencies is done by the HBAR statement itself.

While we have achieved one goal (inserting additional information into the graph), there are too many bars with small defect counts cluttering up the graph.

Here is the SGPLOT code for the graph:

title "Open Defects for Product on &sysdate";
proc sgplot data=blog.product noborder;
  hbar product / datalabel categoryorder=respdesc datalabelfitpolicy=none;
  yaxistable product support / position=left location=inside;
  xaxis display=(nolabel noline noticks) grid;
  yaxis display=none valueattrs=(size=6) fitpolicy=none;
run;

Note the bars are displayed by descending  defect counts.  As we can see, most of the defects are in the first few products, and there are too many other product names in the graph with very few defects that I do not need to see.  This also makes the graph harder to read.

In the code above, I have used the HBAR statement with the CATEGORYORDER=RESPDESC which creates this graph with the descending order of the bars.  I have also used the YAXISTABLE to display both the product name and the primary support person.

The full detailed data is provided to the procedure and the HBAR is computing the number of defects by product and arranging them by descending statistic  (frequency).  So, there is no way for me to specify to the graph that I want to see only the products with more than 10 defects, as shown in the graph on the right.  This would be a nice feature to add to the procedure at some point.

To create the graph shown on the right, I have to first compute the defect count for each product using PROC MEANS.  But, I also need to carry through the "Support" column for plotting.  Note the use of the ID variable in the code below.  This allows me to get the name of the primary support person along with the product name in the output data set.  I sort the data set by the count, and then I can display the graph.  I can either show only the first 10 observations, or I could show only the bars with count > N.

My data has no numeric variables.  The MEANS procedure apparently did not like this.  So, I added a constant column "X" (=10).  This seems to have overcome the problem.

/*--Add a dummy analysis variable--*/
data product;
  set blog.product;
  x=10;
run;

/*--Compute frequencies by component, keep the primary support id--*/
proc means data=product noprint;
  class product;
  id support;
  var x;
  output out=freq(where=(_type_ > 0)) n=n;
run;

/*--Sort data by descending frequency--*/
proc sort data=freq;
  by descending n;
run;

/*--Draw graph using HBARPARM with summarized data--*/
title "Open Defects for Product on &sysdate";
proc sgplot data=freq(where=(n>10)) noborder;
  hbarparm category=product response=n / datalabel;
  yaxistable product support / position=left location=inside;
  xaxis display=(nolabel noline noticks) grid;
  yaxis display=none fitpolicy=none;
run;
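
The other choice mentioned above, showing only the first 10 observations, can be sketched with the OBS= data set option. This assumes the FREQ data set has already been sorted by descending count as in the program above; the title text is made up for the example.

```sas
/* Sketch: keep only the first ten rows of the sorted summary data */
title "Ten Products with the Most Open Defects on &sysdate";
proc sgplot data=freq(obs=10) noborder;
  hbarparm category=product response=n / datalabel;
  yaxistable product support / position=left location=inside;
  xaxis display=(nolabel noline noticks) grid;
  yaxis display=none fitpolicy=none;
run;
```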

Full SAS 9.40 SGPLOT code:  Bar_Chart


Scalable Turnip Graph

A Turnip Graph displays the distribution of an analysis variable.  The graph displays markers with the same (or close) y coordinate by displaying the markers spread out over the x-axis range in a symmetric pattern.  Recently, a question was posted on the SAS Communities page regarding such a graph.

Here is an example of display of the distribution of the data using a Turnip Graph.  In this example, the markers are "Binned" on the y-axis.  All markers in each bin are displayed symmetrically in the x direction.  The data requires the list of observations with same y value which are automatically displayed as a row of markers using the SCATTER plot with the JITTER option.  Click on the graph for a higher resolution view.

SGPLOT code for Turnip Graph:

title 'Distribution of Cholesterol by DeathCause';
proc sgplot data=turnipScatter noautolegend;
  scatter x=deathcause y=y / jitter;
  xaxis display=(nolabel);
run;

One shortcoming of the graph above is that it does not scale well for moderately large data.  The graph above was created for a data set of about 225 observations with 4 category values.  I have intentionally reduced the data so it works for the graph above.  The markers just barely fit the space available.  As the observation count or number of categories increases, this method does not continue to provide good results.  Other methods can be used to actually compute the (x, y) of each observation, which requires much more work for a general solution.

An alternate way which is relatively easy to build to view the same data is shown on the right.  Instead of displaying each marker, the graph displays a "bin" that represents all the markers in the bin.  All bins in the graph are scaled by the count in each bin so it is easy to see the relative distribution of the data.  The observation count is displayed in the bin.  Click on the graph for a higher resolution view.

As you can see, this graph scales very well for all kinds of data, with small or large observation counts and for different number of categories on the x-axis.  To prepare the data, we run an SGPANEL graph with the HISTOGRAM statement using the SCALE=COUNT option and save the resulting data in a data set using the ODS OUTPUT statement.  This saves the bins and the number of observations in each bin by category.  We mirror the data by creating a "Min" column equal to the negative value of the "Count" column.
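
The sketch below shows the shape of the mirrored data, but it swaps in a different binning technique: instead of reading the histogram bins through ODS OUTPUT, it rounds Cholesterol to an assumed bin width of 20 and counts with PROC FREQ. The data set names binned, counts and turnip_sketch are made up for the example, and sashelp.heart is assumed as the source of DeathCause and Cholesterol.

```sas
/* Swapped-in binning: round to a bin width of 20 (assumed) */
data binned;
  set sashelp.heart(keep=deathcause cholesterol
                    where=(cholesterol ne . and deathcause ne ' '));
  y = round(cholesterol, 20);
run;

/* Count the observations in each bin by category */
proc freq data=binned noprint;
  tables deathcause*y / out=counts(keep=deathcause y count);
run;

/* Mirror the counts around zero so each bin is drawn symmetrically */
data turnip_sketch;
  set counts;
  max  = count;
  min  = -count;
  zero = 0;
run;
```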

We use the SGPANEL Procedure with the HIGHLOW plot to display the distribution in a panel.  We use a TEXT plot to display the bin counts and we turn off the cell headers and use a TEXT plot to display the categories at the bottom to make this look like a single cell graph.  TEXT is better than an INSET since it can split the long values on white space.

SGPANEL code for the Scalable Turnip Graph:

title 'Distribution of Cholesterol by DeathCause';
proc sgpanel data=turnip noautolegend;
  panelby deathcause / novarname layout=columnlattice  columns=4 noborder noheader;
  highlow y=y low=min high=max / type=bar barwidth=1
          fillattrs=(color=lightgray) lineattrs=(color=black);
  colaxis display=none;
  rowaxis min=0 offsetmin=0.15 display=(noticks noline nolabel) grid;
  text y=y x=zero text=max / strip textattrs=(size=5);
  text y=ylbl x=zero text=label / strip splitpolicy=split
          position=bottom contributeoffsets=none;
run;

To view the relative distribution, bin counts are not really necessary.  Alternative visuals are shown below.  Full code for preparing the data and for creating the graph is linked below.  I am tempted to call this the "Spark-Plug Graph" or a "Spinning Top" graph.

A "Violin Graph" can be created instead using the same data by using the BAND statement instead of HIGHLOW.
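
A minimal sketch of that swap, assuming the same turnip data set and Min and Max columns as in the panel code above, and that the rows are ordered by Y within each category. The fill color is an illustrative choice; the linked program may differ.

```sas
/* Sketch: replace the HIGHLOW bars with a BAND drawn along the y values */
title 'Distribution of Cholesterol by DeathCause';
proc sgpanel data=turnip noautolegend;
  panelby deathcause / novarname layout=columnlattice columns=4 noborder noheader;
  band y=y lower=min upper=max / fillattrs=(color=lightgray);
  colaxis display=none;
  rowaxis display=(noticks noline nolabel) grid;
run;
```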

Full code for Scalable Turnip Graph:  Turnip

Infographics: Coin Stack Bar Chart

Often we see bar charts showing revenues or other related measures by a classifier using a visual of a stack of coins.  Such visuals are not strictly for the purposes of accurate magnitude comparisons, but more for providing an interesting visual to attract the attention of the reader.  In other words - Infographics.

I thought this would be a good exercise to see how we can do this using the SGPLOT procedure.  One such result is shown on the right.  Click on the graph for a higher resolution image.

I searched the web for some appropriate images of coins, anything with a perspective image of a coin that can be used to create a stack.  Then, I found a beautiful image of an antique "2-Annas" coin from British India.  The image of the coin has beautiful shine, good resolution, unusual shape and clear details that make for nice stacks of coins as shown above.

The default Bar Chart would look like the graph on the right.  While it accurately conveys the information clearly, in some instances it is a bit boring compared to the graph above.

The data for the graph is very simple as shown on the right below the graph, and the program is shown below.

SGPLOT code for Bar Chart:

title 'Revenues (Millions) by Year';
proc sgplot data=Bar noborder noautolegend;
  vbar cat / response=resp fillattrs=graphdata1 dataskin=pressed
       datalabel datalabelattrs=(size=12 weight=bold);
  xaxis display=(noticks noline nolabel) integer;
  yaxis display=(nolabel noticks noline) min=0 integer grid;
run;

Now, to create the graph of the pile of coins, we need to render each coin in the stack individually, using a SCATTER plot where the marker symbol is built from the image of the coin.  We process the original data set, and generate an observation for each coin with increasing y value in the data.  Then, with the default rendering order (which is data order), the later (higher) coins are drawn over the earlier coins, thus creating a stack.

The data generated for the coin stack is shown on the right.  Note, the response value is kept only once for each category value.
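
A minimal sketch of this data step, assuming the Bar data set has the Cat and Resp columns used in the bar chart code. Drawing one coin per unit of revenue is an assumption for illustration; the right number depends on the coin image. The names coins_sketch, total and ncoins are made up for the example.

```sas
/* Sketch: one observation per coin, with the response kept once per stack */
data coins_sketch;
  set Bar(rename=(resp=total));
  ncoins = ceil(total);          /* assumed: one coin per unit of revenue */
  do val = 1 to ncoins;
    /* Keep the response only on the top coin so the label is drawn once */
    if val = ncoins then resp = total;
    else resp = .;
    output;
  end;
  keep cat val resp;
run;
```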

We use the SYMBOLIMAGE statement to define the symbol.  One must use a "transparent" image, where the pixels outside the coin part are transparent.  We also use the JITTER option so each coin is shifted along the x-axis a bit to simulate a "real" stack.  Else, the stack will be too straight.  This jitter option works best when the x-axis values are numeric.

SGPLOT code for Coin Stack Graph:

title 'Revenues (Millions) by Year';
proc sgplot data=coins noborder noautolegend;
  symbolimage name=Coin image="&Coin";
  scatter x=cat y=val / markerattrs=(symbol=Coin size=70) jitter jitterwidth=0.03;
  text x=cat y=resp text=resp / textattrs=(size=14 weight=bold color=white)
       strip position=top backlight=0.75;
  xaxis display=(noticks noline nolabel) integer offsetmin=0.15 offsetmax=0.15;
  yaxis display=none offsetmin=0.2 offsetmax=0.2;
run;

The value of the stack is displayed on the top of the coins.  Note use of "Backlight" option to generate a darker outline around the text, so it is visible on top of the light colored coins.  Any nice image of a coin can be used.  The number of coins drawn should depend on the "thickness" of the coin in the image.  The 2-Annas coin is thin, so we need more coins.  The graph on the right is without jitter, which creates even stacks.

The Somalian Silver coin is "thicker", so we need fewer coins as shown on the right.

Full SAS9.40M3 code:  CoinGraph


Good Graph: Magnitude Comparisons

At the 2013 SAS Global Forum, I presented a paper titled "Make a Good Graph" which reviewed some of the features that make for a good graph.  This paper presents an aggregation of ideas from various sources, including some recommendations from thought leaders in the graphics arena such as Edward Tufte, William Cleveland and Naomi Robbins.

Recently, a question was posted on the SAS Communities site asking how to create the graph shown on the right using SAS.   This graph is showing sales figures by company (and peer) by region using a bubble plot.

There are two issues here:

  • How to make such a graph?
  • Should you make such a graph?

The answer to the first one is simple.  The SAS SGPLOT procedure supports the BUBBLE statement that can create a graph like the one shown above.

On the right is one I created for the simple data in the plot using SGPLOT procedure.  It is relatively easy to make and the generated visual is mostly like the one above, with a few differences.  A more exact match can be created, but I stopped here. Here is the code.

Bubble Plot Code:

ods graphics / reset width=4in height=1.75in noborder imagename='Sales_Bubble';
title 'Sales by Region';
proc sgplot data=sales noborder noautolegend;
  bubble y=Group x=Category size=value / bradiusmax=25 bradiusmin=12
         group=category dataskin=pressed datalabel=value
         datalabelpos=center datalabelattrs=(color=white size=8 weight=bold);
  yaxis display=(noline noticks nolabel) fitpolicy=split valueattrs=(size=8);
  xaxis display=(nolabel noticks noline) valueattrs=(size=8);
run;

Assuming the purpose of the graph is to better understand a company's sales vis-a-vis a peer, the second question becomes relevant.  Using the bubble plot, it is relatively hard to make accurate magnitude comparisons of sales figures between the company and its peers without the help of the numbers in the bubble.

The visual shown above would not be the best one to facilitate accurate magnitude comparisons.  It has been shown by studies on the subject that using areas for comparison of magnitude is not very effective.  A better way for such a goal would be usage of linear line segments from a common baseline.  Also, it helps  to bring the items to be compared close to each other.

The clustered bar chart on the right provides a better visual for magnitude comparisons of sales by region between company and its peer.  Putting the company and peer values adjacent allows for better comparisons which are clearly visible even without the numbers on the bars.

Bar Chart Code:

ods graphics / reset width=4in height=2.5in noborder imagename='Sales_Bar';
title 'Sales by Region';
proc sgplot data=sales noborder;
  styleattrs datacolors=(darkgreen gold);
  vbarparm category=Category response=value / group=Group
           groupdisplay=cluster dataskin=pressed datalabel
           datalabelattrs=(color=black size=8 weight=bold);
  keylegend / title='';
  yaxis display=(noline noticks nolabel) grid;
  xaxis display=(nolabel noticks);
run;

Linear distance from common baseline along with proximity of items to be compared create a better graph.  I am thinking it would be a good idea to have a thread for topics on how to create a "Good Graph".  A bit close to "Good Grief", made famous by Peanuts.  🙂

Full SAS 9.40M3 code:  Magnitude


CTSPedia Clinical Graphs - Subgrouped Forest Plot

The advent of the AXISTABLE statement with SAS 9.4 has made it considerably easier to create graphs that include statistics aligned with x-axis values (Survival Plot) or with the y-axis (Forest Plot).  This statement was specifically designed to address such needs, and includes the options needed to control the text attributes of the data and also any indentations that may be needed.

In previous posts, I have described the use of these new statements, but it seems I did not provide a full program for the "Subgrouped Forest Plot", one of many popular clinical graphs.  Here we can use the YAXISTABLE available in SGPLOT for this graph.

Here is the graph I created using the SGPLOT procedure.  Click on the graph to see a higher resolution image.  The details for the graph are as follows:

  • A Hazard Ratio plot in the middle.
  • Study names on the far left.  The study names are subgrouped, with label and values.  The labels have bolder font and the values are indented.
  • Number of patients with % on the left.
  • Event rates for PCI Group, Therapy Group and p-value on the right.
  • Note the use of Unicode arrow characters for the annotations on the axis created using the TEXT plot statement.  This is done using the ability to add Unicode values to a User Defined Format in SAS 9.4M3.
  • SG Annotation code is NOT used in this graph.

SAS 9.40M3 code:

title j=r h=7pt '4-Yr Cumulative Event Rate';
ods graphics / reset width=5in height=3in imagename='Subgroup_Forest_SG_94';
proc sgplot data=forest_subgroup_2 nowall noborder nocycleattrs dattrmap=attrmap noautolegend;
  format text $txt.;
  styleattrs axisextent=data;
  refline ref / lineattrs=(thickness=13 color=cxf0f0f7);
  highlow y=obsid low=low high=high;
  scatter y=obsid x=mean / markerattrs=(symbol=squarefilled);
  scatter y=obsid x=mean / markerattrs=(size=0) x2axis;
  refline 1 / axis=x;
  text x=xl y=obsid text=text / position=bottom contributeoffsets=none strip;
  yaxistable subgroup / location=inside position=left textgroup=id labelattrs=(size=7)
                      textgroupid=text indentweight=indentWt;
  yaxistable countpct / location=inside position=left labelattrs=(size=7) valueattrs=(size=7);
  yaxistable PCIGroup group pvalue / location=inside position=right pad=(right=15px)
                      labelattrs=(size=7) valueattrs=(size=7);
  yaxis reverse display=none colorbands=odd colorbandsattrs=(transparency=1) offsetmin=0.0;
  xaxis display=(nolabel) values=(0.0 0.5 1.0 1.5 2.0 2.5);
  x2axis label='Hazard Ratio' display=(noline noticks novalues) labelattrs=(size=8);
run;

I have also added this code to the CTSPedia page for Subgrouped Forest Plot.

Full SAS 9.40M3 code:  Subgrouped_Forest_Plot_SG_94