Infographics using SAS

Infographics are all the rage today.  Open any magazine or newspaper and you will see data and numbers everywhere.  Often, such information is displayed with some graphical element that adds context to the data.  A couple of good examples are Communicating numeric information, and Facts about Hot Dogs.

Riley Benson, our UX expert, explained it this way.  The reason such infographics are used is to provide a memorable item associated with the numbers.  An aesthetic graphic invites the user to spend more time viewing the graphic and number, thus making it more memorable.  While the icon label is needed initially, frequent usage of the same icons makes them easier to recognize later without the label.

I was curious about how we could leverage GTL or SG procedures to make such graphs easier.  So, I created the graphic on the right to display the % revenues from a particular sector, in this case "Utilities".  I found an image, and used PROC SGPLOT to create this "InfoGraphic".

Given a value to be displayed, an icon and the label, it is easy to create the display on the right.  The SAS 9.4 code is shown below.

title h=1 '2015 Software Revenue for Utilities';
proc sgplot data=infographics (where=(industry='Utilities'))
         pad=(left=20 right=20 bottom=20) noborder;
  symbolimage name=Utilities image="&file7";
  styleattrs backcolor=cxfaf3f0;
  scatter x=x y=y / group=industry dataskin=sheen
                markerattrs=(symbol=Utilities size=120);
  text x=xlbl y=ylbl text=value / textattrs=(size=24);
  text x=xnam y=ynam text=industry / textattrs=(size=12);
  xaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
  yaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
run;

In the code above, we have used the SYMBOLIMAGE statement to create a new marker symbol from the image icon for "Utilities", which is the "Bulb".  Additional columns are used for the "Value" and also the (x, y) location of the icon and the value.  In this example, I have customized the (x, y) locations for the icon and the value based on their arrangement.

The data for the program is on the right.  The data includes the Value, Industry, Image File Name for the icon, the (x, y) location for the icon center, the value label and the industry name.  In the code above, only one industry (Utilities) is used.

The nice part of using the code is that you can change the relative layout of the icon, value and label easily.  Also, we can create a panel of the values by industry in any layout.

The graph on the right is a 4-column layout of all the industries, showing all the icons and values from the data set above.  I used the SGPANEL procedure to arrange the layout.  The icons, values and labels all fall into place easily.

The SGPANEL program is shown below.  Note the use of ATTRPRIORITY=NONE on the ODS Graphics statement.  This is required in case a color priority style like HTMLBlue is active, because we want to cycle through all marker shapes per group.  Also note the use of SORT=DATA to ensure the panel classifiers are in the data order so the data and the layout are in sync.

ods graphics / reset attrpriority=none noborder;
title '2015 Software Revenue by Industry';
proc sgpanel data=infographics pad=(left=20 right=20);
  panelby industry / noborder noheader spacing=20
                   onepanel columns=4 sort=data;
  symbolimage name=Banking image="&file1";
  symbolimage name=Government image="&file2";
  symbolimage name=Services image="&file3";
  symbolimage name=Insurance image="&file4";
  symbolimage name=LifeSciences image="&file5";
  symbolimage name=Retail image="&file6";
  symbolimage name=Utilities image="&file7";
  symbolimage name=Education image="&file8";
  styleattrs datasymbols=(Banking Government Services
                       Insurance LifeSciences Retail Utilities
                     Education) backcolor=cxfaf3f0;
  scatter x=x y=y / group=industry markerattrs=(size=120)
                dataskin=sheen;
  text x=xval y=yval text=value / textattrs=(size=14);
  text x=xnam y=ynam text=industry / textattrs=(size=7);
  colaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
  rowaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
run;

A 2x4 layout can easily be created by changing the panelby settings.

Full SAS 9.4 code:  Info_Graphics


Legend Order

In the previous article on managing legends, I described the way to include items in a legend that may not exist in the data.  This is done by defining a Discrete Attribute Map, and then requesting that all the values defined in the map should be displayed in the legend.

In the graph on the right, the data contains only Severity values of "Mild" and "Moderate".  However, since three values are defined in the attribute map, and the "Show" column is set to "Attrmap", all values for the group are displayed.  That causes the value "Severe" to be displayed in the legend, even though there is no observation in the data with this severity.

Another useful (and intentional) result is that the legend items are displayed in the order they are defined in the discrete attribute map, which allows you to control the order of the items in the legend.  This feature also addresses an issue that a user was grappling with recently, as described below.

By default, the order of the items in the legend is based on the order in which the group values are encountered in the data.  Legend values can be sorted in alphabetical order, but if you want a custom order, you can use the attribute map as discussed below.

The graph on the right shows the stacked cumulative counts for the cars by Type and Origin.  The legend inside is intentionally set with one column to make it easier to associate the colors with the stacking order.  However, the order in the legend is the reverse of the order in the graph.

I can change the stacking order by setting the GROUPORDER option to "ReverseData".  However, the order in the legend also reverses, thus keeping the legend order out of sync with the bar order.

The way to address this is to use the Discrete Attr Map, and provide the group values and the corresponding colors in the order you want.  Now, the legend item values will be displayed in the order of the values defined in the attr map.  Note the items in the legend in the graph now are in the same order as the bar segments.

Note also in the Attr Map, we have not used actual fill color, like in the first case, but instead we have used the style elements.  This can be done by using the FillStyleElement column name instead of the FillColor column name.

The view of the Attribute Map for the graph above is shown on the right.  The code for the attr map and the graph is shown below.

data CarsShowAll;
  retain Id 'Origin' Show 'Attrmap';
  length Value $10 FillStyleElement $15;
  input value $ FillStyleElement $;
  datalines;
USA       graphdata3
Europe graphdata2
Asia       graphdata1
;
  run;

title "Counts by Type and Origin";
proc sgplot data=sashelp.cars dattrmap=carsShowAll nowall noborder;
  vbar type / group=origin dataskin=gloss filltype=gradient
                         baselineattrs=(thickness=0) attrid=Origin;
  keylegend / location=inside across=1 position=topright opaque
                           fillheight=12px fillaspect=golden;
  xaxis display=(noline noticks nolabel);
  yaxis display=(noline nolabel noticks) grid;
run;

Full SAS 9.4 Code:  Legend_Order


Legendary

Entries in a legend are populated automatically based on the data.  When creating a graph with group classification,  the display attributes for each bar are derived from the GraphData1-12 style elements from the active style.

The graph on the right shows you the result of creating an adverse event timeline by AE and Severity.  The data contains four AE names with two severity values.  The severity values are assigned the display attributes from GraphData1 and GraphData2, which for the HTMLBlue style are blue and red.

Now, if the data for today arrives in a different group order, the assignment may change, so it is hard to ensure that the color assignments are consistent.

This can be addressed by using the Discrete Attribute Map as shown in the graph on the right.  Here we have defined a Discrete Attribute Map where the display attribute for each group value is defined in a data set like a format.

Now, the display attributes such as color or marker symbol for each group are obtained from the attribute map by the value of the group.

The Discrete Attribute Map is a data set with predefined column names as shown on the right.  Multiple maps can be defined in a data set, distinguished by "ID".  Here we have defined only one map, with ID=Severity.  Three levels are defined, "Mild", "Moderate" and "Severe".  Now, the colors for each group are well defined, and will remain consistent regardless of the position of the observation in the data.

Note however, in the graph above, only two of the three defined values are displayed in the legend.  This is normal, and only the values in the data are displayed.  However, often we have additional classifications for the data that may or may not be in the data at any one time, but we may want to display all the "possible" values for the classification variable in the legend as shown in the graph on the right.  In this graph the legend item for "Severe" with a red color swatch is included in the legend, even though there is no observation in the data set with a group value of "Severe".

With the SAS 9.4M3 release, this is easily done by requesting that all the levels for a particular attribute id in a Discrete Attribute Map be shown in the legend.  Note the column "Show" with value of "Attrmap".  This instructs the system to display all values for this AttrId that are marked as "Attrmap".  Note, this is also a great way to populate the legend with other items you may need that are not in the data.
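An attribute map like the one described here could be built with a short DATA step.  This is a minimal sketch, not the exact data set from the article: the data set name attrmapShowAll and the ID value Severity match the PROC SGPLOT code in this post, and red for "Severe" follows the text, but the colors chosen for "Mild" and "Moderate" are assumptions for illustration.

```sas
/* Sketch of a discrete attribute map that shows all levels in the legend. */
/* Colors for Mild and Moderate are assumed; only Severe=red is from the   */
/* article text.                                                            */
data attrmapShowAll;
  retain Id 'Severity' Show 'Attrmap';   /* Show=Attrmap: list every level */
  length Value $10 FillColor $10;
  input Value $ FillColor $;
  datalines;
Mild      green
Moderate  gold
Severe    red
;
run;
```

The order of the rows is also the order of the legend entries, so rearranging the datalines is enough to reorder the legend.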

Another noteworthy feature released with SAS 9.4 is the ability to control the size of the legend items.  When skins are in effect, or with fill patterns, or just because you want it so, it is often desirable to make the color swatches in the legend bigger.  This can be done using the new FILLHEIGHT and FILLASPECT options.

title "Adverse Event Timeline Graph by Day";
proc sgplot data=ae dattrmap=attrmapShowAll;
  highlow y=ae low=low high=high / type=bar group=severity
                   dataskin=pressed barwidth=0.8 lowlabel=label
                   attrid=Severity labelattrs=(color=black size=9);
   refline 0 / axis=x;
  xaxis display=(nolabel) values=(0 to 96 by 2);
  yaxis display=(noticks novalues);
  keylegend / fillheight=12px fillaspect=golden;
run;

Full SAS 9.4 Code: Legend 


Easy Box Plot with Multiple Connect Lines

Last month I wrote an article on connecting multiple statistics by category in a box plot using SGPLOT.  In the first article I described the way you can do this using overlaid SERIES on a VBOX using SAS 9.4, which allows such a combination.  However, if you have SAS 9.3, I described how you can do this using annotation.

Recently a question was posted on the SAS communities site for SAS/GRAPH and ODS Graphics on how to do this when using a BY variable.  That got me thinking on whether there could be an easier way.  Turns out there is.

Note, I changed the examples, as connect lines make more sense when the x axis is numeric.  The data is not important.
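Since the data itself is not important, a stand-in data set is enough to try the programs in this post.  The following is a minimal sketch with made-up values: the data set and variable names (ValueByWeek, Location, Week, Value) match the code in this post, while the two locations, the eight weeks, and the simulated distribution are assumptions.

```sas
/* Simulated stand-in for ValueByWeek; all values are made up. */
data ValueByWeek;
  call streaminit(1234);              /* fixed seed for repeatable values */
  length Location $8;
  do Location = 'East', 'West';       /* written in sorted order for BY processing */
    do Week = 1 to 8;
      do i = 1 to 20;                 /* 20 observations per box */
        Value = 50 + 2*Week + 10*rand('normal');
        output;
      end;
    end;
  end;
  drop i;
run;
```

Because the observations are written in Location order, the data set can be used with the BY statement without an extra sort.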

 

Prior to SAS 9.4, the SGPLOT procedure limited the combination of some plot types.  While "Basic" plots could be layered in any combination, Category plots (VBAR, VLINE) or Distribution plots (VBOX, Histogram) could only be combined with other plots of the same type.  So, a SERIES plot could not be combined with a VBOX.

However, combining multiple VBOX plots has been allowed since SAS 9.2, and CONNECT has been available since SAS 9.3.  So, the idea here is to overlay multiple VBOX statements, each with a different CONNECT option.  This works just fine as shown above.  The only trick is to make sure that only the first VBOX uses the FILL option (default) while all the others use NOFILL.

title 'Distribution of Value by Week';
proc sgplot data=ValueByWeek nocycleattrs noautolegend;
  vbox value / category=week connect=q1;
  vbox value / category=week nofill connect=q3;
  xaxis display=(nolabel);
run;

What could be simpler than this approach?  The additional benefit is that one can easily create a panel of such graphs.

proc sgpanel data=ValueByWeek nocycleattrs noautolegend;
  panelby location/ layout=panel columns=1;
  vbox value / category=week connect=q1;
  vbox value / category=week nofill connect=q3;
  colaxis display=(nolabel);
run;

Furthermore, this also works when using BY variable processing.  Now, the procedure correctly pages the graph by the BY variable, and each graph has the correct connect lines.  No need to figure out the data needed for the overlaid SERIES plot, or the annotate data set.



title 'Distribution of Value by Week';
proc sgplot data=ValueByWeek nocycleattrs noautolegend;
  by location;
  vbox value  / category=week connect=q1;
  vbox value / category=week nofill connect=q3;
  xaxis display=(nolabel);
run;

Full SAS 9.3 SGPLOT code:  Box_Connect_Numeric


Fit Plot Customizations

A customer wants to use PROC REG to fit a simple regression model but display in the fit plot markers that differentiate groups of individuals.

Before we see how to do that, let's look at some simpler examples.

The following step fits a linear regression model and displays an ordinary fit plot:

proc sgplot data=sashelp.class;
   title 'Simple Linear Regression Fit Plot -- PROC SGPLOT';
   reg y=weight x=height / cli clm;
run;

The CLI option produces prediction limits and the CLM option produces confidence limits.

The following steps fit the same model, but males are displayed as filled squares and females are displayed as filled circles:

ods graphics on / attrpriority=none;
 
proc format;
   value $sex 'M' = 'Male' 'F' = 'Female';
run;
 
proc sgplot data=sashelp.class;
   title 'Simple Regression but with a Classification Variable Displayed -- PROC SGPLOT';
   styleattrs datasymbols=(squarefilled circlefilled);
   reg y=weight x=height / cli clm nomarkers;
   scatter y=weight x=height / group=sex  name='scatter';
   keylegend 'scatter' / location=inside across=1 position=topleft;
   format sex $sex.;
run;


These examples all use the HTMLBlue style, which is an ATTRPRIORITY=COLOR style. The ATTRPRIORITY=NONE option enables marker differences to be displayed as well as color differences. The $SEX format provides meaningful labels in the legend. The STYLEATTRS statement creates the custom markers. The NOMARKERS option suppresses the markers from being displayed by the REG statement. Instead, they are displayed by the SCATTER statement, which uses the GROUP=SEX option to distinguish the groups. The KEYLEGEND statement displays a legend inside the graph.

While this is a nice graph and it is easy to make, the customer specifically wanted PROC REG, because PROC REG displays a table of statistics along with the fit plot. The following step illustrates:

proc reg data=sashelp.class;
   model  weight = height;
quit;


PROC REG will not use the classification variable SEX in the graph without a template change. However, before you can proceed, you need to see if the SEX variable is available in the data object that underlies the graph. The following step outputs the data object to a SAS data set:

proc reg data=sashelp.class;
   ods select fitplot;
   ods output fitplot=fp;
   model weight = height;
   id sex;
quit;

If you print the data set, you will see that the SEX variable is in the output data set, but it is named ID1. In fact, not one of the original variable names is present in the output data set. This is because analytical procedures need to have precise control of the data object column names so that the templates will work with the wide variety of models that people specify.

We will use a DATA step and CALL EXECUTE to modify the graph template for the fit plot. There are other ways to modify a template, but the DATA step provides a parsimonious way to show small changes to large templates. You cannot write template modification code like the DATA step below without first looking at the template. The following step writes the fit plot template to a file called temp.tmp:

proc template;
   source Stat.REG.Graphics.Fit / file='temp.tmp';
quit;

The following step reads the template, adds a PROC TEMPLATE statement, drops the MARKERATTRS= option from the SCATTERPLOT statement, and adds the GROUP=ID1 option. It also adds options to the BEGINGRAPH statement to control the markers:

options source;
data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then do;
      _infile_ = tranwrd(_infile_, 'markerattrs=GRAPHDATADEFAULT', ' ');
      _infile_ = tranwrd(_infile_, '/', '/ group=id1 ');
      end;
   if left(_infile_) =: 'BeginGraph' then
      _infile_ = 'BeginGraph / attrpriority=none' ||
                 ' datasymbols=(squarefilled circlefilled);';
   call execute(_infile_);
run;

Other statements are executed as is. The OPTIONS SOURCE statement is not required. It shows the code that is generated by CALL EXECUTE, so it can help you understand what is happening when things do not work.

The following step uses the modified template:

proc reg data=sashelp.class;
   ods select fitplot;
   model weight=height;
   id sex;
quit;


You can also add a NAME= option to the SCATTERPLOT statement and a DISCRETELEGEND statement after the SCATTERPLOT statement to display the values of SEX in a legend:

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then do;
      _infile_ = tranwrd(_infile_, 'markerattrs=GRAPHDATADEFAULT', ' ');
      _infile_ = tranwrd(_infile_, '/', '/ group=id1 name="sc"');
      end;
   if left(_infile_) =: 'BeginGraph' then
      _infile_ = 'BeginGraph / attrpriority=none' ||
                 ' datasymbols=(squarefilled circlefilled);';
   call execute(_infile_);
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then
   call execute('discretelegend "sc" / location=inside across=1 autoalign=(topleft);');
run;
 
proc reg data=sashelp.class;
   ods select fitplot;
   model weight=height;
   id sex;
   format sex $sex.;
quit;


The DATA _NULL_ step reads the same (unmodified) temp.tmp file and creates a new template modification.

The following step deletes the modified template:

proc template;
   delete Stat.REG.Graphics.Fit / store=sasuser.templat;
quit;

This all works because the SEX variable appears in the data object when it is specified in the ID statement. It appears in the data object so that it can appear in HTML tooltips. What if it had not been there? The next part of the example shows how you can output the data object, modify it (that is, merge in the SEX variable), and create the desired graph with PROC SGRENDER. The PROC SGRENDER step uses the modified data object, the modified graph template, and the style template, but it needs one more thing: dynamic variables. Procedures set dynamic variables that control many aspects of the graphs and contain other values such as the statistics that are displayed in the table.

The following step captures the graph, including the dynamic variables and their values, in an ODS document. It also captures the data object in a SAS data set:

ods document name=MyDoc (write);
proc reg data=sashelp.class;
   title 'Not Shown';
   ods select fitplot;
   ods output fitplot=fp;
   model weight=height;
quit;
ods document close;

The following step lists the contents of the ODS document:

proc document name=MyDoc;
   list / levels=all;
quit;

You need to copy the path of the graph from the LIST statement output into the OBDYNAM statement.

The following step creates a SAS data set that contains the values of the dynamic variables:

proc document name=MyDoc;
   ods exclude dynamics;
   ods output dynamics=dynamics;
   obdynam \Reg#1\MODEL1#1\ObswiseStats#1\Weight#1\FitPlot#1;
quit;

The following step displays the data set of dynamic variables (some of which are shown):

proc print; 
run;


The following step merges the SEX variable into the output data set made from the data object:

data both(drop=height weight rename=(sex=id1));
   merge sashelp.class(keep=height weight sex) fp;
   if height ne _indepvar1 or weight ne depvar then put _all_;
   format sex $sex.;
run;

The SEX variable is renamed ID1 so that it can work with the same template as before. You cannot rely on a merge operation being as simple as the one shown here. Data sets made from graph data objects can vary from input data sets in many ways. An IF statement is added to check the merge only to emphasize that you need to carefully combine data from separate sources and always check your results.

The following step modifies the template (as before):

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then do;
      _infile_ = tranwrd(_infile_, 'markerattrs=GRAPHDATADEFAULT', ' ');
      _infile_ = tranwrd(_infile_, '/', '/ group=id1 name="sc"');
      end;
   if left(_infile_) =: 'BeginGraph' then
      _infile_ = 'BeginGraph / attrpriority=none' ||
                 ' datasymbols=(squarefilled circlefilled);';
   call execute(_infile_);
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then
   call execute('discretelegend "sc" / location=inside across=1 autoalign=(topleft);');
run;

The following step uses CALL EXECUTE to run PROC SGRENDER along with a DYNAMIC statement that provides the value of each of the dynamic variables:

data _null_;
   set dynamics(where=(label1 ne '___NOBS___')) end=eof;
   if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
   if _n_ = 1 then do;
      call execute('proc sgrender data=both');
      call execute('template=Stat.REG.Graphics.Fit;');
      call execute('dynamic');
   end;
   if cvalue1 ne ' ' then
      call execute(catx(' ', label1, '=',
                   ifc(n(nvalue1), cvalue1, quote(trim(cvalue1)))));
   if eof then call execute('; run;');
run;


The DATA _NULL_ step with the CALL EXECUTE statements generates the following DYNAMIC statement:

dynamic _SHOWCLM = 1 _SHOWCLI = 1 _WEIGHT = 0 _SHOWSTATS = 1 _NSTATSCOLS = 2
   _SHOWNOBS = 1 _NOBS = 19 _SHOWTOTFREQ = 0 _TOTFREQ = 19 _SHOWNPARM = 1 
   _NPARM = 2 _SHOWEDF = 1 _EDF = 17 _SHOWMSE = 1 _MSE = 126.02868962 
   _SHOWRSQUARE = 1 _RSQUARE = 0.7705068427 _SHOWADJRSQ = 1 
   _ADJRSQ = 0.7570072452 _SHOWSSE = 0 _SSE = 2142.4877235 _SHOWDEPMEAN = 0
   _DEPMEAN = 100.02631579 _SHOWCV = 0 _CV = 11.223296526 _SHOWAIC = 0 
   _AIC = 93.780394884 _SHOWBIC = 0 _BIC = 96.223301459 _SHOWCP = 0 _CP = 2
   _SHOWGMSEP = 0 _GMSEP = 140.9531397 _SHOWJP = 0 _JP = 139.29486747 
   _SHOWPC = 0 _PC = 0.2834915472 _SHOWSBC = 0 _SBC = 95.669272843 _SHOWSP = 0 
   _SP = 7.876793101 _TITLE = "Fit Plot" _DEPNAME = "Weight" _DEPLABEL = "Weight"
   _SHORTYLABEL = "Weight" _SHORTXLABEL = "Height" _CONFLIMITS = "95% Confidence
   Limits" _PREDLIMITS = "95% Prediction Limits" _XVAR = "_INDEPVAR1";

The following step deletes the modified template:

proc template;
   delete Stat.REG.Graphics.Fit / store=sasuser.templat;
quit;

You can process the data set of dynamic variables and create a similar graph using PROC SGPLOT:

data _null_;
   length s $ 500;
   retain s;
   set dynamics(keep=label1 nvalue1) end=eof;
   if label1 = '_NOBS'    then l = 'Observations';
   if label1 = '_NPARM'   then l = 'Parameters';
   if label1 = '_EDF'     then l = 'Error DF';
   if label1 = '_MSE'     then l = 'MSE';
   if label1 = '_RSQUARE' then l = 'R-Square';
   if label1 = '_ADJRSQ'  then l = 'Adj R-Square';
   if l ne ' ' then s = catx(' ', s, quote(l), '=', quote(put(nvalue1, best6.)));
   if eof then call symputx('insets', s);
run;
 
%put &insets;
 
proc sgplot data=sashelp.class;
   title 'PROC SGPLOT with an Inset Table';
   styleattrs datasymbols=(squarefilled circlefilled);
   reg y=weight x=height / cli clm nomarkers;
   scatter y=weight x=height / group=sex  name='scatter';
   keylegend 'scatter' / location=inside across=1 position=topleft;
   inset (&insets) / position=bottomright border;
   format sex $sex.;
run;


The DATA step generates the following list of insets:

"Observations" = "    19" "Parameters  " = "     2" "Error DF    " = "    17" 
"MSE         " = "126.03" "R-Square    " = "0.7705" "Adj R-Square" = " 0.757"

ODS Graphics provides you with ways to make simple graphs and customize every aspect of them. While not shown in this example, you can also annotate graphs and modify dynamic variables. For more information about SG annotation and the techniques shown in this blog, see the free book Advanced ODS Graphics Examples.


CandleStick Chart with SAS 9.2

Let us start the new year by taking a trip back in history to SAS 9.2, first released in 2008, and the first SAS release that included the new ODS Graphics software including GTL and SG procedures.  While we have recently released the third maintenance on SAS 9.4 (SAS 9.40M3), many of you are using various maintenance releases of SAS 9.3, and some are still using SAS 9.2.

One such SAS 9.2 user recently saw my post on creating a CandleStick Chart using SAS 9.3, which included a new plot type called the HighLow plot.  This is a versatile plot that can not only handle the "Candlestick" chart commonly used in the financial domain, but is also useful to create many different graphs, as you can see in other articles in this blog.  This user wanted to create a similar chart using SAS 9.2.

I first sent them a program to create a High-Low-Close type graph using the GPLOT procedure, but the user wanted something similar to the graph shown in the linked article.

While I could not think of a way to create such a graph using SGPLOT, it was possible, with some effort, to create one using GTL.  The graph on the right is created using the GTL BoxPlotParm statement.  This statement was originally added to provide the user a way to plot a custom box plot, where the values for the various features of the box are computed by the user.  So, the data set provided by the user needs to contain the various statistics like "Min", "Max", "Q1", "Q3" and so on for each value of the category.

The data set would look like the table on the right.  In this example, for each value of Date, we have 4 observations, one for each named statistic.  Here we have the "Min", "Max", "Q1" and "Q3" values computed for each value of Date.  The column names can be anything, but the "Stat" values must have the text strings shown.

In my case, I used a data step to compute these values.  The Q1-Q3 range is represented by the "Open" and "Close" value of the stock, and the "Low" and "High" are the low and high values for the stock for that day.

proc sort data=sashelp.stocks
        (where=(stock='IBM' and date > '01Jan2004'd))
        out=ibm;
 by date;
run;

data boxParm;
  length Group $4;
  format DateUp DateDn date7.;
  keep Date DateUp DateDn Stat Value Group Close2;
  set ibm;

  Stat='Min'; Value=low; output;
  Stat='Max'; Value=high; output;
  Stat='Q1'; Value=min(open, close); output;
  Stat='Q3'; Value=max(open, close); Close2=close; output;
run;

Now, we create a template using the BoxPlotParm statement for the graph.  Note we have also superposed a Series plot to connect the "Close" value for each day.

/*--Template for OHLC plot--*/
proc template;
  define statgraph OHLC;
    begingraph;
      entrytitle 'Stock Chart for IBM';
      layout overlay / xaxisopts=(display=(ticks tickvalues line)
                                        discreteopts=(tickvaluefitpolicy=thin));
        boxplotparm x=date y=value stat=stat;
        seriesplot x=date y=close2 / lineattrs=(color=gray);
      endlayout;
    endgraph;
  end;
run;

/*--OHLC plot--*/
proc sgrender data=boxParm template=OHLC;
run;

The user also wanted to see the boxes colored by whether the price was up or down.  This would be easy with a GROUP option for the BoxPlotParm.  Unfortunately, the SAS 9.2 release does not support a Group option.  However, the saving grace was that this was not a real group, where there could be one or more groups per category.  Instead, each category has a single box, colored by the group classifier.

With some creative coding we can achieve this result.  Can you guess how I might have done this?

What I have done is displayed all the boxes by date using the green color.  Then, I have overdrawn only the boxes for the days the stock value was down.  This causes only some of the green boxes to be hidden by the red ones.  See the program for the full data step and code.
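The overdrawing idea could be supported by a data step along the following lines.  This is only a sketch under stated assumptions, not the full program referenced above: the data set name boxParmGrp is hypothetical, a "down" day is assumed to be one that closes below its open, and the sketch relies on observations with a missing DateDn being left out of the second box plot.

```sas
/* Sketch: populate DateDn only on "down" days so that a second         */
/* BoxPlotParm with x=DateDn overdraws just those boxes.                */
/* boxParmGrp is a hypothetical name; the up/down rule is an assumption. */
data boxParmGrp;
  length Group $4;
  format DateUp DateDn date7.;
  keep Date DateUp DateDn Stat Value Group Close2;
  set ibm;

  if close < open then do; Group='Down'; DateDn=date; end;
  else do; Group='Up'; DateUp=date; end;

  Stat='Min'; Value=low; output;
  Stat='Max'; Value=high; output;
  Stat='Q1'; Value=min(open, close); output;
  Stat='Q3'; Value=max(open, close); Close2=close; output;
run;
```

DateUp and DateDn are not read from the input data, so they are reset to missing at the start of each iteration and carry a value only on the matching days.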

/*--Template for OHLC plot by group--*/
proc template;
  define statgraph OHLC_Grp;
    begingraph;
      entrytitle 'Stock Chart for IBM';
      layout overlay / xaxisopts=(display=(ticks tickvalues line)
                                       discreteopts=(tickvaluefitpolicy=thin));
        boxplotparm x=date y=value stat=stat / fillattrs=graphdata2
                                   name='All' legendlabel='Up';
        boxplotparm x=datedn y=value stat=stat / fillattrs=graphdata1
                                  name='Dn' legendlabel='Down';
        seriesplot x=date y=close2 / lineattrs=(color=gray);
        discretelegend 'All' 'Dn';
      endlayout;
    endgraph;
  end;
run;

SAS 9.2 Code for CandleStick Chart:  Stock_Plot_92


Dual Axis Graph with Zero Equalization

An interesting question came up recently, where a colleague wanted to create a bar line chart with Revenue on the Y axis and Profit on the Y2 axis.  The Revenues were all positive, but the Profit had positive and negative values.

Some data I generated is shown on the right.  Creating this bar chart is very simple, using the code shown below.  Since there is only one value per month, I used the VBARPARM statement, with an overlaid Series plot.  Some options are trimmed.  See full code link below.

title 'Revenues and Profit by Month for 2014';
proc sgplot data=pnl;
  vbarparm category=date response=revenue;
  series x=date y=profit / y2axis ;
  xaxis display=(nolabel);
run;

The resulting graph is shown on the right.  Note, the "Revenue" is displayed on the Y (left) axis as a bar chart and has positive values.  The "Profit" is displayed on the Y2 axis as a series plot, and has a range from -24% to +64%.  The two axes are independent, and the zero values are not aligned.  The zero value on the Y2 axis is floating in the middle, and so one has to examine the graph carefully to understand the data.

We thought it would add a lot to the readability of the graph if the zero values on both axes would be aligned.  In the graph below, the zero value for both the "Revenue" and "Profit" are aligned, thus making it much easier to understand / interpret this graph.

The way to do this is to evaluate the range of the data for each variable, and then compute a value for Y1Min based on the negative to positive ratio of the Profit data.  Then, I applied the min and max values to the Y1 axis.

I kept my case simple to demonstrate the idea.  More code is needed if both variables span negative and positive values: the min and max values must then be computed for both axes so that their negative-to-positive proportions are equal, which equalizes the zero line.  If a VBAR is used, then you will need to work with the summarized values.
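The Y1Min computation described above could be sketched as follows.  This is a hedged illustration rather than the code from the full program: it assumes the pnl data set has numeric Revenue and Profit columns, that Revenue is all positive, and that Profit has both a negative minimum and a positive maximum.  The macro variable names y1Min and y1Max match the SGPLOT code in this post; y2Min and y2Max are extra names introduced here.

```sas
/* Sketch: give the Y axis the same negative-to-positive ratio as Profit. */
data _null_;
  set pnl end=eof;
  retain revMax profMin profMax;
  revMax  = max(revMax,  revenue);   /* MAX and MIN ignore missing values */
  profMin = min(profMin, profit);
  profMax = max(profMax, profit);
  if eof then do;
    y1Max = revMax;
    y1Min = y1Max * profMin / profMax;   /* negative when profMin < 0 */
    call symputx('y1Max', y1Max);
    call symputx('y1Min', y1Min);
    call symputx('y2Min', profMin);      /* hypothetical extras for Y2 */
    call symputx('y2Max', profMax);
  end;
run;
```

The zero lines align only approximately if the Y2 axis is left to pick its own range, so it may help to also pin that axis with y2axis min=&y2Min max=&y2Max.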

title 'Revenues and Profit by Month for 2014';
proc sgplot data=pnl;
  vbarparm category=date response=revenue / fillattrs=graphdata1
                        filltype=gradient name='a' baselineattrs=(thickness=0);
  series x=date y=profit / y2axis lineattrs=graphdata2(thickness=3);
  xaxis display=(nolabel);
  yaxis min=&y1Min max=&y1Max;
  keylegend / linelength=24;
run;

Finally, I also want to add grid lines for the Y axis.  Note, in the graph above, even though the zero line is equalized on both axes, the number of tick intervals on each axis is not equal.  The Y axis has 6 tick values while the Y2 axis has 5.  Grid lines drawn for one axis would not line up with the tick values of the other, and this would cause some confusion.

The way to fix this is to ensure that both axes have the same number of tick values.  The graph on the right gives both the Y and Y2 axes the same number of tick values, while still keeping the negative to positive ratio about equal.

This produces a good graph, with equalized zero line, and equalized grid lines as shown on the right.  I also underlaid a white bar chart to make the grid lines not show through the gradient bars.  Click on the graph for a higher resolution image.

title 'Revenues and Profit by Month for 2014';
proc sgplot data=pnl;
  vbarparm category=date response=revenue / fillattrs=(color=white)
                     baselineattrs=(thickness=0);
  vbarparm category=date response=revenue / fillattrs=graphdata1
                     filltype=gradient name='a' baselineattrs=(thickness=0);
  series x=date y=profit / y2axis lineattrs=(color=maroon thickness=3);
  xaxis display=(nolabel);
  yaxis values=(-70 to 210 by 70) grid;
  y2axis values=(-0.25 to 0.75 by 0.25);
  keylegend / linelength=24;
run;

In the graph above, I hard coded the Y and Y2 axis ranges and values to convey the idea.  I am sure some creative coding can achieve the effect programmatically.  We realize this could be an important feature, and we are debating whether to add options to the procedure to do this automatically.  If this is of interest for your use cases, please chime in.
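One way to do this programmatically is sketched below.  It is only an illustration under stated assumptions: REVENUE is all positive, PROFIT spans zero, the number of tick intervals below and above zero (the macro variables nNeg and nPos) is chosen by hand, and each step is simply rounded up to one significant digit.  The names RANGES, yList and y2List are made up for this example.

```sas
/* Sketch: build VALUES= lists with the same number of tick  */
/* intervals on Y and Y2, so zero and the grid lines align.  */
%let nNeg = 1;   /* assumed intervals below zero */
%let nPos = 3;   /* assumed intervals above zero */

proc sql noprint;
  create table ranges as            /* hypothetical work table */
  select max(revenue) as revMax,
         min(profit)  as profMin,
         max(profit)  as profMax
  from pnl;
quit;

data _null_;
  set ranges;
  /* Smallest step sizes that still cover the data */
  yRaw  = revMax / &nPos;
  y2Raw = max(profMax / &nPos, -profMin / &nNeg);

  /* Round each step up to one significant digit, e.g. 66.7 -> 70 */
  yMag   = 10 ** floor(log10(yRaw));
  y2Mag  = 10 ** floor(log10(y2Raw));
  yStep  = ceil(yRaw  / yMag)  * yMag;
  y2Step = ceil(y2Raw / y2Mag) * y2Mag;

  /* Store "low to high by step" for use in VALUES=( ) */
  call symputx('yList',
       catx(' ', -&nNeg * yStep,  'to', &nPos * yStep,  'by', yStep));
  call symputx('y2List',
       catx(' ', -&nNeg * y2Step, 'to', &nPos * y2Step, 'by', y2Step));
run;

proc sgplot data=pnl;
  vbarparm category=date response=revenue;
  series x=date y=profit / y2axis;
  xaxis display=(nolabel);
  yaxis  values=(&yList) grid;
  y2axis values=(&y2List);
run;
```

Because both lists have nNeg intervals below zero and nPos above, zero falls at the same tick position on both axes.  The rounding rule is crude and may pick steps that differ from the hand-picked values used above.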

Full SAS Code:  Dual_Y_Axes_2


Box Plot with Stat Table and Markers

A Box Plot is very popular to view the distribution of an analysis variable with one or more classifiers.  Also, everyone wants to customize the graph in different ways.  One recent request was for creating a box plot by category and group along with the display of various statistics and overlaid markers using the SGPLOT procedure.

One of the key strengths of the SGPLOT procedure is its ability to layer multiple basic plots to create graphs with more information.  However, previously the VBOX statement could only be used with very few other statements with compatible roles.  The good news is that with SAS 9.40M1, the VBOX statement can now be layered with other basic plots and the new Axis Table statements to display more information in the graph.  Let us see some examples.

The GTL BoxPlot statement provides the DisplayStats option to display the numeric values for various computed statistics for each category in the plot such as Q1, Q3, STD and more.  The SGPLOT VBOX statement does not support such an option.  However, we can use the Axis Table statement to include such statistics as shown below.

First we need to compute the statistics we want to show.  Clearly, the above statistics like Q1, Q3 and STD are already being computed to draw the box plot anyway.  So, we can use the SGPLOT program with the ODS OUTPUT statement to save out these statistics to a data set.  Then, we can use the Axis Table to display the data.


The first step is to create the basic box plot we want as shown on the right.  The code for this is shown below, including the ODS Output statement that will save the generated data into the provided output data set.

ods output sgplot=sgplotdata;
proc sgplot data=sashelp.heart;
  vbox cholesterol / category=deathcause;
run;

The SGPlot procedure computes the various statistics needed to draw the box plot, and these are saved into the SGPlotData data set.  We can examine this data set and see that additional columns are created by Category (and Group) for each statistic and its value.  Names like "BOX_CHOLESTEROL_X_DEATHCAUSE___Y" are used.  I have renamed these variables to "Value", "Cat" and "Stat".
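The renaming step is not shown above.  A minimal sketch of it follows.  Only the "___Y" column name comes from the text; the other two generated names are my assumptions based on the same 32-character naming pattern, so check the PROC CONTENTS listing and adjust them before running the DATA step.

```sas
/* List the generated column names first */
proc contents data=sgplotdata varnum;
run;

/* Sketch: rename the generated columns to short names.      */
/* The ___X and __ST names are assumptions, not confirmed.   */
data sgplotdata;
  set sgplotdata(rename=(
        BOX_CHOLESTEROL_X_DEATHCAUSE___Y = Value
        BOX_CHOLESTEROL_X_DEATHCAUSE___X = Cat    /* assumed name */
        BOX_CHOLESTEROL_X_DEATHCAUSE__ST = Stat   /* assumed name */
      ));
run;
```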

Now, we keep only the statistics we wish to display (Q1, Q3 and STD), and append these observations to the original data.

data merged;
  set sashelp.heart
      sgplotdata(where=(value ne . and stat in ('Q1' 'Q3' 'STD')));
run;

Now we use this merged data set to draw the box plot and the axis table of the statistic values.

proc sgplot data=merged;
  vbox cholesterol / category=deathcause;
  xaxistable value / x=cat class=stat;
  xaxis display=(nolabel);
run;

By default, the axis table is drawn below the X axis.  However, the table can also be drawn above the X axis for better readability as shown on the right.  The axis table displays the value of each statistic by the category.  We use CLASS=Stat to stack each statistic as shown on the right.  This is the default behavior for CLASSDISPLAY.

proc sgplot data=merged;
  vbox cholesterol / category=deathcause;
  xaxistable value / x=cat class=stat
                        location=inside;
  xaxis display=(nolabel);
run;

As you can see from the code above, it is really very easy to add the display of any of the standard box plot statistics that are computed by the plot itself.  These include MIN, Q1, MEDIAN, Q3, MAX, MEAN, STD, N, DATAMIN and DATAMAX.

Often it is desirable to view the actual distribution of the data in addition to the box statistics.  This can be done by layering a scatter plot on the box plot.  This kind of overlay is now supported with SAS 9.40M1.

Note, we have assigned a filled symbol with a high transparency value to the overlaid scatter markers.  We have also turned off the display of the outliers, and used an unfilled box.

proc sgplot data=merged;
  vbox cholesterol / category=deathcause nooutliers nofill;
  scatter x=deathcause y=cholesterol / jitter 
                markerattrs=(symbol=circlefilled size=5) transparency=0.95;
  xaxistable value / x=cat class=stat location=inside;
  xaxis display=(nolabel);
run;

Finally, we would like to display this entire graph by a group variable 'Sex'.  Now, the VBOX and the SCATTER use the GROUP role.  The Axis Table uses 'Sex' as the class variable, and ClassDisplay=Cluster to place the values side by side per category.

Since the Axis table supports only one Class variable, we cannot display the data both by the Sex and Stat.  In the example on the right, I have displayed only the data for "STD" by category and group.  Note use of GroupOrder and CategoryOrder on all plots.

proc sgplot data=mergedGroup;
  label value='STD';
  format value 5.2;
  vbox cholesterol / category=deathcause group=sex nooutliers
             nofill grouporder=ascending name='a';
  scatter x=deathcause y=cholesterol / group=sex groupdisplay=cluster
                grouporder=ascending jitter markerattrs=(symbol=circlefilled size=5)
               transparency=0.95 clusterwidth=0.7;
  xaxistable value / x=cat class=grp classdisplay=cluster colorgroup=grp location=inside classorder=ascending;
  xaxis display=(nolabel);
  keylegend 'a' / linelength=24;
run;

But, what if we do want to display multiple statistics by category and group?  Yes, we can do that as shown on the right.  Now, we have displayed the values for 'Q1', 'Q3' and 'STD' by category and group.  To do this, we have to break up the data, so each statistic has a column of its own.
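Breaking up the data this way could be done with PROC TRANSPOSE, as in the sketch below.  It assumes the grouped ODS output has already been saved and renamed into a data set with columns Cat, Grp, Stat and Value; the data set names groupStats, stats and statsWide are made up for illustration.

```sas
/* Sketch: one column per statistic, by category and group.  */
/* groupStats is a hypothetical data set with columns        */
/* Cat, Grp, Stat and Value.                                 */
proc sort data=groupStats out=stats;
  where stat in ('Q1' 'Q3' 'STD') and value ne .;
  by cat grp;
run;

proc transpose data=stats out=statsWide(drop=_name_);
  by cat grp;
  id stat;      /* creates columns Q1, Q3 and STD */
  var value;
run;

/* Append the wide statistics to the original data */
data mergedGroup2;
  set sashelp.heart statsWide;
run;
```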

Now, we can use three separate Axis Table statements to display each statistic by category and group.  Note also the use of ColorGroup=grp, where the grp column holds the 'Sex' values.  This colors the statistic value by the group variable to match the box and the scatter.

proc sgplot data=mergedGroup2;
  label value='STD';
  format std 5.2 q1 q3 4.1;
  vbox cholesterol / category=deathcause group=sex nooutliers
            nofill grouporder=ascending name='a';
  scatter x=deathcause y=cholesterol / group=sex groupdisplay=cluster
                 grouporder=ascending jitter
                markerattrs=(symbol=circlefilled size=5)
               transparency=0.95 clusterwidth=0.7;
  xaxistable q1 / x=cat class=grp classdisplay=cluster colorgroup=grp location=inside classorder=ascending;
  xaxistable q3 / x=cat class=grp classdisplay=cluster colorgroup=grp location=inside classorder=ascending;
  xaxistable std / x=cat class=grp classdisplay=cluster colorgroup=grp location=inside classorder=ascending;
  xaxis display=(nolabel);
  keylegend 'a' / linelength=24;
run;

Full SAS 9.40M3 program:  Box_Stat_Scatter


Boxplot with Connect using Annotate

In the previous article I described a way to create a box plot with multiple connect lines using SAS 9.40M1 or a later release.  I created the graph using SGPLOT with VBOX and overlaid SERIES statements.  Such an overlay of a basic plot on the VBOX statement is supported starting with SAS 9.40M1.

If you have SAS 9.3, you can create the same graph using SGAnnotate as shown on the right.  In this case, we have displayed the colored boxes by using a VBOX statement with category and group roles set to "deathcause".  As mentioned in the previous article, a connect line cannot be drawn since each box belongs to a different group.

Note also I have abbreviated the long death cause names to avoid long rotated axis tick values, since SAS 9.3 does not have support for splitting the long tick values.

We run the MEANS procedure on the data to compute the mean and median values by "deathcause" as shown on the right.  Now, instead of merging this data with the original data set, we need to create an annotation data set to define two "Polylines", one for mean * deathcause and one for median * deathcause, and also the instructions needed to create the legend.
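The MEANS step is not listed here.  A minimal sketch of it could look like this, assuming the analysis variable is Cholesterol from SASHELP.HEART and that the summary is saved as HEART with columns named Mean and Median, which is what the DATA step further down reads.

```sas
/* Sketch: mean and median of cholesterol by cause of death. */
/* The output name HEART and the column names Mean and       */
/* Median are chosen to match the DATA step that follows.    */
proc means data=sashelp.heart noprint nway;
  class deathcause;
  var cholesterol;
  output out=heart(keep=deathcause mean median)
         mean=Mean median=Median;
run;
```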

First, I transpose the data from a three column format to a group format as shown on the right.  Now the data has observations for the stat value by deathcause and group.  Group is either "Mean" or "Median".

/*--Rearrange multi column to group--*/
data heartGroup;
  length Group $6;
  keep DeathCause Group Value;
  set heart;
  Group='Mean'; Value=Mean; output;
  Group='Median'; Value=Median; output;
run;

Now, I can use this data set and script the annotate functions and data I need to overlay the two connect lines using the "Polyline" function.  I also script the "Line" and "Text" functions needed for the legend.

The code for scripting the SG annotate data set is shown below.  Note, the SGAnnotate data set is a bit different from the classic SAS/GRAPH annotate data set due to the difference in the features of the underlying graph system.  However, many aspects have been retained for ease of transition.

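/*--Make SG Anno data set--*/
/*--BY group below needs the data ordered by Group--*/
/*--PROC SORT keeps the DeathCause order within each group--*/
proc sort data=heartGroup; by group; run;

data sganno;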
  length Label $6 DrawSpace $12;
  drop DeathCause Group Value;
  set heartGroup end=last;
  by group;

  /*--Script out the Mean and Median polylines--*/
  DrawSpace='DataValue';
  LineThickness=1;
  if first.group then Function='PolyLine';
  else Function='PolyCont';
  if group='Mean' then LinePattern='Solid';
  else LinePattern='Dash';
  XC1=deathcause; Y1=Value; output;

  /*--Script out the Legend--*/
  if last then do;
    DrawSpace='WallPercent'; Width=50;
    LinePattern='Solid';
    Function='Line'; x1=80; y1=95; x2=90; y2=95; output;
    Function='Text'; x1=90; y1=95; Label='Mean'; anchor='left'; output;

    LinePattern='Dash';
    Function='Line'; x1=80; y1=90; x2=90; y2=90; output;
    Function='Text'; x1=90; y1=90; Label='Median'; anchor='left'; output;

  end;
run;

The "Polyline" and "Polycont" functions are used to draw the connect lines for "Mean" and "Median" values by deathcause.  DRAWSPACE=DataValue is used to interpret the x and y values in data space.  Also, since the x axis values are discrete character, the column XC1 is used instead of X1.  The "Line" and "Text" functions are used to display the legend using  DRAWSPACE of WALLPERCENT.  Care must be taken to correctly match the legend item attributes with the labels.
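To draw the annotations, the data set is passed to the procedure with the SGANNO= option.  The following is a minimal sketch rather than the full program; it assumes the plotted data uses the same DeathCause values as the XC1 column of the annotation data set, so if the names were abbreviated for the summary they need to be abbreviated in the plotted data as well.

```sas
/* Sketch: colored boxes with the annotated connect lines.   */
/* Assumes DeathCause values here match XC1 in SGANNO.       */
title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart sganno=sganno noautolegend;
  vbox cholesterol / category=deathcause group=deathcause;
  xaxis display=(nolabel);
run;
```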

Full SAS 9.3 program:  Box_Connect_Anno

 


Boxplot with connect

This blog post is motivated by a post by a user on the communities page about creating a box plot with colored boxes by category and multiple connect lines.

Normally, a box plot can be drawn by category, with a single connect line for one of the statistical values of the box plot, say mean or median as shown in the graph on the right.  This is very straightforward, and supported by the SGPLOT procedure and GTL.  The SGPLOT code for this use case is shown below.  We have used the VBOX statement, with CONNECT=mean.  The connect line joins the specified statistic across all the categories for a group.

title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart noautolegend;
  vbox cholesterol / category=deathcause connect=mean;
  xaxis display=(nolabel);
run;

If there is more than one group, the values are connected by group as shown on the right.  Here we have used GROUP=sex, resulting in a box plot with Male and Female boxes by Death Cause.  The mean values of the boxes are connected by group.  The boxes and connect lines are colored by sex, as shown in the legend.

The unique use case the user had was that he wanted the boxes displayed by category without groups, but with each box colored by the category variable.

This can be achieved by setting the GROUP role to the category variable, resulting in a graph where the boxes are colored by the category, but really because the group role is used.  In the graph on the right, we have set GROUP=DeathCause, the same variable as the CATEGORY role.  This colors the boxes by category.  We have used CONNECT=mean, but no connect line is shown.  This is because each of the boxes belongs to a different group, and there is only one box in each group.  So, no connect line is possible.

title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart noautolegend noborder;
  vbox cholesterol / category=deathcause group=deathcause  connect=mean;
  xaxis display=(nolabel);
run;

The user not only wants a connect line in this case, but actually wants to see the connect line for multiple statistics, say "Mean" and "Median".  This is now getting beyond what the SGPLOT procedure can do with simple options.  Now, we need to draw the boxes and the connect lines ourselves using either an overlay of a SERIES plot (SAS 9.4M1) or SGAnnotate (SAS 9.3 +).

Using SAS 9.4M1, we have processed the data using the MEANS Procedure to compute the Mean and Median statistics by DeathCause.  Then, we have merged this summary data into the original data, and created additional columns.  The last few observations are shown on the right.  The merged data contains all the original observations for Death Cause and Cholesterol, with missing "Mean" and "Median".  Then, we have 5 additional observations for the Mean and Median columns with missing Cholesterol.
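The data preparation described above could be sketched as follows.  This is an illustration, not the full program: the intermediate data set name STATS is made up, and HEART2 is the name used by the SGPLOT step further down.

```sas
/* Sketch: summary statistics appended to the original data. */
/* STATS is a hypothetical intermediate data set.            */
proc means data=sashelp.heart noprint nway;
  class deathcause;
  var cholesterol;
  output out=stats(keep=deathcause mean median)
         mean=Mean median=Median;
run;

/* Original rows have missing Mean and Median;               */
/* appended summary rows have missing Cholesterol.           */
data heart2;
  set sashelp.heart(keep=deathcause cholesterol) stats;
run;
```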

We can use this data set to plot the box plot of cholesterol by Deathcause, and overlay that with two SERIES plots, one for Mean and one for Median with different line attributes.

Note how the boxes are displayed colored by death cause.  Additionally, we have a "connect" of the "Mean" and the "Median" values using the SERIES plot.  The legend for the connect lines is displayed inside the plot area.

title 'Cholesterol by Cause of Death';
proc sgplot data=heart2 noautolegend noborder;
  vbox cholesterol / category=deathcause group=deathcause;
  series x=deathcause y=mean / name='mean' legendlabel='Mean';
  series x=deathcause y=median / lineattrs=(pattern=dash) name='median' legendlabel='Median';
  keylegend "mean" "median" / linelength=32 location=inside across=1 position=topright;
  xaxis display=(nolabel);
run;

SAS 9.40M1 allows the overlaying of "Basic" plots with a VBOX statement.  Prior to SAS 9.40M1, overlay of basic plots on a VBOX was disallowed.  This restriction was removed for SAS 9.40M1 expressly because many users want to overlay detailed data on a box plot, such as the actual observations themselves as in the Margin Plot example.

If you have an earlier version, SAS 9.3 or later, you can do the same by using SGANNOTATE to draw the connect lines.  See Dan's paper on SGAnnotate, or Warren's Advanced ODS Graphics Examples.

Full SAS 9.40M1 code: Box_Connect
