Spark and Summary Plots

In the area of graphical visualization of data, Edward Tufte is a thought leader and has put forth many innovative ideas that enhance the understanding of the information in the graph with minimal distractions and potential for misinterpretation.

One of his ideas has been the use of "Spark" plots.  As I understand them, these are very lightweight graphs that depict the key information in a very small space.  Such graphs can often be included inline with other text in a paragraph, like this:  spark_3.  In this case, I generated the graph using the SGPLOT procedure with minimal decorations to depict the trend of the stock prices for Intel from the sashelp.stocks data set.  I display only the series, the last value and a label.

SGPLOT code for Spark Plot:

proc sgplot data=spark noautolegend noborder nowall;
  series x=date y=adjclose;
  scatter x=date y=lastvalue / markerattrs=(color=blue symbol=circlefilled size=12);
  text x=date y=lastvalue text=lastvalue / position=topright textattrs=(size=20);
  text x=date y=firstvalue text=label / position=left textattrs=(size=20)
         splitpolicy=splitalways splitchar='.';
  xaxis display=none;
  yaxis display=none offsetmin=0 offsetmax=0;
run;
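The spark data set itself is not shown above.  As a rough sketch only, it could be prepared along these lines, assuming that firstvalue, lastvalue and label are populated on just the first or last observation so that each TEXT and SCATTER statement draws a single item.  The data set name intel and the label text are my own placeholders, not taken from the full program.

```sas
/* sashelp.stocks is not in ascending date order, so sort the Intel rows first */
proc sort data=sashelp.stocks(where=(stock='Intel')) out=intel;
  by date;
run;

data spark;
  set intel end=last;
  length label $12;
  if _n_ = 1 then do;
    firstvalue = adjclose;     /* y position for the label at the left end */
    label      = 'Intel';      /* placeholder label text                   */
  end;
  if last then lastvalue = adjclose;  /* only the final point gets a marker and a value */
run;
```

All other observations carry missing values in these columns, so nothing extra is drawn for them.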

Recently, I received a request from SAS user Benjamin Knisley to create a similar lightweight "Graphical Summary" for visualizing patient data over time.  The graph shown below includes display of the visits and hospitalization over time.  Multiple visits are depicted as dots for easy viewing and the x and y axes are removed.  Some significant information about the patient, clinic and actual start and end dates is added.  See link below for full code.  I believe this depiction of the data is also motivated by Tufte's ideas.

visits_dot_4

One customization needed in the above graph is the use of the VALUES option, since the user wanted a sparse display of the years on the x-axis.  This too can be generalized by using GTL, which provides the INTERVAL and INTERVALMULTIPLIER options in the TIMEOPTS bundle.
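A minimal GTL sketch of that idea follows.  It is not the code used for the graph above: the template name sparse_time_axis is made up, sashelp.stocks stands in for the patient data, and the five-year spacing is just an example.  Whether INTERVALMULTIPLIER is available depends on your SAS 9.4 maintenance release, so treat this as an assumption to check.

```sas
proc template;
  define statgraph sparse_time_axis;      /* hypothetical template name */
    begingraph;
      layout overlay /
          xaxisopts=(type=time display=(ticks tickvalues line)
                     timeopts=(interval=year intervalmultiplier=5
                               tickvalueformat=year4.))
          yaxisopts=(display=none);
        scatterplot x=date y=close / markerattrs=(symbol=circlefilled size=5);
      endlayout;
    endgraph;
  end;
run;

/* Stand-in data: any data set with a SAS date variable would do */
proc sgrender data=sashelp.stocks(where=(stock='Intel')) template=sparse_time_axis;
run;
```

With this approach the tick spacing follows the data range, so the explicit list of dates in VALUES= is no longer needed.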

SGPLOT code for Graphical Summary Graph:

title j=l 'Family name, Given name' j=r 'County Clinic';
proc sgplot data=dots noautolegend noborder nowall;
  scatter x=date y=y / markerattrs=(symbol=circlefilled size=5);
  xaxistable hospitalized / x=date nomissingchar labelattrs=(size=9 weight=bold)
                     valueattrs=(size=10 weight=bold);
  text x=date y=ylbl text=firstdate / position=right contributeoffsets=none;
  text x=date y=ylbl text=lastdate / position=left contributeoffsets=none;
  xaxis type=time values=('01jan1980'd '01jan1985'd '01jan1990'd '01jan1995'd)
           valueshint display=(nolabel) valuesformat=year. valueattrs=(size=9 weight=bold);
  yaxis display=(noline noticks novalues) labelattrs=(size=9 weight=bold);
run;

Full SAS 9.4 code: graphicalsummary  


Legend order and group attributes

In this blog, I will show you how to control the order of the entries in a legend and explicitly control the correspondence between groups and style elements in PROC SGPLOT. In many cases, the colors that are used to differentiate groups do not matter--the graph simply needs to display different groups using different colors. That is not true for other graphs. It might be confusing if males were displayed using pink markers and lines and if females were displayed using blue markers and lines. For adverse events, you might prefer to use green for mildly adverse events and red for more severe events. Furthermore, you might want to order the events in the legend from mild to severe, and that might not conveniently depend on the order of the events in the data or a sorted order. The easiest way to control both legend order and group to style element correspondence is by using attribute maps. A series of examples provides background and shows other options.

The first graphs show default legend orderings and correspondence. They show that these can change depending on the data and the type of graph that you create. The fifth graph shows how you can use the STYLEATTRS statement in PROC SGPLOT to override components of style elements. The seventh (and last) graph shows how you can use an attribute map to control both the order of the entries in a legend and the correspondence between groups and style elements. With attribute maps, you do not have to know the original order. You can completely control the legend order and assign or override the default style elements. The PROC SGPLOT documentation contains much more information about the STYLEATTRS statement and attribute maps.

All of the graphs use this format in creating the legend:

proc format;
   value $sex 'M' = 'Male' 'F' = 'Female';
run;

This step creates a simple scatter plot with two groups:

proc sgplot data=sashelp.class;
   title '(1) Scatter Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData1) are first in the data set and legend.';
   footnote2 justify=left 'Females (GraphData2) are second in the data set and legend.';
run;


Order1

The GROUP= option is specified in the SCATTER statement so that males are displayed differently from females. The first observation in the SASHelp.Class data set is a male. Therefore, males are displayed using the GraphData1 style element (blue circles) and females are displayed using the GraphData2 style element (red circles). The legend entries are similarly ordered male and then female.

The following step creates a regression fit plot with two groups:

proc sgplot data=sashelp.class;
   title '(2) Fit Plot of the Class Data Set by Sex';
   reg y=height x=weight / group=sex degree=2 nomarkers;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData2) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend and use GraphData2 '
                          'because the male function was fit second.';
   footnote3 justify=left 'Females (GraphData1) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend and use GraphData1 '
                          'because the female function was fit first.';
run;

Order2

Males are still first in the data set, but now males appear second in the legend and are plotted using GraphData2 (red line), and females appear first in the legend and are plotted using GraphData1 (blue line). This is because the regression code gathers together the females first and then the males ('F' is sorted ahead of 'M'). Therefore, the legend order and the GraphDatan assignment change from the scatter plot.

The following step uses both a SCATTER and a REG statement:

proc sgplot data=sashelp.class;
   title '(3) Fit Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex;
   reg     y=height x=weight / group=sex degree=2 nomarkers;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData2) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend and use GraphData2 '
                          'because the male function was fit second.';
   footnote3 justify=left 'Females (GraphData1) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend and use GraphData1 '
                          'because the female function was fit first.';
run;

Order3

The legend order and the GraphDatan assignment still depend on the order in which the regression analysis is performed for each group.

The next step creates a grouped scatter plot from sorted data:

proc sort data=sashelp.class out=class;
   by sex;
run;
 
proc sgplot data=class;
   title '(4) Scatter Plot of the Sorted Class Data Set by Sex';
   scatter y=height x=weight / group=sex;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData2) are second in the data set and legend.';
   footnote2 justify=left 'Females (GraphData1) are first in the data set and legend.';
run;

Order4

Since females now appear first in the data, they appear first in the legend and are displayed using GraphData1. Males appear second in the legend and are displayed using GraphData2.

The next step relies on the default group order (males then females in this case) and uses the STYLEATTRS statement to set the marker and line colors:

proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(Blue cxFFAAAA);
   title '(5) Fit Plot of the Class Data Set by Sex';
   reg y=height x=weight / group=sex degree=2;
   format sex $sex6.;
   footnote1 justify=left 'Males (Blue) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend '
                          'because the male function was fit second.';
   footnote3 justify=left 'Females (Pink) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend '
                          'because the female function was fit first.';
   footnote5 justify=left 'The STYLEATTRS statement sets the colors for '
                          'males then females.';
run;

Order5

The STYLEATTRS statement sets the contrast colors to blue and a shade of pink for GraphData1 and GraphData2. The order of the legend entries is alphabetized, and the colors are consistent with gender identity colors.

Notice that the preceding step uses a REG statement without the NOMARKERS option. With this combination, the assignment of style elements to groups is reversed from the example with the REG statement and the NOMARKERS option. If you cannot anticipate which style element is used with which group, do not worry about it; it will all become easier in the last example. You can use attribute maps to control the order of the legend and override the GraphDatan style elements.

This second-to-last example still relies on knowing the default group assignment. The first step creates an attribute map with females first and then males. Therefore, females will appear first in the legend. In this example, the only attribute that is set is FillColor, which is irrelevant in this graph. Specifying an irrelevant variable like this enables you to use an attribute map to simply control legend order:

data order;
   input Value $;
   retain ID 'A' Show 'AttrMap' FillColor 'Red';
   datalines;
Female
Male
;
 
proc sgplot data=sashelp.class dattrmap=order;
   title '(6) Scatter Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex attrid=A;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData1) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend '
                          'because they are second in the attribute map.';
   footnote3 justify=left 'Females (GraphData2) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend '
                          'because they are first in the attribute map.';
run;

Order6

The last example is more typical, and it does not require you to know the default group order. The attribute map names females first and males second so that the legend entries appear in that order. Furthermore, females are explicitly specified to use all of the components of the GraphData2 style element and males use all of the components of GraphData1.

data order;
   input Value $ n;
   retain ID 'A' Show 'AttrMap';
   FillStyle        = cats('GraphData', n);
   LineStyle        = cats('GraphData', n);
   MarkerStyle      = cats('GraphData', n);
   TextStyleElement = cats('GraphData', n);
   datalines;
Female 2
Male   1
;
 
proc sgplot data=sashelp.class dattrmap=order;
   title '(7) Scatter Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex attrid=A;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData1) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend '
                          'because they are second in the attribute map.';
   footnote3 justify=left 'Females (GraphData2) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend '
                          'because they are first in the attribute map.';
   footnote5 justify=left 'Males are explicitly assigned GraphData1.';
   footnote6 justify=left 'Females are explicitly assigned GraphData2.';
run;

Order7

The correspondence between groups of observations and GraphDatan style elements can be confusing. It might depend on the order of the observations in the data set or it might depend on the order in which ODS Graphics does computations. You can use STYLEATTRS to override GraphDatan style elements. Even more powerfully, you can use attribute maps to control the order of the legend and correspondence between groups of observations and GraphDatan style elements. The STYLEATTRS statement and attribute maps are much more powerful than is shown here. See the PROC SGPLOT documentation for more information.
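The adverse-event case mentioned at the start can be handled the same way. The following is only a sketch: the AE data set, its values, and the names AE, AEMap, and Sev are invented for illustration. The attribute map lists the severities from mild to severe and assigns green, orange, and red, regardless of the order in which the groups occur in the data.

```sas
/* Hypothetical adverse-event counts; Severe deliberately comes first in the data */
data ae;
   length Severity $8;
   input Severity $ Week Count;
   datalines;
Severe   1 1
Mild     1 9
Moderate 1 4
Severe   2 2
Mild     2 7
Moderate 2 5
;

/* Attribute map rows are in the desired legend order: mild to severe */
data aemap;
   length Value $8 MarkerColor $8;
   retain ID 'Sev' Show 'AttrMap' MarkerSymbol 'CircleFilled';
   input Value $ MarkerColor $;
   datalines;
Mild     Green
Moderate Orange
Severe   Red
;

proc sgplot data=ae dattrmap=aemap;
   scatter x=Week y=Count / group=Severity attrid=Sev;
   xaxis integer;
run;
```

Because Show is set to 'AttrMap', the legend should follow the order of the rows in the attribute map rather than the order of the observations.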


Clinical Graphs - Risk Difference Plots

I have often written articles that were motivated by a user's question on how to create a particular graph, or how to work around some shortcoming in the feature set to create the graph you need.  This time, I got a question about Clinical Graphs that were mostly working as built by a user, with a small issue that was fixed in the SAS 9.4M3 release.  So, there was not much for me to add to the graph.  However, the graph was quite impressive and information rich, and worth sharing with everyone.

Risk_Diff_Plot

So, with the permission of the authors, David Carr and Andreas Brueckner of Novartis, here are a couple of impressive graphs.  The first graph shown on the right displays the Risk Difference between Drug A and Comparator by various categories, classified as "Benefits" and "Risks".  Click on the graph for a higher resolution image.

The graph uses a technique I have written about for splitting a SGPLOT data area into two separate parts along the x-axis, thus showing two graphs in one cell aligned by y-axis values.  While this technique is no longer necessary for adding statistics tables now that the AxisTable is available, it is still necessary to create a graph such as this one.

proc sgplot data=tmp2 dattrmap=AttrMap nocycleattrs sganno=anno;

  /* This syntax plots the incidence lines on LHS */
  highlow y=cat high=rate low=zero / group=group type=bar groupdisplay=cluster name='a'
                 lowlabel=IRLAB barwidth=1.0 clusterwidth=0.9 transparency=0 fill attrid=group
                dataskin=pressed LABELATTRS= (size=8pt weight=normal family='Albany AMT') nooutline;

  /* This syntax plots background shading */
  highlow y=cat high=xmax low=rate1 / group=brtyp type=bar groupdisplay=cluster
              barwidth=1.0 clusterwidth=0.9 transparency=0.70 fill attrid=brtyp nooutline;

  /* This syntax plots the words Benefits and Risks at the respective y axis positions */
  highlow y=cat high=defx low=defx / type=bar highlabel=BRTXT barwidth=0 transparency=0 fill
             LABELATTRS= (size=20pt weight=normal family='Albany AMT') nooutline;

  /* This syntax plots the risk difference with error bars on RHS */
  scatter y=cat2 x=rd /group=subg xerrorlower=rdlo xerrorupper=rdhi
              x2axis y2axis attrid=subg;

  /* Axis limits (0 to 50; -8 to 8) might need to be adjusted depending on data */
  refline 50 / axis=x;
  refline 0 /axis=x2;

  xaxis display=(nolabel) offsetmax=0.60 grid values=(0 to 50 by 10) valueattrs=(size=7);
  yaxis display=(nolabel) valueattrs=(size=9 family='Albany AMT') type=discrete
             fitpolicy=splitalways splitchar="#";

  x2axis display=(nolabel) offsetmin=0.42 grid max=64 values=(-10 to 10 by 2) valueattrs=(size=7);
  y2axis display=(nolabel noticks novalues) values=(1 to 10 by 1) type=linear;

  keylegend 'a';
run;

This graph uses overlays of multiple graph types to provide a lot of information in one graph, in a manner that is easy to understand, and pleasing in appearance.  A cluster grouped HighLow plot is used to display the % incidence by drug on the left side.  A Scatter plot is used to display the risk difference value and 95% CI on the right.  The authors have made creative use of the HighLow plot to display alternating bands to indicate the "Benefit" and "Risk" items.  A HighLow plot is also used to place the "Benefits" and "Risks" labels.

SG Annotation is used to label the x and x2-axes (which could have been done using the LABELPOS option for the axes) and to display the "Drug A Better" and "Comparator Better" indicators.  A discrete attr map is used to set the colors and marker attributes.

Risk_Diff_Panel

The second graph is a "Paneled" version of the same graph, paneled by gender, as shown on the right.  The authors have made creative use of the SGPLOT procedure to display the same graph by gender.  Personally, I would have tried using the SGPANEL procedure to handle the data and layout.  Row headers could be suppressed, and an Inset could be used to place the class value label in each cell.

All in all, a couple of great examples of creating complex clinical graphs using the SGPLOT procedure.

 


Clinical Graphs: A1c Plot

Last week I was visiting San Diego for the SANDS conference.  I always enjoy this conference as I get to interact closely with the users to hear of their pains and innovative solutions to creating Clinical Graphs.

A1C_1_GTL

At the conference, Ed Barber asked about displaying A1c data along with some events data on a time line, together with data showing the meds that were used, all in the same graph as shown on the right.  Click on the graph for a higher resolution image.

As always, I like to break down the desired result into component parts, and then use the appropriate statement combinations to create the graph.  In this case, the graph is really in two parts.  The upper cell has a display of the subject A1c by date, along with markers to show the occurrence of some adverse events by date.

The lower cell contains a graph of the medications and their duration, along with the dosage represented as the thickness of the line.  The data for this is most ideally coded as columns with medicine name, dosage and start and end dates for the duration.

To get the data into one data set (as needed to graph it), I find it easiest to create each data set separately, as shown below.

A1c_Data_2

Meds_Data_2

Both_Data_2

Then, I do a simple merge to get one data set as shown on the right.
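Since the data sets appear only as images, here is a rough sketch of what the two inputs, the merge, and the attribute map might look like.  All values are made up, the column names are taken from the plot statements used later, and I am assuming a positional merge without a BY statement; the full program linked at the end may differ.

```sas
/* Hypothetical A1c readings; event holds the y value at which an event marker is drawn */
data a1c;
  input date :date9. a1c event;
  format date date9.;
  datalines;
01JAN2014 7.8 .
01APR2014 7.4 7.0
01JUL2014 7.1 .
01OCT2014 6.8 .
;

/* Hypothetical medication intervals; group selects the line thickness */
data meds;
  input meds $ from :date9. to :date9. group dosage $ label $;
  format from to date9.;
  datalines;
MedA 01JAN2014 01JUL2014 1 500mg  MedA
MedB 01APR2014 01OCT2014 2 1000mg MedB
;

/* One-to-one merge (no BY statement): observations are paired by position */
data both;
  merge a1c meds;
run;

/* Discrete attribute map with ID 'Meds': group 2 gets the thicker line */
data attrmap;
  retain id 'Meds' linecolor 'black';
  input value $ linethickness;
  datalines;
1 2
2 5
;
```

The two sets of columns are unrelated, so the merge simply places them side by side; each plot statement ignores the rows where its own variables are missing.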

Now, we can plot the curve showing A1c x Date and overlay a scatter plot of Event x Date, where "Event" column contains the y-axis value for plotting the marker.  In this case, I have used a value that will put it somewhere in the range of the A1c data.

For the plot in the bottom cell, we can use a HighLow plot to display the start and end dates for each medication.  We can also display the med name as a LowLabel and the dosage as the HighLabel to produce the graph shown at the top.

Note, I have used a GTL template to create a 2-cell graph, with a common x-axis.  While the GTL code is more verbose, the structure of the graph is quite simple.  We use a 2-row LAYOUT LATTICE, and populate each cell with a LAYOUT OVERLAY.  The upper container has a Series and a Scatter plot, while the lower cell has one HighLow plot.

Note the use of the DATTRMAP data set for setting the attributes of the HighLow plot line by group.  Group=2 has the thicker line.

GTL Template code:

proc template;
  define statgraph A1c_1;
    begingraph;
      entrytitle 'A1c by Date with Events and Meds';
      layout lattice / rows=2 columndatarange=union rowweights=(0.7 0.3) rowgutter=5px;
        columnaxes;
          columnaxis / griddisplay=on display=(ticks tickvalues);
        endcolumnaxes;

        layout overlay / yaxisopts=(griddisplay=on tickvalueattrs=(size=8)
                     linearopts=(viewmin=6 tickvaluesequence=(start=6 end=8 increment=0.2)));
          seriesplot x=date y=a1c / lineattrs=(thickness=2);
          scatterplot x=date y=event / markerattrs=(symbol=trianglefilled size=12);
       endlayout;

        layout overlay / yaxisopts=(display=(label) reverse=true);
          highlowplot y=meds low=from high=to / group=group lowlabel=label highlabel=dosage
                      lineattrs=(color=black) labelattrs=(color=black);
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=both template=a1c_1 dattrmap=attrmap;
  dattrvar group='Meds';
run;

A1C_2_GTL

The graph on the right uses two HIGHLOW plot statements to differentiate the display of the Dosage from the Meds.  Click on the graph for a higher resolution image.  The HIGHLOW plot has only one option for LabelAttrs, which applies to both the low and high labels.  So, to display the HighLabel using different label settings, I use one HighLow plot with LowLabel only, and one HighLow plot with HighLabel only and its own LabelAttrs.  The line thickness for the 2nd plot is set to zero.

Once you understand the structure of the graph, you can create a 2-cell graph using SG by splitting the height of the graph using the y-axis and y2-axis sections as shown below.

SG code:

title 'A1c by Date with Events and Meds';
proc sgplot data=both noautolegend dattrmap=attrmap;
  series x=date y=a1c / lineattrs=(thickness=2);
  scatter x=date y=event / markerattrs=(symbol=trianglefilled size=12);
  refline 5.8 / noclip;
  highlow y=meds low=from high=to / y2axis group=group attrid=Meds
          lowlabel=label labelattrs=(color=black) lineattrs=(color=black);
  highlow y=meds low=from high=to / y2axis group=group attrid=Meds
          highlabel=dosage labelattrs=(color=gray size=6) lineattrs=(thickness=0);
  yaxis offsetmin=0.3 offsetmax=0.05 values=(6 to 8 by 0.2) min=5.8 max=8 valueshint
        label='A1c' labelpos=datacenter valueattrs=(size=8) grid;
  y2axis offsetmin=0.75 offsetmax=0.07 display=none reverse;
  xaxis display=(nolabel) grid;
run;

A1C_3_GTL

Update:  Recently a co-worker suggested that a legend would help to decode the information.  Now, a legend is added in the upper cell, as shown in the graph on the right.  Also, the lines representing the "Meds" are made blue, to distinguish them from the A1c curve.

 

 

A1C_SG_2

A 2-cell SG plot is shown on the right.  Note the lack of a "row gutter" between the upper and lower sections, as these are all part of one cell.

See Full code:  A1C_Plot

 

 

 


Outline cells in a table using a heat map

There are many ways to use a heat map. For big data sets, heat maps provide a substitute for scatter plots. Heat maps can also be used to enhance small tables. Several of my colleagues (Sanjay Matange, Pratik Phadke, Rick Wicklin, Chris Hemedinger, and probably others) have written blogs about using heat maps to display tables. This blog shows how to outline selected cells in the table and display multiple values in each cell.


HeatMapToeplitz2

(Author note, August 12, 3:30 PM: As questions came in about how to enhance my second example, I consulted my colleague Prashant Hebbar, who guided me toward more elegant code. I have revised my second example and my replies accordingly. The new code relies on the options POSITION=TOP and POSITION=BOTTOM instead of DISCRETEOFFSET=-0.15 and DISCRETEOFFSET=0.15 to put values on multiple lines. Thanks, Prashant!)

A Toeplitz matrix is used to illustrate this idea, since the values have a simple pattern. In this example, you will learn how to put an outline around some of the cells (those that have values greater than three). The first step creates a 10 x 10 Toeplitz matrix with entries 0 to 9. PROC IML creates a matrix that is arrayed in a single column for plotting, and it creates a SAS data set of values, row labels, and column labels.

proc iml; /* Create a Toeplitz matrix   */
   t = 9; /* Values range from 0 to t */
   x = (0:t)` @ j(t + 1, 1, 1) || j(t + 1, 1, 1) @ (0:t)` || 
       shape(toeplitz(0:t), 1)`;
   create mat from x[colname={"RowLab" "ColLab" "Value"}];
   append from x;
quit;

The function toeplitz(0:t) creates a 10 x 10 matrix, shape(toeplitz(0:t), 1) converts the matrix to a 1 x 100 row vector, and shape(toeplitz(0:t), 1)` transposes the row vector to make a 100 x 1 column vector to plot.
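To see what these expressions produce, it can help to run them on a smaller case. The following sketch uses t = 2 and splits the expression into named pieces (m, v, r, and c are names introduced here only for illustration):

```sas
proc iml;
   t = 2;                          /* small case: values 0 to 2              */
   m = toeplitz(0:t);              /* 3 x 3 matrix: {0 1 2, 1 0 1, 2 1 0}    */
   v = shape(m, 1)`;               /* 9 x 1 column, read row by row          */
   r = (0:t)` @ j(t + 1, 1, 1);    /* row index:    0 0 0 1 1 1 2 2 2        */
   c = j(t + 1, 1, 1) @ (0:t)`;    /* column index: 0 1 2 0 1 2 0 1 2        */
   print (r || c || v)[colname={"RowLab" "ColLab" "Value"}];
quit;
```

Each printed row is one cell of the matrix: its row label, its column label, and its value, which is the layout that the heat map statements expect.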

The second step creates a macro variable whose value controls the size of the cells for plotting. Approximately an 18 x 18 pixel square is reserved for each cell. The Outline variable has nonmissing values when the value is greater than 3. These are the cells that are outlined.

data mat2;
   if _n_ = 1 then /* Scale matrix size by the number of rows/columns */
      call symputx('size', ceil(18 * sqrt(n)));
   set mat nobs=n;
   Outline = ifn(Value > 3, Value, .); /* Outline the nonmissing values (Value > 3) */
   output;
run;
 
proc print data=mat2(obs=11); 
   id RowLab ColLab; 
run;
 
%put &size;

Here are 11 of the 100 observations:

HeatMapToeplitz3

The template creates a graph that is 250+180=430 pixels high and 200+180=380 pixels wide. The height is greater than the width to provide space for titles. The first HEATMAPPARM statement creates a heat map that displays values of the Value variable from 0 to 9 as colors ranging from white (GraphWalls:Color) to blue (ThreeColorRamp:StartColor). The second HEATMAPPARM statement outlines the nonmissing values of the variable Outline. It uses the option FILLATTRS=(TRANSPARENCY=1) to suppress the heat map and only display outlines. The TEXTPLOT statement displays the values overlaid on the shaded cells.

proc template;
   define statgraph matrix;
      begingraph / designheight=%eval(250+&size) /* Size: a bit higher than wider */
                   designwidth =%eval(200+&size);
         entrytitle "Toeplitz Matrix";
         entrytitle "Values Greater than Three are Outlined";
         layout overlay / yaxisopts=(discreteopts=(tickvaluefitpolicy=none)
                                     display=(tickvalues) reverse=true)
                          xaxisopts=(discreteopts=(tickvaluefitpolicy=rotate)
                                     display=(tickvalues));
            * Heat map provides the background color for each cell;
            heatmapparm y=RowLab x=ColLab colorresponse=Value / 
               ColorModel=(GraphWalls:Color ThreeColorRamp:StartColor);
            * Heat map provides the outlines;
            heatmapparm y=RowLab x=ColLab colorresponse=Outline / 
               ColorModel=(GraphWalls:Color ThreeColorRamp:StartColor)
               display=all includemissingcolor=false fillattrs=(transparency=1)
               outlineattrs=graphdata2(thickness=1);
            * Textplot provides the values;
            textplot y=RowLab x=ColLab text=eval(put(Value, 2.)) / 
                     textattrs=(size=12px) position=center;
         endlayout;
      endgraph;
   end;
run;
 
proc sgrender data=mat2 template=matrix;
run;

table overlaid on a heat map

The next example creates the same Toeplitz matrix as before. However, this time the DATA step creates character row and column labels. The DATA step also creates a second variable, Cell, that contains the cell indices. Larger cells are created (approximately 32 x 32 pixels), which are large enough to display the additional values. A second TEXTPLOT statement displays the cell indices. Each TEXTPLOT statement has a POSITION= option that positions text either at the top or bottom of each cell.

proc iml; /* Create a Toeplitz matrix   */
   t = 9; /* Values range from 0 to t */
   x = (0:t)` @ j(t + 1, 1, 1) || j(t + 1, 1, 1) @ (0:t)` ||
       shape(toeplitz(0:t), 1)`;
   create mat3 from x[colname={"r" "c" "Value"}];
   append from x;
quit;
 
data mat4;
   if _n_ = 1 then /* Scale matrix size by the number of rows/columns */
      call symputx('size', ceil(25 * sqrt(n)));
   set mat3 nobs=n;
   Outline = ifn(Value > 3, Value, .); /* Outline the nonmissing values (Value > 3) */
   RowLab = put(r, words5.);
   ColLab = put(c, words5.);
   Cell   = cats('(', r, ',', c, ')');
   output;
run;
 
proc template;
   define statgraph matrix;
      begingraph / designheight=%eval(250+&size) /* Size: a bit higher than wider */
                   designwidth =%eval(200+&size);
         entrytitle "Toeplitz Matrix";
         entrytitle "Values Greater than Three are Outlined";
         layout overlay / yaxisopts=(discreteopts=(tickvaluefitpolicy=none)
                                     display=(tickvalues) reverse=true)
                          xaxisopts=(discreteopts=(tickvaluefitpolicy=rotate)
                                     display=(tickvalues));
            * Heat map provides the background color for each cell;
            heatmapparm y=RowLab x=ColLab colorresponse=Value /
               ColorModel=(GraphWalls:Color ThreeColorRamp:StartColor);
            * Heat map provides the outlines;
            heatmapparm y=RowLab x=ColLab colorresponse=Outline /
               ColorModel=(GraphWalls:Color ThreeColorRamp:StartColor)
               display=all includemissingcolor=false fillattrs=(transparency=1)
               outlineattrs=graphdata2(thickness=1);
            * Textplot provides the values;
            textplot y=RowLab x=ColLab text=eval(put(Value, 6.)) /
                     position=top textattrs=(size=12px);
            textplot y=RowLab x=ColLab text=cell / position=bottom
                     textattrs=(size=12px);
         endlayout;
      endgraph;
   end;
run;
 
proc sgrender data=mat4 template=matrix;
run;

HeatMapToeplitz2


Likert Graph Revisited

A few weeks back I posted an article on ways to create a WindRose Graph using the SGPLOT procedure.  The process is relatively simple.  Create (R, Theta) data in which both variables are numeric, where Theta is an angle in the range 0-360 and R is the corresponding response value.  Then simply transform the (R, Theta) values to (X, Y) using the transform shown in the article and plot the data as a regular rectangular XY plot.
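The transform itself is not repeated here.  As a reminder of the general idea only, the sketch below uses the standard polar-to-rectangular conversion with Theta in degrees; the earlier article may use a different convention (for example, a compass layout with 0 at the top, which swaps the sine and cosine).  The data values and the name polar_xy are made up.

```sas
data polar_xy;
  input theta r;
  rad = theta * constant('pi') / 180;  /* degrees to radians */
  x = r * cos(rad);                    /* assumed convention: 0 degrees along +x */
  y = r * sin(rad);
  datalines;
0   10
90   5
180  8
270  3
;
```

The resulting X and Y columns can then be drawn with ordinary SGPLOT statements on linear axes.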

America100CollegeStudents

A reader chimed in asking whether it was possible to create the graph shown on the right using a similar process.  This graph was published in a report on Post Secondary School Success.

While I feel confident such data could be displayed in the manner shown on the right, it was not clear to me if this is indeed the best way.  Here are my concerns:

  • The categories around the circle have no real positional information.
  • The categories do not add up to 100%, normally expected for a Pie Chart.
  • The categories do not have any directional or "cyclic" information.
  • The "subgroup" values are harder to compare as it is not clear if they are represented by the radial value or the segment area.
  • Each subgroup label and value has to be individually labelled using some custom tool.

Likert_Label

While the graph above is attractive, the information may be easier to consume as the Likert graph shown on the right.  Click on the graph for the full resolution graph.  This representation of the data follows the principles of effective graphics and has the following characteristics:

  • The categories are shown as horizontal bars.
  • The subgroups add up to 100% for each category.
  • Subgroup magnitudes within the category are easy to compare.
  • Subgroup magnitudes can even be compared across categories.
  • Labelling is done programmatically.  No custom labeling is required.
  • Y-axis label splitting is used to fit the categories.
  • Since both subgroup label and value are displayed in the segment, we have to compute the label, and its location to display the label using an overlaid TEXT plot.
  • Custom labeling may be needed for very small segments.
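The label and its location mentioned in the list above can be computed in a DATA step.  This is a sketch under my own assumptions, not the code from the linked program: the input values are invented, and the label location is taken to be the midpoint of each stacked segment, that is, the running total of the preceding segments plus half of the current value.

```sas
/* Hypothetical segment values; each category sums to 100 */
data raw;
  length category $20 group $12;
  input category $ group $ value;
  datalines;
Enrollment FullTime  62
Enrollment PartTime  38
Housing    OnCampus  14
Housing    OffCampus 86
;

data labels(rename=(seglabel=label));
  set raw;
  by category notsorted;
  length seglabel $24;
  if first.category then cum = 0;   /* restart the running total per bar   */
  Lbl_Loc = cum + value / 2;        /* midpoint of this stacked segment    */
  cum + value;                      /* sum statement: cum is retained      */
  seglabel = catx(' ', group, cats(value, '%'));
  drop cum;
run;
```

The segments must arrive in the same order in which the bar stacks them, otherwise the labels will not line up with their segments.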

SGPLOT code for the graph:

title "Top 100 American Colleges";
proc sgplot data=labels noautolegend subpixel nocycleattrs noborder;
  hbarparm category=category response=value / group=group dataskin=crisp barwidth=1;
  text y=category x=Lbl_Loc text=label / contributeoffsets=none textattrs=(size=7);
  xaxis display=(nolabel noticks noline);
  yaxis display=(nolabel noticks noline) fitpolicy=splitalways splitchar='^';
run;

Likert_Grad

One could make the graph more visually interesting.  Here I have used gradient color for each horizontal bar using the ColorGradient option.  I have also set FillType=Gradient so each segment is drawn distinctly.  Color itself is not significant, so no legend is needed.  Click on the graph to see the high resolution graph.

Also see my previous article on Likert Graphs.

Full SGPLOT code:  Likert


'Unbox' Your Box Plots - part deux

There was a recent comment on the original 'Unbox Your Box Plots', where a user wants to see the original data for the box, but only label the outliers.

As noted in the comment, labeling all the scatter markers and turning on the outlier display is not ideal. But there is a way to do this.

The basic idea:

  • PROC MEANS (or PROC UNIVARIATE) to compute the Q1 and Q3 for the data
  • compute the upper and lower fences
  • blank out the label variable if that observation is not an outlier.

With SAS 9.4, GTL scatter plots support jitter, so we can do away with the workaround of using an interval X axis that the original post required. Here is the GTL output:

GTL Box plot with jittered data

You can also do this with SGPLOT procedure (as of SAS 9.4, 1st maintenance release), with the result as shown below:

SGPlot box plot with jittered data

The full code for both examples is here.
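The three steps of the basic idea, followed by the SGPLOT overlay, can be sketched as below.  This is not the linked code; it is a minimal illustration that assumes sashelp.cars as the data, MPG_City by Type as the box plot, Model as the label, and the conventional 1.5 x IQR fences.  The data set and variable names created here (quartiles, cars_labeled, outlier_label) are hypothetical.

```sas
/* Step 1: Q1 and Q3 of MPG_City for each Type */
proc means data=sashelp.cars noprint nway;
  class type;
  var mpg_city;
  output out=quartiles(drop=_type_ _freq_) q1=q1 q3=q3;
run;

proc sort data=sashelp.cars out=cars;
  by type;
run;

/* Steps 2 and 3: compute the fences, keep the label only for outliers */
data cars_labeled;
  merge cars quartiles;
  by type;
  length outlier_label $40;
  iqr = q3 - q1;
  lower_fence = q1 - 1.5 * iqr;    /* assumed fence multiplier of 1.5 */
  upper_fence = q3 + 1.5 * iqr;
  if not missing(mpg_city) and
     (mpg_city < lower_fence or mpg_city > upper_fence)
    then outlier_label = strip(model);
  else outlier_label = ' ';        /* blank label for non-outliers */
run;

/* Box without its own outlier markers, overlaid with jittered data */
proc sgplot data=cars_labeled noautolegend;
  vbox mpg_city / category=type nooutliers nofill;
  scatter x=type y=mpg_city / jitter datalabel=outlier_label
          markerattrs=(symbol=circlefilled size=5) transparency=0.5;
run;
```

Because the blank labels are simply not drawn, every observation appears as a marker but only the points outside the fences carry a label.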


Stock Plots

This weekend I was reviewing my portfolio of stocks as usual.  Yes, I do have a small stock portfolio with a few stocks, and normally I use free stock charting software to review the stock plots.  These sites allow you to view the daily stock prices along with many technical indicators such as moving averages, Bollinger bands and more.

FB_2016_2A

One technical indicator often talked about is the range bands.  The conventional wisdom is that prices tend to stay within this range, until they don't.  I could not find a way to do this on the website I use, so I have to take a screenshot of the graph, and then lay straight lines at the upper and lower range of the stock prices using Microsoft Publisher.  The result is shown on the right.

The process above is a bit tedious, so I figured I could use the power of SAS to create the graph I need as shown below.  Click on the graph for a higher resolution version.

FB_2Yr_1B

I created this graph by downloading the 2-year stock data for Facebook (FB) from the NASDAQ site.  For sure, there are many other sites available.  Then, I used the SGPLOT procedure to create the graph, plotting a time series of Close by Date using the SERIES plot with an overlay of a REG plot (nomarkers).  The default 95% confidence limits work quite well to bound the low and high values of the graph.  However, I adjusted the Alpha value for a tighter fit, at least for this graph, and settled on Alpha=0.1.

SAS SGPLOT code:

title "&name (&symbol) Daily Close (Alpha=&alpha) Degree=1 on &sysdate";
proc sgplot data=&symbol._2yr_data noautolegend subpixel;
  series x=date y=close / y2axis ;
  reg x=date y=close / y2axis nomarkers cli alpha=&alpha;
  y2axis grid display=(nolabel);
  xaxis grid display=(nolabel);
run;

Note the use of the &name, &symbol and &alpha macro variables. These are used because I made this into a macro that will download the data, process it, and create multiple graphs given the stock name and symbol.  See the full code linked below.

Just to compare the results, I also tried a quadratic fit (DEGREE=2) and a cubic fit (DEGREE=3).  The results are interesting.  The different graphs indicate different potential for the stock, each indicating some room for the stock prices to go up before they become "over bought".  Note, the previous conclusion is purely speculation on my part, and not meant as "financial advice".  Alpha=0.1 is unlikely to suit every stock, since each has its own "beta"; the value can be changed in the macro invocation.
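For readers who want to try the higher-degree fits without the download macro, here is a small sketch.  It assumes the monthly Intel prices in sashelp.stocks as a stand-in for the downloaded daily data, and uses the default axes instead of Y2.

```sas
%let alpha = 0.1;   /* 90% prediction limits, adjust per stock */

title "Intel Monthly Close (Alpha=&alpha) Degree=2";
proc sgplot data=sashelp.stocks(where=(stock='Intel')) noautolegend;
  series x=date y=close;
  /* DEGREE=2 requests a quadratic fit, DEGREE=3 a cubic one */
  reg x=date y=close / degree=2 nomarkers cli alpha=&alpha;
  xaxis grid display=(nolabel);
  yaxis grid display=(nolabel);
run;
title;
```

Changing only the DEGREE= value makes it easy to compare how the bands behave for the linear, quadratic and cubic fits.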

FB_2Yr_2

FB_2Yr_3

The same can be done for other stock symbols as shown below.  Note, the 90% CLI bands are used for all the graphs.  The last graph uses a HighLow plot with 1-year data.
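A HighLow version along these lines might look like the sketch below.  It is an assumption-laden illustration rather than the macro's code: it uses the 2005 monthly Intel rows from sashelp.stocks in place of one year of daily data.

```sas
title "Intel 2005 High-Low with 90% CLI bands";
proc sgplot data=sashelp.stocks(where=(stock='Intel' and date >= '01JAN2005'd))
            noautolegend;
  /* Each bar spans Low to High, with ticks for Open and Close */
  highlow x=date high=high low=low / open=open close=close;
  reg x=date y=close / nomarkers cli alpha=0.1;
  xaxis grid display=(nolabel);
  yaxis grid display=(nolabel);
run;
title;
```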

Full SAS code:  StockPlotMacro_2

AVGO_2Yr_1

AVGO_2Yr_2

AVGO_2Yr_3

AVGO_2Yr_4


Graph Table with Class

As is often the case, this article is prompted by a recent post on the SAS/GRAPH and ODS Graphics communities page.  A user wanted to create a Graph Table showing a bar chart with tabular data for each of the category values along the x-axis.  The user was creatively using a VBAR overlaid with multiple VLINE statements using SAS 9.40M?.  The VLINE statements were used to display the statistics.

BoxPlotTables_3

I applaud the creativity of the user, who has clearly taken to heart the lesson that multiple plot statements can often be used creatively to build the graph you may want.  Prior to SAS 9.4, this was one way to overlay additional textual data on a graph that contains a VBAR.  However, with SAS 9.4, there is an easier way - AxisTable.

While we have discussed AxisTables in earlier articles, it seems worthwhile to review the subject.  The graph above right shows how you can display multiple rows of data statistics aligned with the x-axis categories.  The group values are clustered as shown for the box plot and in the table below it.  Click on the graph for a higher resolution image.

BoxPlotTable

Our goal is to create the graph above.  Let us start with a cluster grouped box plot along with textual display of data.   In the graph on the right, a box plot of Horsepower is displayed by Type with Group=Origin for the data set sashelp.cars.  The group values are clustered side-by-side.  An xAxisTable is used to display the associated values for Horsepower, also classified by Origin.

Note, since the CLASS option is used with the xAxisTable, the statistical values for the three levels of "Origin" are displayed stacked under each category on the x-axis.  Each class value is displayed on the left.
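A minimal sketch of this starting point is shown here.  It is pared down from the full program and assumes the mean is the statistic of interest; without CLASSDISPLAY, the class values are stacked.

```sas
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid'));
  format horsepower 3.0;
  vbox horsepower / category=type group=origin groupdisplay=cluster;
  /* CLASS=Origin with the default display stacks one row per Origin */
  xaxistable horsepower / class=origin stat=mean location=inside;
  xaxis display=(nolabel);
  yaxis grid;
run;
```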

BoxPlotTableCluster

With SAS 9.40M3, the CLASSDISPLAY option was added to allow the display of the class values in the clustered arrangement as shown on the right.  Using CLASSDISPLAY=CLUSTER, values for each class are displayed side by side, and arranged in the same way as in the box plot.  Now, the name of the variable is displayed on the left of the values.  Note, we have used COLORGROUP=Origin to color each value by the same variable to provide a visual that is easier to decode.

BoxPlotTables_3

The benefit of this option is that multiple statistics can be displayed with such grouped plot statements.  The graph on the right shows the mean values for Horsepower, Mpg_City and Mpg_Highway.  More variables can be used if necessary.

SAS 9.40M3 Code for grouped Box Plot with Table.

title h=10pt 'Mean Auto Statistics by Type and Origin';
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid')) noborder;
  format mpg_city mpg_highway horsepower 3.0;
  styleattrs axisextent=data;
  vbox horsepower / category=type group=origin name='a'
           groupdisplay=cluster dataskin=gloss
          meanattrs=(size=6) outlierattrs=(size=5);
  xaxistable horsepower mpg_city mpg_highway / class=origin
         classdisplay=cluster stat=mean
        colorgroup=origin location=inside nostatlabel;
  xaxis display=(nolabel noticks noline);
  keylegend 'a' / location=inside position=topright across=1 title='';
  yaxis grid;
run;

BoxPlotTablesBands

Finally, the user wanted to add vertical divider lines (column borders) to separate the columns of values.  Unfortunately, the AxisTable statement does not currently support column or row borders.  However, the x-axis color bands could be used to create such a grouping as shown in the graph on the right.  Click on the graph to see this more clearly.  The banding intentionally uses a soft color, matching the color of the background.  However, that can be controlled in the syntax.
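One way to request such bands is with the COLORBANDS and COLORBANDSATTRS options on the XAXIS statement.  The sketch below is a reduced version of the program above; the band color and transparency are arbitrary choices for illustration, not the values used in the linked code.

```sas
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid')) noborder;
  format horsepower 3.0;
  vbox horsepower / category=type group=origin groupdisplay=cluster;
  xaxistable horsepower / class=origin classdisplay=cluster stat=mean
             colorgroup=origin location=inside;
  /* Shade alternate categories so each column of values is set apart */
  xaxis display=(nolabel noticks noline)
        colorbands=odd colorbandsattrs=(color=lightgray transparency=0.7);
  yaxis grid;
run;
```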

A Graph Table is very effective for display of results of an analysis. The AxisTable is ideally suited to help create such visuals.  Graph Tables such as the Survival Plot or the Forest Plot are popular examples of the usage of Axis Tables.

Full SAS 9.40M3 code for Graph Tables:  GraphTableWithClass


Polar Graph - Wind Rose

Last week I posted an article on displaying a polar graph using SAS.  When the measured data (R, Theta) are in polar coordinates as radius and angle, this data can be easily transformed into the XY space using the simple transform shown below.

    x=r*cos(theta * PI / 180);
    y=r*sin(theta * PI / 180);

Then, we can plot the graph using a scatter plot statement.  Setting Aspect=1 ensures the graph retains its shape, and we add the radial grid lines using a Vector plot statement.  With GTL, we can use the EllipseParm statement to display the circular grids.  With SGPLOT, we can use either a Polygon plot or SGAnnotate to draw the circular grids.
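To make the recipe concrete, here is a small self-contained sketch using SGPLOT with a Polygon plot for the circular grids.  The curve r = 1 + cos(theta), the grid radii and all variable names are assumptions chosen only for illustration.

```sas
data polar;
  pi = constant('pi');
  /* Hypothetical (R, Theta) data, transformed to (x, y) */
  do theta = 0 to 360 by 5;
    r = 1 + cos(theta * pi / 180);
    x = r * cos(theta * pi / 180);
    y = r * sin(theta * pi / 180);
    output;
  end;
  call missing(theta, r, x, y);
  /* Circular grids: one closed polygon per radius */
  do gridR = 0.5 to 2 by 0.5;
    do a = 0 to 355 by 5;
      gx = gridR * cos(a * pi / 180);
      gy = gridR * sin(a * pi / 180);
      output;
    end;
  end;
  call missing(gridR, gx, gy);
  /* Radial grid lines: end points of vectors from the origin */
  do a = 0 to 315 by 45;
    vx = 2 * cos(a * pi / 180);
    vy = 2 * sin(a * pi / 180);
    output;
  end;
  drop pi a;
run;

proc sgplot data=polar aspect=1 noautolegend noborder;
  polygon x=gx y=gy id=gridR / lineattrs=(color=lightgray);
  vector x=vx y=vy / xorigin=0 yorigin=0 noarrowheads
         lineattrs=(color=lightgray);
  scatter x=x y=y;
  xaxis display=none;
  yaxis display=none;
run;
```

Because the grid circles are centered on the origin, both axes get the same data range, so Aspect=1 keeps the circles round.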

Wind_Polar_3

In this article, we will discuss another popular polar graph called the Wind Rose Graph.  This graph was developed to depict the wind speed and direction, and can be useful to present any directional information, or information that is cyclical in nature.

The Wind Rose graph on the right is created using the SGPLOT procedure.  Here I have simulated wind data by direction and speed category.  The data was generated using some trigonometric and random functions and does not represent real or sampled data.

Note, this visualization does not use a scatter plot, as was the case with the polar graph in the previous article.  Here, we have used a "Bar" to represent the wind from each direction.  This needs a bit more work to create.

Wind_Data_3

I start by generating the data in (R, Theta) coordinates, as shown on the right.  For 16 values around the circle, I generate the wind percentages by 4 "Knots" categories.  These can be seen in the legend in the graph above.  I have generated Low and High values for each segment in the table on the right.  This is done for ease of transformation later into the polar graph.  For simplicity, each group has an equal number of percentages.  Values with a total > 80 have 4 groups.
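A data step along the following lines could produce such a table.  This is only a sketch: the formula for the simulated total is invented, and for brevity every direction gets all 4 groups with equal shares, which simplifies the rule described above.

```sas
data WindSpeed;
  call streaminit(12345);          /* reproducible random numbers */
  pi = constant('pi');
  do theta = 0 to 337.5 by 22.5;   /* 16 directions around the circle */
    /* Made-up total per direction, always positive */
    total = 6 + 4 * sin(theta * pi / 180) + 2 * rand('uniform');
    low = 0;
    do knots = 1 to 4;             /* 4 speed categories, equal shares */
      high = low + total / 4;
      output;
      low = high;                  /* next segment starts where this ends */
    end;
  end;
  drop pi total;
run;
```

Storing Low and High for every segment means each row already describes one stacked piece, which is what both the HighLow plot and the later polygon conversion need.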

We can certainly plot this data directly as a HighLow plot in the XY space as shown below.  Click on the graph for a higher resolution image.

Wind_XY_Highlow_3

title h=10pt 'Wind data as stacked HighLow segments';
proc sgplot data=WindSpeed noborder;
  styleattrs datacolors=(forestgreen lightgreen gold cxD00000);
  highlow x=theta low=low high=high / group=knots type=bar ;
  yaxis offsetmin=0 label='Percent' grid;
  xaxis values=(0 to 360 by 45);
run;

Note in the bar chart on the right, the bars are actually plotted using the High-Low plot, with Type=bar.  The overall wind values are easy to compare side by side.  However, since the data is really directional, let us plot the bars on the compass directions.

Wind_Polar_Data_3

Using the equation shown above, we transform the (R, Theta) coordinates to the (x, y) coordinates.  A polygon is generated for each segment of the high-low in the original (R, Theta) space.  Then each vertex of the polygon is converted into the (x, y) space as shown in the table on the right.  When plotted, the "Polar Bars" (pun intended) are displayed.  The table includes the Knots variable to be used as color group and polygon Id.
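The conversion can be sketched in a data step like the one below.  It assumes a WindSpeed data set with the variables theta, low, high and knots used in the HighLow plot; the angular half-width of 8 degrees and the 2 degree arc step are arbitrary choices, and the output name WindBars is hypothetical.

```sas
data WindBars;
  set WindSpeed;                   /* one row per (theta, low, high) segment */
  pi = constant('pi');
  halfWidth = 8;                   /* assumed half-width of a bar, in degrees */
  id + 1;                          /* unique polygon id for each segment */
  /* Inner edge of the segment, traced at radius = low */
  do a = theta - halfWidth to theta + halfWidth by 2;
    x = low * cos(a * pi / 180);
    y = low * sin(a * pi / 180);
    output;
  end;
  /* Outer edge, traced back at radius = high to close the shape */
  do a = theta + halfWidth to theta - halfWidth by -2;
    x = high * cos(a * pi / 180);
    y = high * sin(a * pi / 180);
    output;
  end;
  keep id knots x y;
run;
```

Tracing the inner edge one way and the outer edge the other way gives the vertices in a consistent order, so the Polygon plot can fill each bar without self-crossing edges.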

In the same data set, data is generated for the 16 radial grid lines, with values 0-315, the angle around the circle.  A Vector plot is used to display this on the graph as the 16 directions.  A Text plot is used to display the labels for each direction using a user defined format.  X and Y axes are turned off, and Aspect=1 is used.  The result is shown in the graph at the top of the article.
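The grid and label data might be generated as sketched below.  The outer radius of 10, the data set name RadialGrid and the mapping of angles to compass names are all assumptions; the mapping follows the math convention of the transform above, where 0 degrees points along the positive x axis.  The variable names x1, y1, x2, y2, xl, yl and label match those used in the SGPLOT code.

```sas
proc format;
  value dir  0='E'    22.5='ENE'  45='NE'   67.5='NNE'
            90='N'   112.5='NNW' 135='NW'  157.5='WNW'
           180='W'   202.5='WSW' 225='SW'  247.5='SSW'
           270='S'   292.5='SSE' 315='SE'  337.5='ESE';
run;

data RadialGrid;
  pi = constant('pi');
  rMax = 10;                       /* assumed outer radius of the grid */
  x1 = 0; y1 = 0;                  /* every grid line starts at the center */
  do label = 0 to 337.5 by 22.5;
    x2 = rMax * cos(label * pi / 180);
    y2 = rMax * sin(label * pi / 180);
    xl = 1.1 * x2;                 /* label sits just beyond the line end */
    yl = 1.1 * y2;
    output;
  end;
  keep label x1 y1 x2 y2 xl yl;
run;

proc sgplot data=RadialGrid aspect=1 noborder noautolegend;
  format label dir.;
  vector x=x2 y=y2 / xorigin=x1 yorigin=y1 noarrowheads
         lineattrs=(color=lightgray);
  text x=xl y=yl text=label / textattrs=(size=7);
  xaxis display=none;
  yaxis display=none;
run;
```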

title h=10pt 'Wind Rose created using SAS SGPLOT Procedure';
proc sgplot data=WindPolar aspect=1 noborder nowall noautolegend sganno=anno subpixel;
  format label dir.;
  format knots knots.;
  styleattrs datacolors=(forestgreen lightgreen gold cxD00000);
  polygon x=x y=y id=id / dataskin=sheen fill nooutline group=knots name='a';
  vector x=x2 y=y2 / xorigin=x1 yorigin=y1 lineattrs=(color=lightGray) noarrowheads;
  text x=xl y=yl text=label / textattrs=(size=7);
  keylegend 'a' / position=right across=1;
  xaxis display=none;
  yaxis display=none;
run;

anno_data

In the previous article for Polar Graph, I had used GTL to display the circular grid lines using the EllipseParm statement.  In this graph, I have used SGAnnotate to draw the circular grids using the OVAL function.  Click on the table on the right to see the annotate data in detail.  Note the use of "Layer=back".  This draws the circular grids behind the graph.  To see these annotations, we need to turn off the Wall display.
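An annotation data set of this kind could be built as in the following sketch.  The radii and the line color are assumptions; the actual values are in the annotate table linked above.

```sas
data anno;
  length function $8 drawspace $12 widthunit heightunit $8
         display $8 layer $8 linecolor $12;
  retain function 'oval' drawspace 'datavalue' x1 0 y1 0
         widthunit 'data' heightunit 'data'
         display 'outline' layer 'back' linecolor 'lightgray';
  do r = 2 to 10 by 2;             /* assumed grid radii */
    width  = 2 * r;                /* an oval is sized by its diameter */
    height = 2 * r;
    output;
  end;
  drop r;
run;
```

The data set is passed to the procedure with SGANNO=anno, as in the program above.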

A label for each circular grid can be added using the TEXT function.  That exercise is left to the motivated reader.

Full SAS 9.4 SGPLOT Code:  WindRoseGraph