Coffee Recipes

For a long time, Starbucks represented a good cup of coffee to me, and I paid upwards of $4 for a latte.  But on a recent visit to San Francisco, my son introduced me to a few other options.

Philz crafts a great cup of java, with the barista making the coffee right there in front of you, just like you want it, and they'll go the distance to get it right.  I believe I got the "Jacob's Wonderbar" with heavy cream.  It was great.

Then, I was introduced to Blue Bottle, another great (and maybe local) coffee bar.  The ambiance was great.  I got a Cafe Mocha.  The barista drew a nice design on top by hand, as shown on the right.  Since the coffee has chocolate, it was already mildly sweet.  It was served in a nice porcelain cup, which made it very enjoyable.

All this turned out to be a great lead-in to this article, in which I had planned to create an interesting graphic for coffee recipes as a continuation of the series on "Info-Graphs".

There are, to be sure, many great coffee recipes, but I limited my task to displaying four common ones in a panel using the data shown below.  The idea is that this is all data driven, so more recipes can easily be added to create more graphs.

The four recipes are listed, each with its ingredients.  "Expresso" has only Expresso, "Macchiato" has Expresso and milk foam, and so on.  The fraction of the volume of the cup is shown under the "Value" column.  So, "Cafe Latte" has 25% Expresso, 50% steamed milk and 25% milk foam.
First, we create a plain stacked bar chart showing the recipe for each coffee type.  The code is shown below.  I used a HighLow plot instead of VBarParm because I need the flexibility to raise the bottom of each bar to fit the mask later.  For this, the two modified columns "Low" and "High" are used.
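The Low and High columns, and a Mid column for the labels used in the next step, can be derived from Value with a short DATA step.  The sketch below is an assumption and not the author's program.  The input data set COFFEE_IN and the ingredient names are illustrative.  Only the Expresso (100%) and Cafe Latte (25/50/25) fractions come from the text.  The sketch computes plain cumulative positions, without the extra offset the author adds later to fit the mask.

```sas
/* Hypothetical input: one row per recipe (NAME) and ingredient (GROUP). */
/* VALUE is the fraction of the cup volume for that ingredient.          */
data coffee_in;
  length name $ 12 group $ 14;
  infile datalines dlm=',';
  input name $ group $ value;
  datalines;
Expresso,Expresso,1.0
Cafe Latte,Expresso,0.25
Cafe Latte,Steamed Milk,0.50
Cafe Latte,Milk Foam,0.25
;
run;

/* Stack the ingredients within each recipe:                       */
/*   LOW  = running total before this ingredient                   */
/*   HIGH = LOW + this ingredient's fraction                       */
/*   MID  = center of the segment, a convenient Y for a TEXT label */
data coffee;
  set coffee_in;
  by name notsorted;
  retain high;
  if first.name then high = 0;
  low  = high;
  high = low + value;
  mid  = (low + high) / 2;
run;
```

With these columns, the HIGHLOW statement above draws one stacked bar per recipe.  If the bars need to be raised to sit inside the cup mask, an offset could be added to LOW and HIGH in the same step.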

The result is shown on the right.  The graph includes a legend to identify the ingredients.  A discrete attribute map is used to set the preferred colors for each ingredient.

title j=l h=1 'Coffee Recipes';
proc sgplot data=Coffee noborder   noautolegend nocycleattrs
                     dattrmap=attrmap pad=(bottom=10pct);
  highlow x=name low=low high=high / group=group type=bar
                 nooutline barwidth=0.7 name='a' attrid=Coffee;
  keylegend 'a';
  xaxis display=( nolabel noticks) offsetmin=0.12 offsetmax=0.12;
  yaxis display=none min=0 max=1 offsetmin=0.15 offsetmax=0.42 values=(0 0.5 1);
run;

Next, we improve the graph by labeling each ingredient directly in the bar chart using a Text plot.  Now, we can do away with the legend.  The information is easier to consume with the ingredients labeled directly, since the reader no longer has to look back and forth at the legend to decode each one.

Note the use of the BACKLIGHT option for the Text plot.  This ensures that light-colored text on a light background can still be read easily.  Click on the graph for a higher resolution image.

title j=l h=1 'Coffee Recipes';
proc sgplot data=Coffee noborder noautolegend nocycleattrs
                     dattrmap=attrmap pad=(bottom=20pct);
  highlow x=name low=low high=high / group=group type=bar
                  nooutline barwidth=0.7 name='a' attrid=Coffee;
  text x=name y=mid text=group / backlight=0.4
              textattrs=(size=6 color=white);
  xaxis <options>;
  yaxis <options>;
run;

Now comes the fun part.  We use the SAS 9.4 feature that defines a marker from an image icon (the SYMBOLIMAGE statement).  In this case, we use an icon of a coffee cup.  Then, we can layer a SCATTER plot over the HighLow plot to create the graph shown on the right.

We use an image where the middle portion of the cup is transparent.  The cup itself is a dark color, and the outer pixels of the image are white.  When this icon is layered over the bar, the bar colors show through the transparent portion of the cup, producing an interesting and memorable graphic.

Finally, we use another icon to display the steam rising from the coffee.  I hope you enjoy this cup of java.

Thanks to Riley Benson, our super UX expert who helped me clean up the icons so the result was suitable for publishing.

See the full code linked below for all the details.  I have also attached a zip file of the icons needed for the graph.

To run the program, you will need to put the icons in a folder and supply the full path name to that folder in place of <your folder> in the code for the "Cup" and "Steam" macro variables.

Full SAS 9.4 Code:  Coffee

Icon ZIP file:  Icons





Displaying Group Values on the Axis

Recently, a user working with the HBAR statement and clustered groups in the SG procedures wanted to see the group values on the axis.  SGPLOT does not display multi-level axes, because the axes are shared by different plot types.  However, with SGPLOT there is often a way to get what you want.

As frequent readers of this blog know, the real power of SGPLOT is in the myriad ways you can combine different compatible plot types in one graph to create just the graph you need.  With SAS 9.4, your options are even greater.  Let us see what we can do in this case.

Here is a basic clustered, grouped bar chart in SGPLOT for the SASHELP.CARS data set.  We are viewing an HBAR of Response=mpg_city by Category=Type and Group=Origin.

The category values are displayed on the Y axis.  The bars for all the group values in each category are clustered around the category tick mark.  Each bar is colored by its group value, and the group values are displayed in the legend.

title 'Mileage by Type and Origin';
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid')) noborder nowall;
  hbar type / response=mpg_city stat=mean group=origin groupdisplay=cluster
           dataskin=pressed filltype=gradient baselineattrs=(thickness=0);
  xaxis display=(noline noticks nolabel) grid;
  yaxis display=(nolabel);
run;

The user wants to see the group values on the axis itself.  To do this in SGPLOT, we can layer the values on top of each bar using a TEXT plot (SAS 9.4) or a SCATTER plot with the MARKERCHAR option (SAS 9.3).

However, note that the HBAR statement does not allow other basic plot types to be layered with it.  So, we first have to summarize the data ourselves using PROC MEANS.  Then, we use the HBARPARM statement to draw the pre-summarized data with the group values.  Click on the graph for a higher resolution image.

/*--Summarize the data by Type and Origin--*/
proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint;
  class type origin;
  var mpg_city;
  output out=cars(where=(_type_ > 2)) mean=mileage;
run;

/*--Add X and Y label locations--*/
data cars;
  set cars;
  xlbl=0.1; ylbl=0.1;
run;

/*--HBARPARM with the group values drawn by a TEXT plot--*/
title 'Mileage by Type and Origin';
proc sgplot data=cars noborder nowall noautolegend;
  hbarparm category=type response=mileage / group=origin groupdisplay=cluster
      dataskin=pressed filltype=gradient baselineattrs=(thickness=0);
  text y=type x=xlbl text=origin / group=origin groupdisplay=cluster
       textattrs=(color=black size=7) position=right contributeoffsets=none;
  xaxis display=(noline noticks nolabel) grid;
  yaxis display=(nolabel);
run;

If the user really wants the group values displayed on the axis like in GCHART, we can use the YAXISTABLE statement to draw the values to the left of the bars as shown on the right.  Now, the label for the group values is also displayed at the top of the axis.  We have removed the legend, and could also have changed all the bars to a single color if needed.

For a consistent look, we can give the category variable the same treatment using another axis table, and suppress the Y axis entirely as shown on the right.

Note that the category values are now on the outside.  The group values are nested within each category, as indicated by the alternating color bands.  The labels for the category and group columns are displayed at the top of each column.

title 'Mileage by Type and Origin';
proc sgplot data=cars noborder nowall noautolegend;
  hbarparm category=type response=mileage / group=origin groupdisplay=cluster
          dataskin=pressed filltype=gradient baselineattrs=(thickness=0);
  yaxistable type / location=inside position=left;
  yaxistable origin / class=origin classdisplay=cluster location=inside position=left
        valuejustify=right valueattrs=(size=6) labelattrs=(size=7);
  xaxis display=(noline noticks nolabel) grid;
  yaxis display=none colorbands=odd colorbandsattrs=(transparency=0.5);
run;

With a VBAR, adding group values can be a bit tricky, as there may not be enough room to display all the group values horizontally.  However, we can work around this issue by rotating the group values as shown on the right.

I have shown you some ways to get an alternative look for the clustered bar chart, but you can customize the graph to your own specifications by combining plot statements as you need.

title 'Mileage by Type and Origin';
proc sgplot data=cars noborder nowall ;
  vbarparm category=type response=mileage /  group=origin groupdisplay=cluster
         dataskin=pressed filltype=gradient baselineattrs=(thickness=0);
  text x=type y=ylbl text=origin / group=origin groupdisplay=cluster rotate=90
         textattrs=(color=black size=7) position=right contributeoffsets=none;
  yaxis display=(noline noticks nolabel) grid;
  xaxis display=(nolabel);
run;

Full SAS 9.4 program:  Group_Labels


New SAS Press Book on Clinical Graphs using SAS

Most regular readers may have already noticed the release of my new book "Clinical Graphs using SAS", as indicated by the icon of the book cover and the link to the SAS Press page under the "About this blog" section on the right.

This book is the result of the various techniques for creating clinical graphs using SAS that we have discussed in this blog, on the SAS Support Communities web page, and in various papers and presentations at user meetings.

Starting with SAS 9.3, many features were added to the SG procedures and GTL to enable the creation of clinical graphs using their statements and options.  Annotation was also introduced for the SG procedures, which allowed the addition of axis-aligned statistics to graphs like the Survival Plot and the Forest Plot.

The release of SAS 9.4, along with its three maintenance releases, included new statements that were designed explicitly to make clinical graphs easier to create.  These included the Axis Table statements, Data Set based Annotation and Attribute Maps for GTL.  Now, group attributes can be set through statement syntax, and the display of axis tick values and data labels can be controlled.

This book uses all these techniques, in both SAS 9.4 and SAS 9.3, to create commonly requested clinical graphs such as the Survival Plot, Forest Plot, Swimmer Plot, Adverse Event Timeline and many more.  I hope you will find this book useful.

SAS Global Forum is in Las Vegas this year, April 18-21.  A solid slate of papers related to graphs is offered.  Here is the List of Graphics related Papers.  I look forward to seeing y'all at the conference.


Basic ODS Graphics Examples

I have written a new book: Basic ODS Graphics Examples.

It is available as a free PDF file on the web. It is in color, and all of the SAS code is available by double clicking a link at the beginning of each example. This new book complements my other recent book: Advanced ODS Graphics Examples.

The new book replaces my 2010 SAS Press book: Statistical Graphics in SAS: An Introduction to the Graph Template Language and the Statistical Graphics Procedures. Like the 2010 book, the new book provides a gentle and parallel introduction to the graph template language and the SG procedures. Most of it has been rewritten, and many new examples have been added. Both books are designed for use with SAS 9.4.

Look for me this year at SAS Global Forum, PharmaSUG, and the Joint Statistical Meetings where I will be giving talks on methods of annotating and customizing the graphs that analytical procedures produce.


Series Plot with Response Color Segments

After returning from my recent visit to India, I read an article that included a graph with a series plot where the color of the series itself changed based on the Y response.  Now, to be sure, the SAS 9.40M3 SERIES plot in the SGPLOT procedure supports color response, but that applies to the entire curve.  If the series has multiple groups, the curve for each group can be colored by a response variable.  I discussed this feature in the article about Response Color and Thickness.

But what if you want to vary the color of the segments of a single series based on some response value, say the Y variable, as shown in the graph on the right?  How would we do this?

The SERIES plot statement in SGPLOT cannot create such a graph, nor can GTL.  But, I created the graph on the right using SGPLOT.  How did I do this?

The answer is to use a VECTOR plot.  The series plot needs data with two columns for (x, y), and a number of observations that decide the shape of the curve.  I added two additional columns (xp, yp), named Prevdate and PrevA in the code below, which hold the previous (x, y) location.  So, now I have a data set in which each observation holds the data for one short segment of the series plot as a vector.  Then, I use these four columns to draw the curve shown above; the shape of the curve is identical to the shape drawn by the SERIES plot statement.
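The post does not show the step that adds the previous point.  Here is a minimal sketch, assuming the data set is sorted by date and contains a single curve.  The LAG function carries the previous point forward, so each observation becomes one segment from (Prevdate, PrevA) to (Date, A).  The values in SERIES_IN are made up for illustration.  With several groups, you would also need to reset the previous point at the start of each BY group.

```sas
/* Hypothetical series data: one observation per date, sorted by date */
data series_in;
  input date :date9. a;
  format date date9.;
  datalines;
01JAN2015 10
01FEB2015 14
01MAR2015 9
;
run;

/* Each observation becomes a vector segment:               */
/* (Prevdate, PrevA) is the start, (Date, A) is the end.    */
data series;
  set series_in;
  prevdate = lag(date);   /* LAG runs on every observation so its queue stays in step */
  preva    = lag(a);
  format prevdate date9.;
  if _n_ = 1 then delete; /* the first point has no previous point */
run;
```

The VECTOR statement shown below then uses XORIGIN=Prevdate and YORIGIN=PrevA with NOARROWHEADS, so the segments join end to end along the original curve.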

title 'Series Plot with Color Response by Date';
proc sgplot data=series subpixel noborder;
  vector x=date y=a / xorigin=Prevdate yorigin=preva noarrowheads
              colorresponse=a colormodel=(red yellow green) lineattrs=(thickness=2);
  xaxis display=(noline noticks nolabel) grid;
  yaxis display=none grid;
run;

The VECTOR plot statement supports the COLORRESPONSE option.  We have used the same variable "A" as used for the Y role, so the color varies with the Y height of the curve in the graph above.

Clearly, this is even more useful if the color variable is a different measure.  In the graph on the right, I have used a different variable "C" for the color response to view the variation of another measure at each (x, y) point on the series.  For this graph, I have also increased the thickness of the plot.  Click on the graph for a higher resolution image.

Now, to be sure, this is somewhat of a "poor man's" variable color series plot.  When viewing the higher resolution image, some of you may have noticed an artifact in the thicker line.  Since the curve is made up of short vector segments, adjacent segments do not join up cleanly at their edges as the curve thickness increases.

To illustrate this more clearly, I have increased the thickness to 40 pixels for the graph on the right.  Some artifacts are visible in the sharply curved regions, such as the green segment at the top left.  View the graph in higher resolution, and you will see this clearly.  The case on the right is an extreme one for illustration purposes.  For most use cases, a moderately thick line may work.

For the motivated reader, there is a way to get around this artifact using the POLYGON plot instead of a VECTOR plot.  The idea is to create individual polygons for each segment, by computing the points at each corner of the polygon using the normal vector half way between the two segments.  You would use the same technique I used in the previous article for drawing curved links in the diagram.

Would anyone like to post such a solution?

Full SAS 9.40M3 code: Series_With_Response_Color  


Diagrams with curved links

Let us continue with our journey beyond standard plots and charts.  Often we need to create some simple diagrams to visualize the connections between different entities such as patients and providers or even a social network.

Many of you may not have a custom tool to create diagrams.  But you have Base SAS, so let us see what we can do with the SGPLOT procedure and some DATA step coding to create simple diagrams.

Note:  The emphasis is on Simple Diagrams.

Say we want to create the simple diagram sketched on the right, which I made from a display on the web.  The nodes are shown as circles with node ids of 1-9.  The links are shown as lines with link ids of 1-9.  The number of nodes and the number of links need not be the same.

If the location of the nodes can be determined by some other process or procedure, then we can create this diagram using SGPLOT.  So, let us assume the (x, y) coordinates of the nodes are known, and are as per the grid shown in the sketch.

Generally, the links between the nodes are known relationships.  These could represent patients and providers or social networks.  Here are the two data sets.

The Nodes data set contains the information about the nodes, including the unique node id, the location (x, y) of each node and other information.

The Links data set contains only the connectivity information, including the unique link id and the "From" and "To" node.  We could have other information, such as a response that could stand for the frequency of interaction or a dollar value.  Note that LinkId=6 has a high response value.

Displaying the nodes is very straightforward using a SCATTER statement.  Here I have used FilledOutlined markers along with a data label displaying the name of the person at the bottom, with GROUP=sex.

Here is the SAS 9.40M3 code for displaying the nodes:

title 'Social Network';
proc sgplot data=network noautolegend aspect=1;
  styleattrs backcolor=cxfaf3f0;
  scatter x=xn y=yn / group=sex
               markerattrs=(symbol=circlefilled size=16)
               filledoutlinedmarkers markerfillattrs=(color=white)
               dataskin=sheen datalabel=name datalabelpos=bottom;
  xaxis min=0 max=4 display=none;
  yaxis min=0 max=4 display=none;
run;

Now we need to add the display of the links.  This can be easily done using the SERIES statement available in SGPLOT.  However, note in the Links data set, we only have the connectivity of the links in the form of the "From"  and "To" nodes.  So, the first thing we have to do is to generate the information needed to draw the links as series plots, with line id, and the (x, y) coordinates of the two end points derived from the Nodes data set.

We do this using the Hash Object as shown in the full code below.  The key aspects are as discussed below:

  • First, we create an ordered Hash Object with a key of "NodeId", and data of "NodeId", "Xn" and "Yn".
  • Then, for each link in the links data set, we find the "From" node in the Hash object, and write out the coordinates of that node as the starting (x, y) coordinates for the link.
  • Then for each link in the links data set, we find the "To" node in the Hash object, and write out the coordinates of that node as the ending (x, y) coordinates for the same link.
  • At the end of these steps, we have created a Links data set with two observations for each link, holding the (x, y) coordinates of the two ends of the link.
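The actual code is in the full program linked below.  As an illustration only, the hash lookup steps above might look like the following DATA step.  The tiny NODES and LINKS data sets are hypothetical.  The variable names (NodeId, Xn, Yn, LinkId, From, To, RespA, Xl, Yl) follow the text, but they are assumptions about the real program.

```sas
/* Hypothetical sample data: three nodes and two links */
data nodes;
  input nodeid xn yn;
  datalines;
1 1 3
2 3 3
3 2 1
;
run;

data links;
  input linkid from to respA;
  datalines;
1 1 2 2
2 2 3 10
;
run;

/* Two observations per link: (xl, yl) of the From node, then of the To node */
data linkxy(keep=linkid respA xl yl);
  if 0 then set nodes;              /* defines host variables NodeId, Xn, Yn */
  if _n_ = 1 then do;
    declare hash h(dataset: 'nodes', ordered: 'a');
    h.defineKey('nodeid');
    h.defineData('nodeid', 'xn', 'yn');
    h.defineDone();
  end;
  set links;
  if h.find(key: from) = 0 then do; /* start of the link */
    xl = xn; yl = yn; output;
  end;
  if h.find(key: to) = 0 then do;   /* end of the link */
    xl = xn; yl = yn; output;
  end;
run;
```

A link whose From or To node is missing from NODES simply loses that end point.  In a real program, you may prefer to write a note to the log in that case.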

Now, we can merge the Nodes and Links data sets and use the following program to display the diagram.  We added the SERIES statement to display the links.  The various options of the SCATTER statement are the same as before, and are trimmed here to conserve space.

title 'Social Network';
proc sgplot data=network noautolegend aspect=1;
  styleattrs backcolor=cxfaf3f0;
  series x=xl y=yl / group=LinkId lineattrs=graphdatadefault;
  scatter x=xn y=yn / group=sex <options>;
  xaxis min=0 max=4 display=none;
  yaxis min=0 max=4 display=none;
run;

At this stage, we have the diagram representing the sketch I started with.  Note that the links are straight lines connecting the From and To nodes.

But in the title of the article, I suggested we would draw curved connecting links to make the display a bit nicer, as shown on the right.  This is especially true from an "Infographics" perspective, as it adds some visual interest to the diagram.  The question is how to do this using SGPLOT.

Starting with SAS 9.40M3, the SGPLOT procedure includes a new statement, the SPLINE plot.  This behaves similarly to the SERIES plot, except that it draws smooth splines between the vertices of the segments.  The smooth curve is guaranteed to start at the first vertex and end at the last, but it is not guaranteed to pass through any of the intermediate vertices, which are "control points" that determine the shape of the curve.  This is different from SMOOTHCONNECT for SERIES, where the curve still passes through all the points.

In order to get the curved shape, we need at least 3 points per curve.  So far we have only two, the "From" location and the "To" location for each link.  Now, we need to generate one middle point that is about half way between these two, but offset to one side a bit.  This can be done using some vector math.

The sketch on the right shows one link from point "1" to "2".  For this link (vector), we can compute the direction cosines of the vector as Cx and Cy: Cx=(x2-x1)/L and Cy=(y2-y1)/L, where L=sqrt((x2-x1)^2 + (y2-y1)^2) is the length of the vector.

Now, by vector math, the direction cosines of the line normal to this vector (the dashed diagonal line) are Cxn=-Cy and Cyn=Cx.  The center point of the vector can be computed with Xm=(x1+x2)/2 and Ym=(y1+y2)/2.  The new offset point we want lies on the normal line through the center: x3=xm+Cxn*L*F and y3=ym+Cyn*L*F (equivalently, x3=xm-Cy*L*F and y3=ym+Cx*L*F).  F is a factor that moves the point farther from or closer to the link along the dashed line.  Here I used F=0.15 to create shallow curves.
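Here is a minimal sketch of this computation.  It assumes a hypothetical data set LINKENDS with one row per link that holds the end points (x1, y1) and (x2, y2).  It writes three points per link (start, offset middle point, end) in the long form that the SPLINE statement can use with GROUP=LinkId.  Zero-length links, such as a node linked to itself, are not handled, since L would be 0.

```sas
/* Hypothetical input: one row per link with its two end points */
data linkends;
  input linkid x1 y1 x2 y2;
  datalines;
1 0 0 2 0
2 1 1 1 3
;
run;

%let f = 0.15;   /* offset factor F from the text */

data curved(keep=linkid xl yl);
  set linkends;
  len = sqrt((x2 - x1)**2 + (y2 - y1)**2);  /* L, the link length          */
  cx  = (x2 - x1) / len;                     /* direction cosines of link   */
  cy  = (y2 - y1) / len;
  cxn = -cy;                                 /* unit normal to the link     */
  cyn =  cx;
  xm  = (x1 + x2) / 2;                       /* midpoint of the link        */
  ym  = (y1 + y2) / 2;
  /* three control points: start, midpoint pushed along the normal, end */
  xl = x1;               yl = y1;               output;
  xl = xm + cxn*len*&f;  yl = ym + cyn*len*&f;  output;
  xl = x2;               yl = y2;               output;
run;
```

For link 1, which runs from (0, 0) to (2, 0), the middle point comes out at (1, 0.3), directly above the midpoint.  A negative F would bend the curve to the other side.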

Using this technique, we compute an extra middle point for each link to create the graph with shallow curved links shown below.

Now, one last item.  Note that the Links data set had a column "RespA".  This contains a response value for each link that could represent some measure of the importance of the link based on traffic, number of calls, number of references, or some other value.  We can adjust the thickness of the link based on this response value as shown in the graph on the right.  Here, "Ted" and "Bill" have more frequent communication than the other people.

The full code is included in the program linked below.  The SPLINE statement has new options to control line thickness:

spline x=xl y=yl / group=LinkId lineattrs=graphdatadefault
       thickresp=respA thickmaxresp=10 thickmax=4;

THICKRESP=RespA sets the thickness of each link based on the column "RespA".  THICKMAX sets the thickness, in pixels, of a link whose response equals the THICKMAXRESP value.  Here we have set THICKMAXRESP=10 and THICKMAX=4.  So, if RespA has a value of 10 for any link, its line thickness will be 4 pixels.  Other thicknesses will be proportional.

Note:  Here I have shown how you can create simple network diagrams using the SGPLOT procedure.  If the positions of the nodes can be determined, you can display the diagram.  For simple cases, this can often be done in your code.  I am not claiming this provides an alternative to products that solve the entire problem of node layout and display of the diagram.  Algorithms for the computation of node locations can get complicated for large diagrams.  Some algorithms are available on the web for MultiLevel Layout and Force-Directed Layout.

Full SAS 9.40M3 program:  Network



Infographics Bar Chart

Last week I posted an article on creating Infographics using SAS.  The interest shown by the SAS community in this topic came as a surprise.  By coincidence, a SAS user also called Tech Support at just about the same time with a query about creating an Infographics type graph for their use.

This user wants to create the graph shown in the Dover School link.  The graph is shown on the right.  Click on the graph for a higher resolution view.  The readiness is displayed using icons for the students, one set for "This School" and one for "State", side by side.  The actual values are displayed to the left of each "Bar".

Functionally, the information in the graph can be represented by a 2-cell horizontal bar chart, comparing the readiness of the students in this school for college by their level with the overall readiness for the state.

Now, there are many ways to visualize this data effectively.  One simple way is shown on the right.  The graph displays the same data as side-by-side bar charts in a class panel.  The code for the graph is shown below; it is long only because of the appearance customization.

title j=l h=1 'College Readiness';
proc sgpanel data=Rediness_Bar_Panel noautolegend;
  styleattrs datacolors=(%rgbhex(254, 145, 104)
                         %rgbhex(130, 109, 146));
  panelby School / columns=2 novarname onepanel sort=data
                   noheaderborder noborder;
  hbar level / response=value datalabel group=school
               baselineattrs=(thickness=0) dataskin=pressed;
  rowaxis reverse display=(nolabel noticks noline) splitchar='.';
  colaxis offsetmin=0 display=none grid;
run;

The data set is shown on the right.  In this graph, I have tried to mimic the layout of the original display using a simpler bar chart.  However, a better comparison of the data for This School vs. State can be made for each level using a clustered HBAR, as shown below.

The graph on the right places the data for This School and State adjacent to each other by Level, and the proximity of the bars allows a better comparison of the values.  Yes, there is a way to make the order of the group values in the legend match the bars.  See the recent article on Legend Order.

title j=l h=1 'College Readiness';
proc sgplot data=Rediness_Bar_Panel nowall noborder;
  styleattrs datacolors=(%rgbhex(254, 145, 104) %rgbhex(130, 109, 146));
  hbar level / response=value datalabel group=school groupdisplay=cluster
               baselineattrs=(thickness=0) dataskin=pressed grouporder=reversedata;
  yaxis reverse display=(nolabel noticks noline) splitchar='.' fitpolicy=splitalways;
  xaxis offsetmin=0 display=none grid;
  keylegend / location=inside position=bottomright across=1;
run;

Having said that, let us now turn our attention to how we can create the "Infographics" type graph shown at the top using SGPLOT.  The graph on the right is created using the SAS 9.4 SGPLOT procedure.  Click on the graph to see a higher resolution view.  This graph is very much like the one at the top, and the header icons can be added.

The main difference is that I have used only one "Partial" icon per school.  In the graph at the top, a full-color icon is displayed for every 10% of the value, and a gray icon is displayed for each remaining 10% up to 100%.  The icon for the partial amount (say, for a value of 74%) is about 40% filled and 60% empty.  In the graph on the right, I have used a single "Partial" icon to represent such a case.

For this graph, I have used a SCATTER plot to display an "ImageMarker" at every 10% location for each bar.  So, I first have to expand the data into multiple observations, 10 for each Level and School.  For each of these, I generate either a full-color icon or the gray icon.  Where the value is not a multiple of 10%, I have to generate a marker using the "Partial" icon.
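The expansion step can be sketched as follows.  This is an assumption about how the logic could work, not the author's code.  The READINESS data set, its values and the ICON labels are hypothetical.  Mapping each ICON value, together with the school, to one of the SYMBOLIMAGE markers is left to the plotting step.

```sas
/* Hypothetical input: one row per Level and School, VALUE in percent */
data readiness;
  length level $ 10 school $ 12;
  infile datalines dlm=',';
  input level $ school $ value;
  datalines;
Level A,This School,74
Level A,State,40
;
run;

/* Ten observations per row, one per 10% slot; ICON names the marker to draw */
data icons;
  set readiness;
  length icon $ 7;
  do slot = 1 to 10;
    pct = slot * 10;                                 /* right edge of this slot */
    if pct <= value then icon = 'Full';              /* slot fully covered      */
    else if pct - 10 < value then icon = 'Partial';  /* slot partly covered     */
    else icon = 'Gray';                              /* slot beyond the value   */
    output;
  end;
run;
```

A value of 74 gives 7 Full, 1 Partial and 2 Gray slots.  A value that is an exact multiple of 10, such as 40, gives no Partial slot.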

I defined 5 icons: two full-color icons (one for each school), two "Partial" icons (one for each school), and one gray icon.  I used the SYMBOLIMAGE statement to create marker shapes from each of these, and then used them to draw the graph.  The full code is attached in the link below.  You will need the icons to actually run the program, but you get the idea.  More "Partial" icons could be created and used to get finer coloring like in the original graph above.  I will leave that as an exercise for the interested reader.

Now, Dan Heath has suggested an interesting idea, where the icon could be a "Mask" that allows a background color to show through the marker, while masking the rest with opaque white.  Then, one could draw a bar chart of the right colors behind, and just draw the mask icons in front.  Using appropriately sized markers in front, one could get the correct fractional shaded shape, and we would need only one icon, not 5.  I suspect this will need some tweaking, but I will give it a go to see what can be done with this idea.

Graph and program updated to add headers.

SAS 9.4 SGPLOT Code:  Info_Graphics_Bar_2


Box Plot with Proportional Widths

Last week a question was posted on the communities page about creating box plots where the width of each box is proportional to the frequency for the category.  The comment was that PROC BOXPLOT can create such a graph, but there seems to be no way to do this using the SGPLOT procedure.

The user is right.  The SGPLOT procedure does not provide a way to create box plots where the width of each box is proportional to frequency.  However, there is a way to create such a graph using SGPLOT and GTL and a bit of coding.

We know by now that SGPLOT scripts out a GTL template along with the data needed to render the graph.  SGPLOT scripts a template using the BoxPlotParm statement.  This statement can render a box plot from a data set with three columns - X, Statistic and Y.

For this article, we will create a box plot of Mileage by Type where each box width is proportional to the frequency for each category.  First, we run the SGPLOT procedure for a basic box plot of the same variables as follows.  We add the ODS OUTPUT statement to save the processed data into the 'BoxData' data set.

ods output sgplot=boxdata;
ods graphics / reset width=5in height=3in imagename='Box';
title 'Mileage by Type';
proc sgplot data=sashelp.cars;
  vbox mpg_city / category=type;
  xaxis display=(nolabel);
run;
The graph of Mileage by Type is displayed on the right.  Note that all boxes have the same width.  A BOXWIDTH option is available, but it takes a scalar value that is applied to all boxes.

The SGPLOT procedure also generates the data necessary to render the graph using the BoxPlotParm GTL statement.  The generated data set contains three computed columns with long names ending in "_X", "_Y" and "_ST".  I have renamed these above to "X", "Y" and "Stat".  For each category value "X", the "Stat" variable contains the names of various box plot statistics such as "MEAN", "MEDIAN", "Q1", "Q3" and so on.  The Y variable contains the corresponding y value used to draw each box.

Starting with SAS 9.3, the BoxPlotParm statement also supports the BOXWIDTH statistic, which is a fractional value from 0.0 to 1.0 that determines the width of each box.  What we need to do is compute this statistic based on N, and insert it into the data set.  Then, we can use a GTL template with the BoxPlotParm statement to render this graph.

I have used some DATA step code (see the linked code) to first compute the maximum value of N for the data.  Then, I computed the appropriate fractional value for BoxWidth for each category, and inserted a new observation into the data set as shown in the figure on the right.  After I encounter "N" in the data set, I write out the new observation with Stat='BOXWIDTH', as seen in obs #9 on the right.
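The linked program has the actual code.  The following is a minimal sketch under stated assumptions.  BOXDATA below is a hypothetical stand-in for the renamed ODS output, with columns X, Stat and Y, and with the count stored in a row where Stat='N'.  The widest box is arbitrarily set to 0.8.

```sas
/* Hypothetical stand-in for the renamed SGPLOT output (X, Stat, Y) */
data boxdata;
  length x $ 8 stat $ 12;
  input x $ stat $ y;
  datalines;
SUV MEDIAN 16
SUV N 60
Sedan MEDIAN 20
Sedan N 120
;
run;

/* Step 1: the largest N over all categories */
proc sql noprint;
  select max(y) into :maxn trimmed
    from boxdata
    where stat = 'N';
quit;

/* Step 2: after each N row, add a BOXWIDTH row.  The width is       */
/* proportional to N and scaled so the largest category gets 0.8     */
/* (an arbitrary choice; any fraction in 0.0 - 1.0 would work).      */
data boxdataWidth;
  set boxdata;
  output;
  if stat = 'N' then do;
    stat = 'BOXWIDTH';
    y = 0.8 * y / &maxn;
    output;
  end;
run;
```

The LENGTH of STAT is set to 12 so that 'BOXWIDTH' is not truncated.  If the real ODS output uses a shorter length for this column, the DATA step that adds the new rows would need its own LENGTH statement before the SET statement.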

Now, we create a simple GTL template using the BoxPlotParm statement to render this data.  The template is shown below.  Then, we run the SGRENDER procedure using the new data set and the template to create the graph.

proc template;
  define statgraph BoxWidth;
    begingraph;
      entrytitle 'Mileage by Type';
      entrytitle 'Box Width is Proportional to N';
      layout overlay;
        boxplotparm x=x y=y stat=stat;
        /* the options of this statement are truncated in the source; see the full program */
        scatterplot x=x y=eval(y*0+5);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=boxdataWidth template=BoxWidth;
  format n 3.0;
run;

In the graph above, the box plot shows the various statistics as before, but now the width of each box is proportional to N, using the new BOXWIDTH statistic values.  For verification, I have also displayed the value of N at the bottom of each box.

Full SAS 9.3 Code:  Box_With_Variable_Width_93


Infographics using SAS

Infographics are all the rage today.  Open any magazine or newspaper and you see data and numbers everywhere.  Often, such information is displayed with graphical elements that add context to the data.  A couple of good examples are "Communicating numeric information" and "Facts about Hot Dogs".

Riley Benson, our UX expert, explained it this way.  The reason such infographics are used is to provide a memorable item associated with the numbers.  An aesthetic graphic invites the user to spend more time viewing the graphic and the numbers, making them more memorable.  While an icon label is needed initially, frequent use of the same icons makes them easier to recognize later without the label.

I was curious about how we could use GTL or the SG procedures to make such graphs easier.  So, I created the graphic on the right to display the percentage of revenue from a particular sector, in this case "Utilities".  I found an image and used PROC SGPLOT to create this "InfoGraphic".

Given a value to be displayed, an icon and the label, it is easy to create the display on the right.  The SAS 9.4 code is shown below.

title h=1 '2015 Software Revenue for Utilities';
proc sgplot data=infographics (where=(industry='Utilities'))
         pad=(left=20 right=20 bottom=20) noborder;
  symbolimage name=Utilities image="&file7";
  styleattrs backcolor=cxfaf3f0;
  scatter x=x y=y / group=industry dataskin=sheen
                markerattrs=(symbol=Utilities size=120);
  text x=xlbl y=ylbl text=value / textattrs=(size=24);
  text x=xnam y=ynam text=industry / textattrs=(size=12);
  xaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
  yaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
run;

In the code above, we have used the SYMBOLIMAGE statement to create a new marker symbol from the image icon for "Utilities", which is a light bulb.  Additional columns hold the value and the (x, y) locations of the icon, the value and the industry name.  In this example, I have customized these (x, y) locations to arrange the icon and the labels.

The data for the program is shown on the right.  It includes the value, the industry, the image file name for the icon, the (x, y) location of the icon center, and the (x, y) locations of the value label and the industry name.  In the code above, only one industry (Utilities) is used.

The nice part of this approach is that you can easily change the relative layout of the icon, value and label.  We can also create a panel of the values by industry in any layout.

The graph on the right is a four-column layout of all the industries, showing all the icons and values from the data set above.  I have used the SGPANEL procedure to arrange the layout, and the icons, values and labels all fall into place easily.

The SGPANEL program is shown below.  Note the use of ATTRPRIORITY=NONE on the ODS Graphics statement.  This is required in case a color priority style like HTMLBlue is active, because we want to cycle through all marker shapes per group.  Also note the use of SORT=DATA to ensure the panel classifiers are in the data order so the data and the layout are in sync.

ods graphics / reset attrpriority=none noborder;
title '2015 Software Revenue by Industry';
proc sgpanel data=infographics pad=(left=20 right=20);
  panelby industry / noborder noheader spacing=20
                   onepanel columns=4 sort=data;
  symbolimage name=Banking image="&file1";
  symbolimage name=Government image="&file2";
  symbolimage name=Services image="&file3";
  symbolimage name=Insurance image="&file4";
  symbolimage name=LifeSciences image="&file5";
  symbolimage name=Retail image="&file6";
  symbolimage name=Utilities image="&file7";
  symbolimage name=Education image="&file8";
  styleattrs datasymbols=(Banking Government Services
                       Insurance LifeSciences Retail Utilities
                     Education) backcolor=cxfaf3f0;
  scatter x=x y=y / group=industry markerattrs=(size=120);
  text x=xval y=yval text=value / textattrs=(size=14);
  text x=xnam y=ynam text=industry / textattrs=(size=7);
  colaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
  rowaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
run;

A 2x4 layout can easily be created by changing the COLUMNS= (or ROWS=) option on the PANELBY statement.

Full SAS 9.4 code:  Info_Graphics


Legend Order

In the previous article on managing legends, I described how to include items in a legend that may not exist in the data.  This is done by defining a discrete attribute map and then requesting that all the values defined in the map be displayed in the legend.

In the graph on the right, the data contains only Severity values of "Mild" and "Moderate".  However, since three values are defined in the attribute map and the "Show" column is set to "Attrmap", all values for the group are displayed.  As a result, the value "Severe" appears in the legend even though no observation in the data has this severity.

Another useful (and intentional) result is that the legend items are displayed in the order in which they are defined in the discrete attribute map, which lets you control the order of the items in the legend.  This feature also addresses an issue a user was grappling with recently, as described below.

By default, the order of the items in the legend is based on the order in which the group values are encountered in the data.  Legend values can be sorted in alphabetical order, but for a custom order you can use the attribute map, as discussed below.

The graph on the right shows the stacked cumulative counts for the cars by Type and Origin.  The legend inside is intentionally set to a single column to make it easier to associate the colors with the stacking order.  However, the order in the legend is the reverse of the order in the graph.

I can change the stacking order by setting the GROUPORDER option to "ReverseData".  However, the order in the legend also reverses, so the legend order remains out of sync with the bar order.
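A minimal sketch of that step, assuming the SASHELP.CARS data set and a SAS 9.4 release that accepts GROUPORDER=REVERSEDATA on the VBAR statement.  Running it shows the segments restacked while the legend entries flip along with them, so the mismatch remains.

```sas
/* Reverse the stacking order of the Origin segments.            */
/* The legend order reverses too, so it still does not match.    */
/* Assumes SASHELP.CARS as the input data.                       */
title "Counts by Type and Origin";
proc sgplot data=sashelp.cars nowall noborder;
  vbar type / group=origin grouporder=reversedata;
  keylegend / location=inside across=1 position=topright opaque;
  xaxis display=(noline noticks nolabel);
  yaxis display=(noline nolabel noticks) grid;
run;
```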

The way to address this is to use a discrete attribute map that provides the group values and the corresponding colors in the order you want.  The legend items are then displayed in the order of the values defined in the attribute map.  Note that the items in the legend of the graph are now in the same order as the bar segments.

Note also that in this attribute map we have not used actual fill colors, as in the first case; instead, we have used style elements.  This is done by using the FillStyleElement column instead of the FillColor column.

The attribute map for the graph above is shown on the right.  The code for the attribute map and the graph is shown below.

data CarsShowAll;
  retain Id 'Origin' Show 'Attrmap';
  length Value $10 FillStyleElement $15;
  input Value $ FillStyleElement $;
  datalines;
USA     graphdata3
Europe  graphdata2
Asia    graphdata1
;
run;

title "Counts by Type and Origin";
/* SASHELP.CARS is assumed as the input data set */
proc sgplot data=sashelp.cars dattrmap=carsShowAll nowall noborder;
  vbar type / group=origin dataskin=gloss filltype=gradient
              baselineattrs=(thickness=0) attrid=Origin;
  keylegend / location=inside across=1 position=topright opaque
              fillheight=12px fillaspect=golden;
  xaxis display=(noline noticks nolabel);
  yaxis display=(noline nolabel noticks) grid;
run;

Full SAS 9.4 Code:  Legend_Order
