Series Plot with Response Color Segments

Returning from my recent visit to India, I was reading an article that included a graph with a series plot where the color of the series itself changed based on the Y response.  Now, for sure, the SAS 9.40M3 SERIES plot in the SGPLOT procedure supports color response, but that applies to the entire curve.  If the series has multiple groups, the curve for each group can be colored by a response variable.  I discussed this feature in the article about Response Color and Thickness.

But what if you want to vary the color of the segments of a single series based on some response value, say the Y variable, as shown in the graph on the right?  How would we do this?

The SERIES plot statement in SGPLOT cannot create such a graph, nor can GTL.  But, I created the graph on the right using SGPLOT.  How did I do this?

The answer is to use a VECTOR plot.  The series plot needs data with two columns for (x, y), and a number of observations that determine the shape of the curve.  I added two additional columns (xp, yp), named PrevDate and PrevA in the code below, which contain the previous (x, y) location.  So, now I have a data set in which each observation holds the data for one short segment of the series plot as a vector.  Then, I use these four columns to draw the curve shown above.  The shape of the curve is identical to the shape drawn by the SERIES plot statement.
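The previous-point columns can be built with the LAG function.  The following is only a minimal sketch, not the code from the linked program.  It assumes a hypothetical input data set named series_raw, sorted by date, with the columns date and a, and it writes the data set series with the two extra columns.

```sas
/* Sketch only: series_raw is a hypothetical input data set with */
/* the columns date and a, already sorted by date.               */
data series;
  set series_raw;
  PrevDate = lag(date);   /* x of the previous point */
  PrevA    = lag(a);      /* y of the previous point */
  if _n_ > 1;             /* the first point has no previous point, so it starts no segment */
run;
```

Each remaining observation describes one segment, from (PrevDate, PrevA) to (date, a).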

title 'Series Plot with Color Response by Date';
proc sgplot data=series subpixel noborder;
  vector x=date y=a / xorigin=Prevdate yorigin=preva noarrowheads
              colorresponse=a colormodel=(red yellow green) lineattrs=(thickness=2);
  xaxis display=(noline noticks nolabel) grid;
  yaxis display=none grid;
run;

The VECTOR plot statement supports the COLORRESPONSE option.  We have used the same variable "A" as used for the Y role, so we can see that the color varies correctly with the Y height of the plot in the graph above.

Clearly this is even more useful if the color variable is a different measure.  In the graph on the right, I have used a different variable "C" for the color response to view the variation of another measure by the (x, y) point on the series.  For the graph on the right, I have increased the thickness of the plot.  Click on the graph for a higher resolution graph.

Now, to be sure, this is somewhat of a "poor man's" variable color series plot.  When viewing the higher resolution image, some of you may have seen an artifact in the thicker line.  Since the curve is made up of short vector segments, the segments do not join up cleanly at the edges as the curve thickness increases.

To illustrate this more clearly, I have increased the thickness to 40 pixels for the graph on the right.  Some artifacts are visible in the sharply curved region of the curve, as in the top left green segment.  View the graph in higher resolution, and you will see this clearly.  The case on the right is an extreme case for illustration purposes.  For most use cases, a moderately thick line may work.

For the motivated reader, there is a way to get around this artifact using the POLYGON plot instead of a VECTOR plot.  The idea is to create individual polygons for each segment, by computing the points at each corner of the polygon using the normal vector half way between the two segments.  You would use the same technique I used in the previous article for drawing curved links in the diagram.

Does anyone want to post such a solution?

Full SAS 9.40M3 code: Series_With_Response_Color  


Diagrams with curved links

Let us continue with our journey beyond standard plots and charts.  Often we need to create some simple diagrams to visualize the connections between different entities such as patients and providers or even a social network.

Many of you may not have a custom tool to create diagrams.  But you have Base SAS, so let us see what we can do with the SGPLOT procedure and some Data Step coding to create simple diagrams.

Note:  The emphasis is on Simple Diagrams.

Say we want to create the simple diagram sketched on the right, which I made from a display on the web.  The nodes are shown as circles with node ids of 1-9.  The links are shown as lines with link ids of 1-9.  The number of nodes and the number of links need not be the same.

If the location of the nodes can be determined by some other process or procedure, then we can create this diagram using SGPLOT.  So, let us assume the (x, y) coordinates of the nodes are known, and are as per the grid shown in the sketch.

Generally, the links between the nodes are relationships that are known.  These could represent patients and providers or social networks.  Here are the two data sets.

The Nodes data set contains the information about the nodes, including the unique node id, the location (x, y) of each node and other information.

The Links data set contains only the connectivity information, including the unique link id and the "From" and "To" node.  We could have other information like response that could stand for the frequency of interaction, or dollar value.  Note the LinkId=6 has a high response value.

Displaying the nodes is very straightforward using a SCATTER statement.  Here I have used the FilledOutlined markers along with a data label displaying the name of the person at the bottom, with GROUP=sex.

Here is the SAS 9.40M3 code for display of nodes:

title 'Social Network';
proc sgplot data=network noautolegend aspect=1;
  styleattrs backcolor=cxfaf3f0;
  scatter x=xn y=yn / group=sex
               markerattrs=(symbol=circlefilled size=16)
               filledoutlinedmarkers markerfillattrs=(color=white)
               markeroutlineattrs=(thickness=4)
               dataskin=sheen datalabel=name datalabelpos=bottom;
  xaxis min=0 max=4 display=none;
  yaxis min=0 max=4 display=none;
run;

Now we need to add the display of the links.  This can be easily done using the SERIES statement available in SGPLOT.  However, note in the Links data set, we only have the connectivity of the links in the form of the "From"  and "To" nodes.  So, the first thing we have to do is to generate the information needed to draw the links as series plots, with line id, and the (x, y) coordinates of the two end points derived from the Nodes data set.

We do this using the Hash Object as shown in the full code below.  The key aspects are as discussed below:

  • First we create an ordered Hash Object with key of "NodeId", and data of "NodeId", "Xn" and "Yn".
  • Then, for each link in the links data set, we find the "From" node in the Hash object, and write out the coordinates of that node as the starting (x, y) coordinates for the link.
  • Then for each link in the links data set, we find the "To" node in the Hash object, and write out the coordinates of that node as the ending (x, y) coordinates for the same link.
  • At the end of these steps, we have created a Links data with two observations for each link with the (x, y) coordinates of the two ends of the link.
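The steps above can be sketched as follows.  This is not the linked program, only an illustration under stated assumptions: the data sets are assumed to be named nodes and links, and FromNode and ToNode are hypothetical names for the "From" and "To" columns of the Links data set.

```sas
/* Sketch only. Assumed inputs (names are hypothetical):          */
/*   nodes: NodeId, xn, yn                                        */
/*   links: LinkId, FromNode, ToNode                              */
data linkCoords(keep=LinkId xl yl);
  if _n_ = 1 then do;
    if 0 then set nodes(keep=NodeId xn yn);  /* define the hash host variables */
    declare hash h(dataset:'nodes', ordered:'a');
    h.defineKey('NodeId');
    h.defineData('NodeId', 'xn', 'yn');
    h.defineDone();
  end;

  set links;

  /* Starting point of the link: coordinates of the "From" node */
  if h.find(key:FromNode) = 0 then do;
    xl = xn; yl = yn; output;
  end;

  /* Ending point of the link: coordinates of the "To" node */
  if h.find(key:ToNode) = 0 then do;
    xl = xn; yl = yn; output;
  end;
run;
```

The result has two observations per link, which is the shape the SERIES statement needs when GROUP=LinkId is used.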

Now, we can merge the Nodes and Links data sets and use the following program to display the diagram.  We added the SERIES statement to display the links.  The various options of the SCATTER statement are the same as before, and are trimmed here to conserve space.

title 'Social Network';
proc sgplot data=network noautolegend aspect=1;
  styleattrs backcolor=cxfaf3f0;
  series x=xl y=yl / group=LinkId lineattrs=graphdatadefault;
  scatter x=xn y=yn / group=sex <options>;
  xaxis min=0 max=4 display=none;
  yaxis min=0 max=4 display=none;
run;

At this stage, we have the diagram representing the sketch I started with.  Note, the links are straight lines connecting the from and to nodes.

But in the title of the article, I suggested we would draw curved connecting links to make the display a bit nicer, as shown on the right.  This is especially useful from an "Infographics" perspective as it adds some visual interest to the diagram.  The question is how do we do this using SGPLOT.

Starting with SAS 9.40M3, the SGPLOT procedure includes a new statement - The SPLINE plot.  This behaves similar to the SERIES plot, except that it draws smooth splines between the vertices of the segments.  The smooth curve line is guaranteed to start at the first vertex, and end at the last, but is not guaranteed to pass through any of the intermediate vertices which are "control points" that determine the shape of the curve.  This is different from SMOOTHCONNECT for SERIES, where the curve still passes through all the points.
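The difference between the two is easiest to see on a tiny example.  The sketch below uses three made-up points that form a sharp peak.  The data set name pts and its values are invented for illustration only.

```sas
/* Sketch only: three made-up points forming a peak. */
data pts;
  input x y;
  datalines;
0 0
1 2
2 0
;
run;

proc sgplot data=pts;
  /* SMOOTHCONNECT: the smoothed curve still passes through every point */
  series  x=x y=y / smoothconnect legendlabel='SERIES with SMOOTHCONNECT';
  /* SPLINE: the middle point acts as a control point */
  spline  x=x y=y / legendlabel='SPLINE';
  scatter x=x y=y / legendlabel='Vertices';
run;
```

Both curves should start at (0, 0) and end at (2, 0), but only the SERIES curve is expected to touch the middle vertex.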

In order to get the curved shape, we need at least 3 points per curve.  So far we have only two, the "From" location and the "To" location for each link.  Now, we need to generate one middle point that is about half way between these two, but offset to one side a bit.  This can be done by using some Vector math.

The sketch on the right shows one link from point "1" to "2".  For this link (vector), we can compute the direction cosines of the vector as Cx and Cy.  Cx=(x2-x1)/L, where L is the length of the vector.  Similarly, Cy=(y2-y1)/L.

Now, by vector math, the line normal to this vector (the dashed diagonal line) has direction cosines Cxn=-Cy and Cyn=Cx.  The center point of the vector can be computed with xm=(x1+x2)/2 and ym=(y1+y2)/2.  The new offset point we want lies on the normal through the center point: x3=xm+Cxn*L*F and y3=ym+Cyn*L*F.  F is a factor that moves the point farther from or closer to the link along the dashed line.  Here I used F=0.15 to create shallow curves.

Using this technique, we compute an extra middle point for each link to create the graph with shallow curved links shown below.
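A minimal sketch of this computation is shown here.  It is not the linked program.  It assumes a hypothetical data set linkEnds with one observation per link and the columns LinkId, x1, y1, x2 and y2, and it offsets the middle point along the unit normal (Cxn, Cyn).

```sas
/* Sketch only: linkEnds is a hypothetical data set with one row */
/* per link holding both end points (x1, y1) and (x2, y2).       */
data curvedLinks(keep=LinkId xl yl);
  set linkEnds;
  F = 0.15;                                /* offset factor for shallow curves */
  L = sqrt((x2 - x1)**2 + (y2 - y1)**2);   /* length of the link */
  if L > 0 then do;
    Cx  = (x2 - x1) / L;                   /* direction cosines of the link */
    Cy  = (y2 - y1) / L;
    Cxn = -Cy;                             /* direction cosines of the normal */
    Cyn =  Cx;
    xm  = (x1 + x2) / 2;                   /* center point of the link */
    ym  = (y1 + y2) / 2;

    xl = x1;              yl = y1;              output;  /* start point   */
    xl = xm + Cxn*L*F;    yl = ym + Cyn*L*F;    output;  /* control point */
    xl = x2;              yl = y2;              output;  /* end point     */
  end;
run;
```

For a link from (0, 0) to (2, 0), L=2 and the normal is (0, 1), so the middle point becomes (1, 0.3).  Using a negative F places the curve on the other side of the link.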

Now, one last item.  Note in the Links data set we had a column "RespA".  This contains a response value for each link that could represent some measure of the importance of the link based on traffic, number of calls, number of references, or some other value.  We can adjust the thickness of the link based on this response value as shown in the graph on the right.  Here, "Ted" and "Bill" have more frequent communication than the other people.

The full code is included in the program linked below.  The SPLINE statement has new options to control line thickness:

spline x=xl y=yl / group=LinkId lineattrs=graphdatadefault
thickresp=respA thickmaxresp=10 thickmax=4;

THICKRESP=RespA bases the link thickness on the column "RespA".  THICKMAX sets the thickness, in pixels, of a link whose response equals the THICKMAXRESP value.  Here we have set THICKMAXRESP=10 and THICKMAX=4.  So if RespA has a value of 10 for any link, the line thickness will be 4 pixels.  Other sizes will be proportional.

Note:  Here I have shown how you can create simple network diagrams using the SGPLOT procedure.  If the positions of the nodes can be determined, you can display the diagram.  For simple cases, this can often be done in your code.  I am not claiming this provides an alternative to products that solve the entire problem of node layout and display of the diagram.  Algorithms for the computation of node locations can get complicated for large diagrams.  Some algorithms are available on the web for MultiLevel Layout and Force-Directed Layout.

Full SAS 9.40M3 program:  Network

 


Infographics Bar Chart

Last week I posted an article on creating Infographics using SAS.  The interest shown by the SAS community in this topic came as a surprise.  Also, by coincidence, a SAS user called Tech Support at about the same time with a query about creating an Infographics type graph.

This user wants to create the graph shown in the Dover School link.  The graph is shown on the right.  Click on the graph for a higher resolution view.  The readiness is displayed using icons for the students, one set for "This School" and one for "State" side by side.  The actual values are displayed on the left of the "Bar".

Functionally, the information in the graph can be represented by a 2-cell horizontal bar chart, comparing the readiness of the students in this school for college by their level with the overall readiness for the state.

Now, there are many ways to visualize this data effectively.  One simple way is shown on the right.  The graph displays the same data as side by side bar charts in a class panel.  The code for the graph is shown below, and is longer only because of appearance customization.

title j=l h=1 'College Readiness';
proc sgpanel data=Rediness_Bar_Panel  noautolegend ;
styleattrs datacolors=(%rgbhex(254, 145, 104)
                   %rgbhex(130, 109, 146))
                   datacontrastcolors=(black);
  panelby School / columns=2 novarname onepanel sort=data
                  noheaderborder noborder;
    hbar level / response=value datalabel group= school
              baselineattrs=(thickness=0) dataskin=pressed;
    rowaxis reverse display=(nolabel noticks noline) splitchar='.'
                    fitpolicy=splitalways;
    colaxis offsetmin=0 display=none grid;
run;

The data set is shown on the right.  In this graph I have tried to mimic the layout of the display using a simpler bar chart.  However, a better comparison of the data for This School vs State can be made for each level using a clustered HBAR as shown below.

The graph on the right places the data for This School and State adjacent to each other by Level, and the proximity of the bars allows for a better comparison of the values.  Yes, there is a way to make the order of the group values in the legend mimic the bars.  See the recent article on Legend Order.

title j=l h=1 'College Readiness';
proc sgplot data=Rediness_Bar_Panel nowall noborder;
  styleattrs datacolors=(%rgbhex(254, 145, 104) %rgbhex(130, 109, 146))
                     datacontrastcolors=(black);
    hbar level / response=value datalabel group=school groupdisplay=cluster
              baselineattrs=(thickness=0) dataskin=pressed grouporder=reversedata;
    yaxis reverse display=(nolabel noticks noline) splitchar='.' fitpolicy=splitalways;
    xaxis offsetmin=0 display=none grid;
    keylegend / location=inside position=bottomright across=1;
run;

Having said that, let us now turn our attention to how we can create the "Infographics" type graph shown at the top using SGPLOT.  The graph on the right is created using the SAS 9.4 SGPLOT procedure.  Click on the graph to see a higher resolution view.  This graph is very much like the one on top, and the header icons can be added.

The main difference is that I have used only one "Partial" icon per school.  In the graph at the top, a full color icon is displayed for every 10% of the value, and a gray icon is displayed for every 10% short of 100.  But the icon for the partial amount (say for a value of 74) is filled for about 4 of its 10 percentage points and empty for the other 6.  In the graph on the right, I have used only one "Partial" icon to represent all such cases.

For this graph I have used a SCATTER plot to display an "ImageMarker" at every 10% location for each graph.  So, I have to first expand the data into multiple observations, 10 for each Level and School.  For each of these I generate either a full color icon, or the gray color icon.  For the case where the value is not a round 10%, I have to generate a marker using the "Partial" marker.

I defined 5 icons: two full color icons (one for each school), two "Partial" icons (one for each school), and one gray icon.  I used the SYMBOLIMAGE statement to create marker shapes from each of these, and then used them to draw the graph.  The full code is attached in the link below.  You will need the icons to really run the program, but you get the idea.  More "Partial" icons could be created and used to get finer coloring like in the original graph above.  I will leave that as an exercise to the interested reader.
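The expansion into 10 observations per Level and School could look something like the sketch below.  It is not the attached program.  It reuses the data set and the column names Level, School and Value from the bar chart code above, while the output data set name, the Icon column and its values 'Full', 'Partial' and 'Gray' are hypothetical.

```sas
/* Sketch only: icons, Icon and its values are hypothetical names. */
data icons;
  set Rediness_Bar_Panel;           /* columns Level, School, Value */
  length Icon $8;
  nFull = floor(value / 10);        /* number of completely filled icons */
  do i = 1 to 10;
    x = 10*i - 5;                   /* center of each 10% slot on the axis */
    if i <= nFull then Icon = 'Full';
    else if i = nFull + 1 and mod(value, 10) > 0 then Icon = 'Partial';
    else Icon = 'Gray';
    output;
  end;
  drop i nFull;
run;
```

For a value of 74 this produces 7 'Full' rows, 1 'Partial' row and 2 'Gray' rows.  The Icon value together with School would then select which of the five image markers to draw.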

Now, Dan Heath has suggested an interesting idea, where the icon could be a "Mask" that allows a background color to show through the marker, while masking the rest with opaque white.  Then one could draw a bar chart of the right colors behind, and just draw the mask icons in front.  Using appropriately sized markers in front, one could get the correct fractional shaded shape, and we only need one icon, not 5.  I suspect this will need some tweaking, but I will give it a go to see what can be done with this idea.

Graph and program updated to add headers.

SAS 9.4 SGPLOT Code:  Info_Graphics_Bar_2


Box Plot with Proportional Widths

Last week a question was posted on the communities page about creating Box Plots where the width of each box is proportional to the frequency for the category.  The comment was that PROC BOXPLOT can create such a graph, but there seems to be no way to do this using the SGPLOT procedure.

The user is right.  The SGPLOT procedure does not provide a way to create box plots where the width of each box is proportional to frequency.  However, there is a way to create such a graph using SGPLOT and GTL and a bit of coding.

We know by now that SGPLOT scripts out a GTL template along with the data needed to render the graph.  SGPLOT scripts a template using the BoxPlotParm statement.  This statement can render a box plot from a data set with three columns - X, Statistic and Y.

For this article, we will create a box plot of Mileage by Type where each box width is proportional to the frequency for each category.  First, we run the SGPLOT procedure for a basic box plot of the same variables as follows.  We add the ODS OUTPUT statement to save the processed data into the 'BoxData' data set.

ods output sgplot=boxdata
        (rename=(BOX_MPG_CITY_X_TYPE_SORTORDER__X=X
                            BOX_MPG_CITY_X_TYPE_SORTORDER__Y=Y
                           BOX_MPG_CITY_X_TYPE_SORTORDER_ST=Stat));
ods graphics / reset width=5in height=3in imagename='Box';
title 'Mileage by Type';
proc sgplot data=sashelp.cars;
  vbox mpg_city / category=type;
  xaxis display=(nolabel);
run;

The graph of Mileage by Type is displayed on the right.  Note, all boxes are of the same width.  A BOXWIDTH option is available, but it takes a scalar, and is applied to all boxes.

The SGPLOT procedure also generates the data necessary to render the graph using the BoxPlotParm GTL statement.  The generated data set contains three computed columns that have long names ending with "_X", "_Y" and "_ST".  I have renamed these above to "X", "Y" and "Stat".  For each category value "X", the "Stat" variable contains names of various box plot statistics such as "MEAN", "MEDIAN", "Q1", "Q3" and so on.  The Y variable contains the corresponding y value used to draw each box.

Starting with SAS 9.3, the BoxPlotParm statement also supports the BOXWIDTH statistic, which is a fractional value 0.0 - 1.0, and determines the width of each box.  What we need to do is to compute this statistic based on N, and insert it into the data set.  Then, we can use a GTL template with the BoxPlotParm statement to render this graph.

I have used some data step code (see the linked code) to first compute the maximum value of N for the data.  Then, I have computed the appropriate fractional value for BoxWidth for each category, and inserted a new observation into the data set as shown in the figure on the right.  After I encounter "N" in the data set, I write out the new observation with Stat='BOXWIDTH' as seen in obs #9 on the right.
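The linked program has the actual code.  The following is only a sketch of the idea, assuming the boxdata data set has the renamed columns X, Y and Stat, that the count is stored in the rows with Stat='N', and that the Stat column is wide enough to hold the 8-character value 'BOXWIDTH'.  The macro variable name maxN is hypothetical.

```sas
/* Sketch only. Step 1: find the largest N over all categories. */
proc sql noprint;
  select max(y) into :maxN trimmed
    from boxdata
    where stat = 'N';
quit;

/* Step 2: after each N row, write an extra BOXWIDTH row. */
data boxdataWidth;
  set boxdata;
  output;                      /* keep every original statistic */
  if stat = 'N' then do;
    n    = y;                  /* keep the count for display in the graph */
    stat = 'BOXWIDTH';
    y    = n / &maxN;          /* fraction between 0 and 1 */
    output;
  end;
run;
```

The category with the largest N gets a BOXWIDTH of 1.0, and every other category gets a proportionally narrower box.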

Now, we create a simple GTL template using the BoxPlotParm statement to render this data.  The template is shown below.  Then, we run the SGRENDER procedure using the new data set and the template to create the graph.

proc template;
  define statgraph BoxWidth;
    begingraph;
      entrytitle 'Mileage by Type';
      entrytitle 'Box Width is Proportional to N';
      layout overlay;
        boxplotparm x=x y=y stat=stat;
        scatterplot x=x y=eval(y*0+5) /
                      markercharacter=n;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=boxdataWidth template=BoxWidth;
  format n 3.0;
run;

In the graph above, the box plot is now rendered showing the various statistics as before, but now the width of each box is made proportional to N using the new BOXWIDTH statistic value.  Just for verification, I have also displayed the value of N at the bottom of each box.  Click on the graph for a bigger view.

Full SAS 9.3 Code:  Box_With_Variable_Width_93


Infographics using SAS

Infographics is all the rage today.  Open any magazine or newspaper and we see data and numbers everywhere.  Often, such information is displayed by adding some graphical information to add context to the data.  A couple of good examples are Communicating numeric information, and Facts about Hot Dogs.

Riley Benson, our UX expert, explained it this way.  The reason such infographics are used is to provide a memorable item associated with the numbers.  An aesthetic graphic invites the user to spend more time viewing the graphic and number, thus making it more memorable.  While the icon label is needed initially, frequent usage of the same icons makes them easier to recognize later without the label.

I was curious about how we could leverage GTL or SG procedures to make such graphs easier.  So, I created the graphic on the right for display of the % revenues from a particular sector, in this case "Utilities".  I found an image, and used PROC SGPLOT to create this "InfoGraphic".

Given a value to be displayed, an icon and the label, it is easy to create the display on the right.  The SAS 9.4 code is shown below.

title h=1 '2015 Software Revenue for Utilities';
proc sgplot data=infographics (where=(industry='Utilities'))
         pad=(left=20 right=20 bottom=20) noborder;
  symbolimage name=Utilities image="&file7";
  styleattrs backcolor=cxfaf3f0;
  scatter x=x y=y / group=industry dataskin=sheen
                markerattrs=(symbol=Utilities size=120);
  text x=xlbl y=ylbl text=value / textattrs=(size=24);
  text x=xnam y=ynam text=industry / textattrs=(size=12);
  xaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
  yaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
run;

In the code above, we have used the SYMBOLIMAGE statement to create a new marker symbol from the image icon for "Utilities", which is the "Bulb".  Additional columns are used for the "Value" and also the (x, y) location of the icon and the value.  In this example, I have customized the (x, y) locations for the icon and the value based on their arrangement.

The data for the program is on the right.  The data includes the Value, Industry, Image File Name for the icon, the (x, y) location for the icon center, the value label and the industry name.  In the code above, only one industry (Utilities) is used.

The nice part of using the code is that you can change the relative layout of the icon, value and label easily.  Also, we can create a panel of the values by industry in any layout.

The graph on the right is a 4 column layout of all the industries, showing all the icons and values from the data set above.  I have used the SGPANEL procedure to arrange the layout.  The icons, values and labels all fall into place easily.

The SGPANEL program is shown below.  Note the use of ATTRPRIORITY=NONE on the ODS Graphics statement.  This is required in case a color priority style like HTMLBlue is active, because we want to cycle through all marker shapes per group.  Also note the use of SORT=DATA to ensure the panel classifiers are in the data order so the data and the layout are in sync.

ods graphics / reset attrpriority=none noborder;
title '2015 Software Revenue by Industry';
proc sgpanel data=infographics pad=(left=20 right=20);
  panelby industry / noborder noheader spacing=20
                   onepanel columns=4 sort=data;
  symbolimage name=Banking image="&file1";
  symbolimage name=Government image="&file2";
  symbolimage name=Services image="&file3";
  symbolimage name=Insurance image="&file4";
  symbolimage name=LifeSciences image="&file5";
  symbolimage name=Retail image="&file6";
  symbolimage name=Utilities image="&file7";
  symbolimage name=Education image="&file8";
  styleattrs datasymbols=(Banking Government Services
                       Insurance LifeSciences Retail Utilities
                     Education) backcolor=cxfaf3f0;
  scatter x=x y=y / group=industry markerattrs=(size=120)
                dataskin=sheen;
  text x=xval y=yval text=value / textattrs=(size=14);
  text x=xnam y=ynam text=industry / textattrs=(size=7);
  colaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
  rowaxis min=-2 max=2 display=none offsetmin=0 offsetmax=0;
run;

A 2x4 layout can easily be created by changing the panelby settings.

Full SAS 9.4 code:  Info_Graphics


Legend Order

In the previous article on managing legends, I described the way to include items in a legend that may not exist in the data.  This is done by defining a Discrete Attribute Map, and then requesting that all the values defined in the map should be displayed in the legend.

In the graph on the right, the data contains only Severity values of "Mild" and "Moderate".  However, since three values are defined in the attribute map, and the "Show" column is set to "Attrmap", all values for the group are displayed.  That causes the value "Severe" to be displayed in the legend, even though there is no observation in the data with this severity.

Another useful (and intentional) result is that the legend items are displayed in the order they are defined in the discrete attribute map, which allows you to control the order of the items in the legend.  This feature also addresses an issue that a user was grappling with recently, as described below.

The order of the items in the legend is based on the order the group values are encountered in the data.  Legend values can be sorted in alphabetical order, but if you want a custom order, you can use the attribute map as discussed below.

The graph on the right shows the stacked cumulative counts for the cars by Type and Origin.  The legend inside is intentionally set with one column to make it easier to associate the colors with the stacking order.  However, the order in the legend is the reverse of the order in the graph.

I can change the stacking order by setting the GROUPORDER option to "ReverseData".  However, the order in the legend also reverses, thus keeping the legend order out of sync with the bar order.
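A minimal sketch of that attempt is shown here, using the same SASHELP.CARS variables as the graph.  The legend options are chosen for illustration and are not necessarily the ones used for the graph on the right.

```sas
/* Sketch only: reverses the stacking order of the groups. */
proc sgplot data=sashelp.cars;
  vbar type / group=origin grouporder=reversedata;
  keylegend / location=inside across=1 position=topright;
run;
```

The bar segments flip, but as described above, the legend entries flip with them.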

The way to address this is to use the Discrete Attr Map, and provide the group values and the corresponding colors in the order you want.  Now, the legend item values will be displayed in the order of the values defined in the attr map.  Note the items in the legend in the graph now are in the same order as the bar segments.

Note also in the Attr Map, we have not used actual fill color, like in the first case, but instead we have used the style elements.  This can be done by using the FillStyleElement column name instead of the FillColor column name.

The view of the Attribute Map for the graph above is shown on the right.  The code for the attr map and the graph is shown below.

data CarsShowAll;
  retain Id 'Origin' Show 'Attrmap';
  length Value $10 FillStyleElement $15;
  input value $ FillStyleElement $;
  datalines;
USA       graphdata3
Europe graphdata2
Asia       graphdata1
;
  run;

title "Counts by Type and Origin";
proc sgplot data=sashelp.cars dattrmap=carsShowAll nowall noborder;
  vbar type / group=origin dataskin=gloss filltype=gradient
                         baselineattrs=(thickness=0) attrid=Origin;
  keylegend / location=inside across=1 position=topright opaque
                           fillheight=12px fillaspect=golden;
  xaxis display=(noline noticks nolabel);
  yaxis display=(noline nolabel noticks) grid;
run;

Full SAS 9.4 Code:  Legend_Order


Legendary

Entries in a legend are populated automatically based on the data.  When creating a graph with group classification,  the display attributes for each bar are derived from the GraphData1-12 style elements from the active style.

The graph on the right shows you the result of creating an adverse event timeline by AE and Severity.  The data contains four AE names with two severity values.  The severity values are assigned the display attributes from GraphData1 and GraphData2, which for the HTMLBlue style are blue and red.

Now, if the data for today arrives in a different group order, the assignment may change, so it is hard to ensure that the color assignments are consistent.

This can be addressed by using the Discrete Attribute Map as shown in the graph on the right.  Here we have defined a Discrete Attribute Map where the display attribute for each group value is defined in a data set like a format.

Now, the display attributes such as color or marker symbol for each group are obtained from the attribute map by the value of the group.

The Discrete Attribute Map is a data set with predefined column names as shown on the right.  Multiple maps can be defined in a data set by "ID".  Here we have defined only one map, with ID=Severity.  Three levels are defined, "Mild", "Moderate" and "Severe".  Now, the colors for each group are well defined, and will remain consistent regardless of the position of the observation in the data.
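A map like this can be written as a small data step.  The sketch below is an illustration only: the data set name attrmap is hypothetical and the fill colors are assumptions, not necessarily the ones used for the graph.  Id, Value and FillColor are the predefined column names.

```sas
/* Sketch only: the colors are illustrative assumptions. */
data attrmap;
  length Id $10 Value $10 FillColor $10;
  input Id $ Value $ FillColor $;
  datalines;
Severity Mild     green
Severity Moderate yellow
Severity Severe   red
;
run;
```

The map is attached with DATTRMAP=attrmap on the procedure statement and ATTRID=Severity on the plot statement that uses GROUP=severity.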

Note however, in the graph above, only two of the three defined values are displayed in the legend.  This is normal, and only the values in the data are displayed.  However, often we have additional classifications for the data that may or may not be in the data at any one time, but we may want to display all the "possible" values for the classification variable in the legend as shown in the graph on the right.  In this graph the legend item for "Severe" with a red color swatch is included in the legend, even though there is no observation in the data set with a group value of "Severe".

With the SAS 9.4M3 release, this is easily done by requesting that all the levels for a particular attribute id in a Discrete Attribute Map be shown in the legend.  Note the column "Show" with value of "Attrmap".  This instructs the system to display all values for this AttrId that are marked as "Attrmap".  Note, this is also a great way to populate the legend with other items you may need that are not in the data.
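As a small sketch, the "Show" column can be added to an existing map with a short data step.  Here attrmap is a hypothetical name for a discrete attribute map data set that already has the Id, Value and color columns.

```sas
/* Sketch only: attrmap is a hypothetical existing attribute map. */
data attrmapShowAll;
  set attrmap;
  length Show $8;
  Show = 'Attrmap';    /* show every level of this map in the legend */
run;
```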

Another noteworthy feature released with SAS 9.4 is the ability to control the size of the legend items.  When skins are in effect, or with fill patterns, or just because you want it so, it is often desirable to make the color swatches in the legend bigger.  This can be done using the new FILLHEIGHT and FILLASPECT options.

title "Adverse Event Timeline Graph by Day";
proc sgplot data=ae dattrmap=attrmapShowAll;
  highlow y=ae low=low high=high / type=bar group=severity
                   dataskin=pressed barwidth=0.8 lowlabel=label
                   attrid=Severity labelattrs=(color=black size=9);
   refline 0 / axis=x;
  xaxis display=(nolabel) values=(0 to 96 by 2);
  yaxis display=(noticks novalues);
  keylegend / fillheight=12px fillaspect=golden;
run;

Full SAS 9.4 Code: Legend 


Easy Box Plot with Multiple Connect Lines

Last month I wrote an article on connecting multiple statistics by category in a box plot using SGPLOT.  In the first article I described the way you can do this using overlaid SERIES on a VBOX using SAS 9.4, which allows such a combination.  However, if you have SAS 9.3, I described how you can do this using annotation.

Recently a question was posted on the SAS communities site for SAS/GRAPH and ODS Graphics on how to do this when using a BY variable.  That got me thinking about whether there could be an easier way.  It turns out there is.

Note, I changed the examples as connect makes more sense when x axis is numeric.  The data is not important.
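Since the data itself is not shown, here is a minimal sketch of a data set with the shape the examples in this article expect.  The data set and variable names (ValueByWeek, Location, Week, Value) come from the code in this article; the values, the two locations, and the seed are made up purely for illustration.

```sas
/* Hypothetical stand-in for the ValueByWeek data (values are simulated) */
data ValueByWeek;
  call streaminit(12345);              /* arbitrary seed, for repeatability */
  length Location $5;
  do Location = 'East', 'West';        /* assumed location names */
    do Week = 1 to 8;
      do Rep = 1 to 20;                /* 20 observations per box */
        Value = 50 + 2*Week + rand('normal', 0, 5);
        output;
      end;
    end;
  end;
  drop Rep;
run;
```

The rows are generated in Location order, so the data set can be used with a BY statement without sorting first.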

 

Prior to SAS 9.4, the SGPLOT procedure limited the combination of some plot types.  While "Basic" plots can be layered in any combination, Category plots (VBAR, VLINE) or Distribution plots (VBOX, Histogram) could only be combined with other plots of the same type.  So, a SERIES plot could not be combined with a VBOX.

However, we are allowed to combine multiple VBOX plots since SAS 9.2 and CONNECT is available since SAS 9.3.  So, the idea here is to overlay multiple VBOX statements, each with a different CONNECT option.  This works just fine as shown above.  The only trick is to make sure that only the first VBOX uses the FILL option (default) while all the others use NOFILL.

title 'Distribution of Value by Week';
proc sgplot data=ValueByWeek nocycleattrs noautolegend;
  vbox value / category=week connect=q1;
  vbox value / category=week nofill connect=q3;
  xaxis display=(nolabel);
run;
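The same pattern should extend to more than two statistics.  As a sketch, assuming the same ValueByWeek data, a third VBOX with NOFILL can add a connect line for the median:

```sas
title 'Distribution of Value by Week';
proc sgplot data=ValueByWeek nocycleattrs noautolegend;
  vbox value / category=week connect=q1;             /* filled boxes + Q1 line */
  vbox value / category=week nofill connect=median;  /* median line only       */
  vbox value / category=week nofill connect=q3;      /* Q3 line only           */
  xaxis display=(nolabel);
run;
```

With NOCYCLEATTRS in effect, all three connect lines use the same visual attributes, so the overlaid boxes look like a single box plot.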

What could be simpler than this approach?  The additional benefit is that one can easily create a panel of such graphs.

proc sgpanel data=ValueByWeek nocycleattrs noautolegend;
  panelby location/ layout=panel columns=1;
  vbox value / category=week connect=q1;
  vbox value / category=week nofill connect=q3;
  colaxis display=(nolabel);
run;

Furthermore, this also works when using BY variable processing.  Now, the procedure correctly pages the graph by the BY variable, and each graph has the correct connect lines.  No need to figure out the data needed for the overlaid SERIES plot, or the annotate data set.



title 'Distribution of Value by Week';
proc sgplot data=ValueByWeek nocycleattrs noautolegend;
  by location;
  vbox value  / category=week connect=q1;
  vbox value / category=week nofill connect=q3;
  xaxis display=(nolabel);
run;

Full SAS 9.3 SGPLOT code:  Box_Connect_Numeric


Fit Plot Customizations

A customer wants to use PROC REG to fit a simple regression model but display in the fit plot markers that differentiate groups of individuals.

Before we see how to do that, let's look at some simpler examples.

The following step fits a linear regression model and displays an ordinary fit plot:

proc sgplot data=sashelp.class;
   title 'Simple Linear Regression Fit Plot -- PROC SGPLOT';
   reg y=weight x=height / cli clm;
run;

The CLI option produces prediction limits and the CLM option produces confidence limits.

The following steps fit the same model, but males are displayed as filled squares and females are displayed as filled circles:

ods graphics on / attrpriority=none;
 
proc format;
   value $sex 'M' = 'Male' 'F' = 'Female';
run;
 
proc sgplot data=sashelp.class;
   title 'Simple Regression but with a Classification Variable Displayed -- PROC SGPLOT';
   styleattrs datasymbols=(squarefilled circlefilled);
   reg y=weight x=height / cli clm nomarkers;
   scatter y=weight x=height / group=sex  name='scatter';
   keylegend 'scatter' / location=inside across=1 position=topleft;
   format sex $sex.;
run;


These examples all use the HTMLBlue style, which is an ATTRPRIORITY=COLOR style. The ATTRPRIORITY=NONE option enables marker differences to be displayed as well as color differences. The $SEX format provides meaningful labels in the legend. The STYLEATTRS statement creates the custom markers. The NOMARKERS option suppresses the markers from being displayed by the REG statement. Instead, they are displayed by the SCATTER statement, which uses the GROUP=SEX option to distinguish the groups. The KEYLEGEND statement displays a legend inside the graph.

While this is a nice graph and it is easy to make, the customer specifically wanted PROC REG, because PROC REG displays a table of statistics along with the fit plot. The following step illustrates:

proc reg data=sashelp.class;
   model  weight = height;
quit;


PROC REG will not use the classification variable SEX in the graph without a template change. However, before you can proceed, you need to see if the SEX variable is available in the data object that underlies the graph. The following step outputs the data object to a SAS data set:

proc reg data=sashelp.class;
   ods select fitplot;
   ods output fitplot=fp;
   model weight = height;
   id sex;
quit;

If you print the data set, you will see that the SEX variable is in the output data set, but it is named ID1. In fact, not one of the original variable names is present in the output data set. This is because analytical procedures need to have precise control of the data object column names so that the templates will work with the wide variety of models that people specify.
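One simple way to check the column names, assuming the fp data set created by the ODS OUTPUT statement above, is to list the variables and print the first few rows:

```sas
/* List the columns of the data object in their stored order */
proc contents data=fp varnum;
run;

/* Print a few rows to see which column holds the SEX values */
proc print data=fp(obs=5);
run;
```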

We will use a DATA step and CALL EXECUTE to modify the graph template for the fit plot. There are other ways to modify a template, but the DATA step provides a parsimonious way to show small changes to large templates. You cannot write template modification code like the DATA step below without first looking at the template. The following step writes the fit plot template to a file called temp.tmp:

proc template;
   source Stat.REG.Graphics.Fit / file='temp.tmp';
quit;

The following step reads the template, adds a PROC TEMPLATE statement, drops the MARKERATTRS= option from the SCATTERPLOT statement, and adds the GROUP=ID1 option. It also adds options to the BEGINGRAPH statement to control the markers:

options source;
data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then do;
      _infile_ = tranwrd(_infile_, 'markerattrs=GRAPHDATADEFAULT', ' ');
      _infile_ = tranwrd(_infile_, '/', '/ group=id1 ');
      end;
   if left(_infile_) =: 'BeginGraph' then
      _infile_ = 'BeginGraph / attrpriority=none' ||
                 ' datasymbols=(squarefilled circlefilled);';
   call execute(_infile_);
run;

Other statements are executed as is. The OPTIONS SOURCE statement is not required. It shows the code that is generated by CALL EXECUTE, so it can help you understand what is happening when things do not work.

The following step uses the modified template:

proc reg data=sashelp.class;
   ods select fitplot;
   model weight=height;
   id sex;
quit;


You can also add a NAME= option to the SCATTERPLOT statement and a DISCRETELEGEND statement after the SCATTERPLOT statement to display the values of SEX in a legend:

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then do;
      _infile_ = tranwrd(_infile_, 'markerattrs=GRAPHDATADEFAULT', ' ');
      _infile_ = tranwrd(_infile_, '/', '/ group=id1 name="sc"');
      end;
   if left(_infile_) =: 'BeginGraph' then
      _infile_ = 'BeginGraph / attrpriority=none' ||
                 ' datasymbols=(squarefilled circlefilled);';
   call execute(_infile_);
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then
      call execute('discretelegend "sc" / location=inside across=1 autoalign=(topleft);');
run;
 
proc reg data=sashelp.class;
   ods select fitplot;
   model weight=height;
   id sex;
   format sex $sex.;
quit;


The DATA _NULL_ step reads the same (unmodified) temp.tmp file and creates a new template modification.

The following step deletes the modified template:

proc template;
   delete Stat.REG.Graphics.Fit / store=sasuser.templat;
quit;

This all works because the SEX variable appears in the data object when it is specified in the ID statement. It appears in the data object so that it can appear in HTML tooltips. What if it had not been there? The next part of the example shows how you can output the data object, modify it (that is, merge in the SEX variable), and create the desired graph with PROC SGRENDER. The PROC SGRENDER step uses the modified data object, the modified graph template, and the style template, but it needs one more thing: dynamic variables. Procedures set dynamic variables that control many aspects of the graphs and contain other values such as the statistics that are displayed in the table.

The following step captures the graph, including the dynamic variables and their values, in an ODS document. It also captures the data object in a SAS data set:

ods document name=MyDoc (write);
proc reg data=sashelp.class;
   title 'Not Shown';
   ods select fitplot;
   ods output fitplot=fp;
   model weight=height;
quit;
ods document close;

The following step lists the contents of the ODS document:

proc document name=MyDoc;
   list / levels=all;
quit;

You need to copy the path of the graph from the LIST statement output into the OBDYNAM statement.

The following step creates a SAS data set that contains the values of the dynamic variables:

proc document name=MyDoc;
   ods exclude dynamics;
   ods output dynamics=dynamics;
   obdynam \Reg#1\MODEL1#1\ObswiseStats#1\Weight#1\FitPlot#1;
quit;

The following step displays the data set of dynamic variables (some of which are shown):

proc print; 
run;


The following step merges the SEX variable into the output data set made from the data object:

data both(drop=height weight rename=(sex=id1));
   merge sashelp.class(keep=height weight sex) fp;
   if height ne _indepvar1 or weight ne depvar then put _all_;
   format sex $sex.;
run;

The SEX variable is renamed ID1 so that it can work with the same template as before. You cannot rely on a merge operation being as simple as the one shown here. Data sets made from graph data objects can vary from input data sets in many ways. An IF statement is added to check the merge only to emphasize that you need to carefully combine data from separate sources and always check your results.

The following step modifies the template (as before):

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then do;
      _infile_ = tranwrd(_infile_, 'markerattrs=GRAPHDATADEFAULT', ' ');
      _infile_ = tranwrd(_infile_, '/', '/ group=id1 name="sc"');
      end;
   if left(_infile_) =: 'BeginGraph' then
      _infile_ = 'BeginGraph / attrpriority=none' ||
                 ' datasymbols=(squarefilled circlefilled);';
   call execute(_infile_);
   if left(_infile_) =: 'SCATTERPLOT y=DEPVAR' then
      call execute('discretelegend "sc" / location=inside across=1 autoalign=(topleft);');
run;

The following step uses CALL EXECUTE to run PROC SGRENDER along with a DYNAMIC statement that provides the value of each of the dynamic variables:

data _null_;
   set dynamics(where=(label1 ne '___NOBS___')) end=eof;
   if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
   if _n_ = 1 then do;
      call execute('proc sgrender data=both');
      call execute('template=Stat.REG.Graphics.Fit;');
      call execute('dynamic');
   end;
   if cvalue1 ne ' ' then
      call execute(catx(' ', label1, '=',
                   ifc(n(nvalue1), cvalue1, quote(trim(cvalue1)))));
   if eof then call execute('; run;');
run;


The DATA _NULL_ step with the CALL EXECUTE statements generates the following DYNAMIC statement:

dynamic _SHOWCLM = 1 _SHOWCLI = 1 _WEIGHT = 0 _SHOWSTATS = 1 _NSTATSCOLS = 2
   _SHOWNOBS = 1 _NOBS = 19 _SHOWTOTFREQ = 0 _TOTFREQ = 19 _SHOWNPARM = 1 
   _NPARM = 2 _SHOWEDF = 1 _EDF = 17 _SHOWMSE = 1 _MSE = 126.02868962 
   _SHOWRSQUARE = 1 _RSQUARE = 0.7705068427 _SHOWADJRSQ = 1 
   _ADJRSQ = 0.7570072452 _SHOWSSE = 0 _SSE = 2142.4877235 _SHOWDEPMEAN = 0
   _DEPMEAN = 100.02631579 _SHOWCV = 0 _CV = 11.223296526 _SHOWAIC = 0 
   _AIC = 93.780394884 _SHOWBIC = 0 _BIC = 96.223301459 _SHOWCP = 0 _CP = 2
   _SHOWGMSEP = 0 _GMSEP = 140.9531397 _SHOWJP = 0 _JP = 139.29486747 
   _SHOWPC = 0 _PC = 0.2834915472 _SHOWSBC = 0 _SBC = 95.669272843 _SHOWSP = 0 
   _SP = 7.876793101 _TITLE = "Fit Plot" _DEPNAME = "Weight" _DEPLABEL = "Weight"
   _SHORTYLABEL = "Weight" _SHORTXLABEL = "Height" _CONFLIMITS = "95% Confidence
   Limits" _PREDLIMITS = "95% Prediction Limits" _XVAR = "_INDEPVAR1";

The following step deletes the modified template:

proc template;
   delete Stat.REG.Graphics.Fit / store=sasuser.templat;
quit;

You can process the data set of dynamic variables and create a similar graph using PROC SGPLOT:

data _null_;
   length s $ 500;
   retain s;
   set dynamics(keep=label1 nvalue1) end=eof;
   if label1 = '_NOBS'    then l = 'Observations';
   if label1 = '_NPARM'   then l = 'Parameters';
   if label1 = '_EDF'     then l = 'Error DF';
   if label1 = '_MSE'     then l = 'MSE';
   if label1 = '_RSQUARE' then l = 'R-Square';
   if label1 = '_ADJRSQ'  then l = 'Adj R-Square';
   if l ne ' ' then s = catx(' ', s, quote(l), '=', quote(put(nvalue1, best6.)));
   if eof then call symputx('insets', s);
run;
 
%put &insets;
 
proc sgplot data=sashelp.class;
   title 'PROC SGPLOT with an Inset Table';
   styleattrs datasymbols=(squarefilled circlefilled);
   reg y=weight x=height / cli clm nomarkers;
   scatter y=weight x=height / group=sex  name='scatter';
   keylegend 'scatter' / location=inside across=1 position=topleft;
   inset (&insets) / position=bottomright border;
   format sex $sex.;
run;


The DATA step generates the following list of insets:

"Observations" = "    19" "Parameters  " = "     2" "Error DF    " = "    17" 
"MSE         " = "126.03" "R-Square    " = "0.7705" "Adj R-Square" = " 0.757"

ODS Graphics provides you with ways to make simple graphs and customize every aspect of them. While not shown in this example, you can also annotate graphs and modify dynamic variables. For more information about SG annotation and the techniques shown in this blog, see the free book Advanced ODS Graphics Examples.


CandleStick Chart with SAS 9.2

Let us start the new year by taking a trip back in history to SAS 9.2, first released in 2008, and the first SAS release that included the new ODS Graphics software including GTL and SG procedures.  While we have recently released the third maintenance on SAS 9.4 (SAS 9.40M3), many of you are using various maintenance releases of SAS 9.3, and some are still using SAS 9.2.

One such SAS 9.2 user recently saw my post on creating a CandleStick Chart using SAS 9.3, which included a new plot type called the HighLow plot.  This is a versatile plot that not only handles the "Candlestick" chart commonly used in the financial domain, but is also useful for creating many different graphs, as you can see in other articles in this blog.  This user wanted to create a similar chart using SAS 9.2.

I first sent them a program to create a High-Low-Close type graph using the GPLOT procedure, but the user wanted something similar to the graph shown in the linked article.

While I could not think of a way to create such a graph using SGPLOT, it was possible, with some effort, to create one using GTL.  The graph on the right is created using the GTL BoxPlotParm statement.  This statement was originally added to provide the user a way to plot a custom box plot, where the values for the various features of the box are computed by the user.  So, the data set provided by the user needs to contain the various statistics like "High", "Low", "Q1", "Q3" and so on for each value of the category.

The data set would look like the table on the right.  In this example, for each value of Date, we have 4 observations, one for each named statistic.  Here we have the "Min", "Max", "Q1" and "Q3" values computed for each value of Date.  The column names can be anything, but the "Stat" values must have the text strings shown.

In my case, I used a data step to compute these values.  The Q1-Q3 range is represented by the "Open" and "Close" value of the stock, and the "Low" and "High" are the low and high values for the stock for that day.

proc sort data=sashelp.stocks
        (where=(stock='IBM' and date > '01Jan2004'd))
        out=ibm;
 by date;
run;

data boxParm;
  length Group $4;
  format DateUp DateDn date7.;
  keep Date DateUp DateDn Stat Value Group Close2;
  set ibm;

  Stat='Min'; Value=low; output;
  Stat='Max'; Value=high; output;
  Stat='Q1'; Value=min(open, close); output;
  Stat='Q3'; Value=max(open, close); Close2=close; output;
run;

Now, we create a template using the BoxPlotParm statement for the graph.  Note we have also superposed a Series plot to connect the "Close" value for each day.

/*--Template for OHLC plot--*/
proc template;
  define statgraph OHLC;
    begingraph;
      entrytitle 'Stock Chart for IBM';
      layout overlay / xaxisopts=(display=(ticks tickvalues line)
                                        discreteopts=(tickvaluefitpolicy=thin));
        boxplotparm x=date y=value stat=stat;
        seriesplot x=date y=close2 / lineattrs=(color=gray);
      endlayout;
    endgraph;
  end;
run;

/*--OHLC plot--*/
proc sgrender data=boxParm template=OHLC;
run;

The user also wanted to see the boxes colored by whether the price was up or down.  This would be easy with a GROUP option for the BoxPlotParm.  Unfortunately, the SAS 9.2 release does not support a Group option.  However, the saving grace was that this was not a real group, where there could be one or more groups per category.  Instead it is really a single colored box by the group classifier.

With some creative coding we can achieve this result.  Can you guess how I might have done this?

What I have done is displayed all the boxes by date using the green color.  Then, I have overdrawn only the boxes for the days the stock value was down.  This causes only some of the green boxes to be hidden by the red ones.  See the program for the full data step and code.
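The full data step is in the linked program and is not reproduced here.  As a sketch only, the DateDn column might be built along these lines, assuming that a "down" day is one where the stock closes below its open; that rule and the exact handling of Group, DateUp and DateDn are my assumptions, not the author's confirmed code.

```sas
data boxParm;
  length Group $4;
  format DateUp DateDn date7.;
  keep Date DateUp DateDn Stat Value Group Close2;
  set ibm;

  /* Assumption: a day is "down" when it closes below its open.   */
  /* DateDn is missing on "up" days, so the second BoxPlotParm    */
  /* has no category value to draw a box for on those days.       */
  if close < open then do;
    Group='Down'; DateDn=date; DateUp=.;
  end;
  else do;
    Group='Up';   DateUp=date; DateDn=.;
  end;

  Stat='Min'; Value=low;              output;
  Stat='Max'; Value=high;             output;
  Stat='Q1';  Value=min(open, close); output;
  Stat='Q3';  Value=max(open, close); Close2=close; output;
run;
```

With data shaped like this, the first BoxPlotParm (x=date) draws a box for every day, and the second (x=datedn) overdraws only the down days.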

/*--Template for OHLC plot by group--*/
proc template;
  define statgraph OHLC_Grp;
    begingraph;
      entrytitle 'Stock Chart for IBM';
      layout overlay / xaxisopts=(display=(ticks tickvalues line)
                                       discreteopts=(tickvaluefitpolicy=thin));
        boxplotparm x=date y=value stat=stat / fillattrs=graphdata2
                                   name='All' legendlabel='Up';
        boxplotparm x=datedn y=value stat=stat / fillattrs=graphdata1
                                  name='Dn' legendlabel='Down';
        seriesplot x=date y=close2 / lineattrs=(color=gray);
        discretelegend 'All' 'Dn';
      endlayout;
    endgraph;
  end;
run;

SAS 9.2 Code for CandleStick Chart:  Stock_Plot_92
