Advanced ODS Graphics: Remove ODS Subtitles

In my last blog, I showed you how to change the titles in graphs produced by analytical procedures; today I will show you how to remove subtitles that procedures display on some output pages. The following step creates output that contains a SAS title ('Illustrate the CIF Plot'), a PROCTITLE ('The LIFETEST Procedure'), and a subtitle ('Failed Event: Event Indictor: 1=Event 0=Censored=1') that is set by the LIFETEST procedure.

title 'Illustrate the CIF Plot';
ods graphics on;
proc lifetest data=sashelp.Bmt plots=all;
   strata group;
   ods select FailureSummary cifPlot;
   time T*Status(0) / eventcode=1;
run;

You can remove the first and all subsequent titles that are set by TITLEn statements by specifying:

title;

You can remove the PROCTITLE by specifying:

ods noproctitle;

There is not a one-line specification that will remove subtitles, but you can use the ODS Document and PROC DOCUMENT. The following step captures the procedure output into an ODS Document named MYDOC:

title;
ods noproctitle;
ods document name=mydoc (write);
proc lifetest data=sashelp.Bmt plots=all;
   strata group;
   ods select FailureSummary cifPlot;
   time T*Status(0) / eventcode=1;
run;
ods document close;

The WRITE option creates a new document rather than appending to an old document. The following step lists the contents of the document:

proc document name=mydoc;
   list / levels=all;
run;

This document contains a table and a graph. The listing provides you with the paths that you need to copy and paste to write the next step, which replays the table and graph while suppressing all subtitles:

proc document name=mydoc;
   obstitle \Lifetest#1\Failcode#1\FailureSummary#1;
   replay   \Lifetest#1\Failcode#1\FailureSummary#1;
   obstitle \Lifetest#1\Failcode#1\cifPlot#1;
   replay   \Lifetest#1\Failcode#1\cifPlot#1;
run;

The OBSTITLE (subtitle object) statements, when specified with only an object path, suppress all subtitles. The REPLAY statements display the table and the graph.

Since the code depends on a previous step, and the results from that step (the contents of the ODS Document) can be stored in a SAS data set, you can easily use a macro to avoid copying and pasting paths to the PROC DOCUMENT step:

ods document name=mydoc (write);
proc lifetest data=sashelp.Bmt plots=all;
   strata group;
   ods select FailureSummary cifPlot;
   time T*Status(0) / eventcode=1;
run;
ods document close;
 
%macro nosubs;
proc document name=mydoc;
   ods exclude properties;
   ods output properties=p;
   list / levels=all;
run;
 
data _null_;
   set p end=eof;
   if _n_ = 1 then call execute('proc document name=mydoc;');
   if type = 'Table' or type = 'Graph' then do;
      call execute(catx(' ', 'obstitle', path, ';'));
      call execute(catx(' ', 'replay'  , path, ';'));
   end;
   if eof then call execute('quit;');
run;  
%mend;
 
%nosubs

CALL EXECUTE writes SAS code to a buffer, and that code is run after the DATA step terminates. The macro creates the same PROC DOCUMENT step and results as the first PROC DOCUMENT step--a table and a graph with no titles or subtitles. See the documentation for PROC DOCUMENT for more information about rearranging, subsetting, and generally controlling SAS output.
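
When a generated step misbehaves, it can help to preview the code before running it. The following sketch is a variation on the DATA step inside the macro, not part of the original program. It assumes that the data set P created by the ODS OUTPUT statement above still exists in the WORK library, and it swaps CALL EXECUTE for PUT statements so that the generated PROC DOCUMENT step is written to the SAS log instead of being executed:

```sas
/* Preview only: write the generated statements to the log.      */
/* Assumes WORK.P was created by ODS OUTPUT PROPERTIES=P above.  */
data _null_;
   set p end=eof;
   if _n_ = 1 then put 'proc document name=mydoc;';
   if type = 'Table' or type = 'Graph' then do;
      put '   obstitle ' path ';';
      put '   replay   ' path ';';
   end;
   if eof then put 'quit;';
run;
```

Once the statements in the log look right, you can switch back to CALL EXECUTE.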

Members of the Advanced Analytics division at SAS (who create products including SAS/STAT, SAS/QC, SAS/ETS, SAS/IML, SAS/OR, and many others) create documentation that contains text interleaved with output. They use macros that run PROC DOCUMENT to display subsets of procedure output. You can use a very similar system called StatRep to create documents that contain SAS code and reproducible results.

Update (November 28, 2016).
You can use the following step to replay the table and the graph with no page break in between.

proc document name=mydoc;
   obstitle \Lifetest#1\Failcode#1\FailureSummary#1;
   obpage   \Lifetest#1\Failcode#1\cifPlot#1 / delete;
   obstitle \Lifetest#1\Failcode#1\cifPlot#1;
   replay   \Lifetest#1\Failcode#1\FailureSummary#1, 
            \Lifetest#1\Failcode#1\cifPlot#1;
quit;

Layers vs annotation

Last week a user asked about BY variable group processing for SGAnnotate with SGPLOT procedure.  The user provided a simple use case for the question (always a good idea) using the sashelp.class data set.  The graph included a display of reference lines for the mean value of height using annotation.  The problem was that all the lines defined were being rendered in each graph and were not getting filtered with the BY group as SGAnnotation does not support BY variable processing.  See the graph in the linked question above.

This is a good example of a user who is familiar with SAS/GRAPH programming using SGPLOT.  When you do that, it is useful to remember that SGPLOT supports many plot statements that can be "Layered"  together to create a graph.  With SGPLOT, graphs should preferably be built by adding plot layers as far as possible.  This will work for a large number of graphs and annotation may be needed only for a few cases.

This use case is actually better handled in SGPLOT by using plot layers.  Instead of building a separate data set of the mean values by sex for the annotated reference lines, we can merge that data into a single data set.  We compute the mean values by sex using the MEANS procedure and then merge the computed data into the original data set (by Sex).  The last few rows of the final data set are shown on the right.  The data set is a merge of the relevant columns of sashelp.class and the computed mean values by sex.  I have also added a column called LBL for the label that I want to display on the mean reference line at x=zero.

Now, in addition to the SCATTER statement, we can use a REFLINE statement layer to display the reference lines at the mean value, and also label the line as necessary using a TEXT plot.  The active BY variable for the procedure will automatically work with the reference line and text plot to render the desired results.  The data set is sorted by sex for BY variable processing.
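
The data preparation described above might look like the following sketch. It is not the linked program. The names CLASS2, MEANHEIGHT, ZERO and LBL come from the SGPLOT code in this post, but how they are built here is an assumption. In particular, ZERO is set to 0 only because the post describes the label as placed at x=zero, and the label is written once per BY group so that it is not overprinted:

```sas
/* Sort so that BY-group processing works */
proc sort data=sashelp.class out=class;
   by sex;
run;

/* Mean height for each value of Sex */
proc means data=class noprint;
   by sex;
   var height;
   output out=means(keep=sex meanHeight) mean=meanHeight;
run;

/* One-to-many merge: every row receives the mean for its group */
data class2;
   merge class(keep=name sex height weight) means;
   by sex;
   length lbl $12;
   if first.sex then do;          /* assumed: one label per group */
      zero = 0;
      lbl  = cats('Mean=', put(meanHeight, 5.1));
   end;
run;
```

Because MEANHEIGHT is a column in the same data set as the scatter data, the BY statement filters the reference line and the label along with the markers.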

SGPLOT code:

title 'Height by Weight';
proc sgplot data=class2 uniform=yscale noautolegend;
  by sex;
  scatter x=weight y=height;
  refline meanHeight;
  text x=zero y=meanHeight text=lbl / position=topright ;
run;

The graph on the right is one of the two graphs generated, for the BY value Sex=F.  The reference line is displayed at the appropriate value, along with the label "Mean=60.8" positioned at the left end of the reference line.  If your SAS release does not include support for the TEXT plot, you can leave it out, or use SCATTER with the MarkerChar option.

Note the use of the procedure option UNIFORM=yscale, which makes the y-axis data range uniform across all the graphs.

There are multiple benefits of using plot layers instead of annotation.

  • Each plot contributes its data range and offset requirements to the axes.
  • The axes union the data ranges, and communicate the information back to the plots for display.
  • The plots can be interspersed between other plots in the order we want.
  • The plots contribute to legends (and work with attribute maps).
  • The data also works correctly with the BY variable.

While not significant here, plot ordering and group attribute assignment and contribution to legends can be a big benefit in other graphs such as the Swimmer Plot.  So, when coming to SGPLOT from SAS/GRAPH, it will be to your benefit if you don't just duplicate the process you use with GPLOT and annotate.  There may be other ways to achieve the right results.

Full SAS9.40 code: layers 

 

 


Outside-the-box: Directed circle link graphs

One request came in for the previous article on the Circle link graph: the addition of arrow heads to indicate the direction of the flow.  Given that I am using a SERIES plot to render the links, it is relatively easy to add arrow heads to the links, as the SERIES plot statement itself supports options for displaying arrow heads.

The arrow head sizes depend on the shape of the arrow head and also on the thickness of the line. Arrow heads generally work out well for smaller line thicknesses.  So, in the case of the graph on the right, I have removed the response thickness option and made all the arrows 5px thick.

series x=x y=y / group=link
           lineattrs=(pattern=solid thickness=5)
           grouplc=colorTo transparency=0.05
           nomissinggroup
           arrowheadpos=end arrowheadshape=barbed;

One important change in the code is related to the length of each link.  Previously, without arrow heads, the links were drawn from and to the middle of the circle rim, and then overdrawn with the circular nodes.  Now, that is not suitable, since a portion of the arrow head would be hidden.  Also, drawing the links over the circular nodes looks less aesthetic.  So, I have to stop the link short on the "To" side of the node.

For this graph, I am computing the spline curve myself.  In the %makelink() macro, the vertex values for the curved links are computed for t=0 to 1.0 by 0.05.  So, I can stop the computation short of 1.0, say at 0.97, to get shortened links so the arrow heads are fully visible.  Also, when the SERIES plot computes the arrow head, the thick line needs to be stopped a bit early so the arrow head point is not overdrawn by the thick line.  To allow this to happen, the last segment of the series needs to be long enough to be shortened.  To do this, I used:  do t=0.0 to 0.85 by 0.05, 0.97;  This provides a longer last line segment, from t=0.85 to 0.97, for each link.
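
To make the effect of that DO loop concrete, here is a minimal sketch. The %makelink() macro is not shown in this post, so a quadratic Bezier curve stands in for whatever spline it computes, and the end points and control point are made-up values:

```sas
/* Illustrative only: a quadratic Bezier curve stands in for the */
/* spline computed by the %makelink() macro.                     */
data link;
   x0 = 1; y0 = 0;     /* hypothetical "From" point on the rim */
   x1 = 0; y1 = 1;     /* hypothetical "To" point on the rim   */
   cx = 0; cy = 0;     /* control point at the circle center   */
   do t = 0.0 to 0.85 by 0.05, 0.97;   /* stop short of t=1.0  */
      x = (1-t)**2*x0 + 2*(1-t)*t*cx + t**2*x1;
      y = (1-t)**2*y0 + 2*(1-t)*t*cy + t**2*y1;
      output;
   end;
   keep t x y;
run;

proc sgplot data=link aspect=1;
   series x=x y=y / lineattrs=(pattern=solid thickness=5)
                    arrowheadpos=end arrowheadshape=barbed;
run;
```

The final vertex is written at t=0.97 rather than t=1.0, so the curve ends before it reaches the "To" point and the last segment is longer than the others.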

Variable line thickness can still be supported, but I reduced the max line thickness, as seen in the graph on the right.  If the links are made translucent, the overlap between the arrowhead and line segment can be seen.  So, it is better to have a high level of opacity.  Just for aesthetics, I moved one of the links, but it would still work.

Full SAS 9.40M3 SGPLOT code: directed_circle_graph


Outside-the-box: Circle link graph

There has been some interest in "Circle Link Graph" diagrams where the nodes are laid out in a circle, with links going from one node in the circle to another.

I recall seeing one such diagram during the 2014 World Cup Soccer tournament, showing the number of players from one country who were playing in a league in another country.  I thought it would be an interesting exercise to see how to build this graph using the SGPLOT procedure.

To create such a graph, I just made up some data for the number of players on each team who play in their own country and in some other country.  Then, I used some data step code with Hash Objects to build a data set that contains the coordinates of the nodes as segments of arcs around the circle, and links that traverse from the center of one node to another.

The graph is shown on the right.  I first did this exercise back in 2014.  At that time the SERIES plot did not support a line thickness response to make the line thickness proportional to some variable.  Also, the SERIES plot did not support coloring lines by a separate group, or smooth splines.  Nor was there any expressed interest in the user community for such a graph, so I did not post my findings.

Now, there appears to be some interest expressed in the SAS community for such a graph.  Also, with SAS 9.40M3, we have all the features in place to make a decent circle-link graph as shown on the right.  It will be clear when you see the code linked below that the task is not trivial.  Here are the steps to create this graph.

  • The data set as shown above has three columns, From, To and LinkCount.  These represent the number of players that are "From" one country playing in the "To" country.
  • Run a data step to get the total number of players.
  • Run a data step to add all the node names and links into two separate Hash Objects.
  • Iterate over all the nodes in the nodes hash object to compute the start and end angles for all nodes.
  • Iterate over all the links in the links hash object and compute the coordinates for each link.
  • A macro is used to compute the spline shape.
  • Note, each link starts and ends at the center of the node.
  • Use two TEXT plots to display the country names.  Two TEXT statements are needed to change the Position option to "Right" and "Left" as we go around the circle.
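
The node layout part of these steps can be sketched as follows. This is a simplified stand-in for the linked program. It uses a plain DATA step instead of Hash Objects, the node sizes are made up, and the names NODES, ARCS, TOTAL and GAP are hypothetical:

```sas
/* Made-up node sizes; the post derives these from the From/To data */
data nodes;
   input country $ players;
   datalines;
A 10
B 20
C 30
;

/* Total number of players, stored in a macro variable */
proc sql noprint;
   select sum(players) into :total trimmed from nodes;
quit;

/* Each node gets an arc whose angle is proportional to its players */
data arcs;
   set nodes;
   retain angle 0;
   startAngle = angle;
   endAngle   = angle + 2*constant('pi')*players/&total;
   gap        = 0.03;     /* hypothetical gap between nodes, in radians */
   do a = startAngle + gap to endAngle - gap by 0.02, endAngle - gap;
      x = cos(a);
      y = sin(a);
      output;
   end;
   angle = endAngle;      /* the next node starts where this one ends */
   keep country a x y;
run;

proc sgplot data=arcs aspect=1 noautolegend;
   series x=x y=y / group=country lineattrs=(thickness=20 pattern=solid);
   xaxis display=none;
   yaxis display=none;
run;
```

The links would then be drawn between the mid-angles of two arcs, which is the part that the macro and the Hash Objects handle in the full program.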

Node names can be displayed in the circle instead of "radial" as shown on the right.  The angle of rotation "rotate2" is already computed.  The "Backlight" option is used to ensure the text is visible.

SGPLOT code:

proc sgplot data=links aspect=1 nowall noborder subpixel;
  series x=x y=y / group=link lineattrs=(pattern=solid)
             grouplc=colorTo transparency=0.2 nomissinggroup
             smoothconnect thickresp=linkcount
             thickmax=36 thickmaxresp=3;
  series x=x y=y / group=country lineattrs=(thickness=20
            pattern=solid color=white) nomissinggroup;
  series x=x y=y / group=country nomissinggroup
            lineattrs=(thickness=20 pattern=solid)
           grouplc=colorTo transparency=0.2 name='a';
  text x=xlbl y=ylbl text=country / rotate=rotate2
          position=center textattrs=(size=9 color=White) backlight;
  xaxis display=none;
  yaxis display=none;
run;

Next Steps:  With SAS 9.40M3, a SPLINE statement is also available, which does all the work of computing a smooth spline for each link.  The program could be updated to use it and remove the need for the macro.

Also, it would be nice if the start and end points of the links can be distributed along the arc of the nodes.  In that case the width of the node and of the link will both have to be proportional to the number of players.

I believe such graphs can be useful to view all kinds of "Consumer" and "Provider" relationships.  For "Patients" and "Providers" it could be that some providers are also patients.

Full SAS9.40M3 SGPLOT code:   circle_graph


Clinical Graphs: Spider plot

A Spider Plot is another way of presenting the Change from Baseline for tumors for each subject in a study by week.  The plot can be classified by response and stage.  Another way of displaying Tumor Response data was discussed earlier in the article on Swimmer Plot.

This article is prompted by a question on the SAS communities page on how to create a Spider plot.  The user provided an illustration of what the plot might look like. I followed the example and generated some data to create the graph shown on the right.

The data is arranged in six columns.  Four columns are needed to draw the progression of the disease for each subject over time:  Subject, Week, Change, RGroup.

Two additional columns are used to display the status at the end of the curve for each subject: WeekS, TGroup.  The first three observations are just to ensure the groups are assigned colors in the order we want.  A Discrete Attributes Map can be used, but I ran into some minor difficulties, so I skipped that step.  This exercise helped reveal a minor problem, but I was able to work around it.

In this case, WeekS=Week+2, to position the marker at the right of the curve.  Only the last point in the curve has this nonmissing value.

The SAS 9.4M2 options for coloring the series and markers by another classifier (GroupLC and GroupMC) are used to color each curve for the Subject by the Response Group (RGroup).  Note, the connectivity for each curve is determined by setting Group=Subject.  Then, within this, the color of each curve is set by setting GroupLC=RGroup.  This allows multiple curves to be classified in the same category.

The SAS 9.40M1 SymbolChar statement is used to define a few symbols to represent the status of each Subject, such as "Treatment Ongoing" etc.  These are inserted into the group symbols list using the DataSymbols option.  Custom group data colors are set using the DataContrastColors option.
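
A minimal sketch of how such a column could be derived is shown below. The input data set SPIDER0 and its values are made up for illustration, and the TGroup column and the three leading observations used for group ordering are left out:

```sas
/* Hypothetical input: one row per subject per week */
data spider0;
   input Subject Week Change;
   datalines;
1 0 0
1 4 -20
1 8 -35
2 0 0
2 4 15
;

proc sort data=spider0 out=spider1;
   by subject week;
run;

/* WeekS is nonmissing only on the last row for each subject */
data spiderDemo;
   set spider1;
   by subject;
   if last.subject then WeekS = Week + 2;
run;
```

Because WeekS is missing everywhere else, a SCATTER statement with x=WeekS draws one status marker per subject, just to the right of the end of its curve.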

A Scatter plot is used to display these markers.

SAS 9.4M2 SGPLOT Code:

title "Tumor Response by Week";
ods graphics / reset width=5in height=3in imagename='Spider';
proc sgplot data=spider noborder tmplout='c:\spider.sas';
  format tgroup $growth.;
  symbolchar name=ongoing char='2192'x / scale=1;
  symbolchar name=growtht char='2020'x / scale=1;
  symbolchar name=growthnt char='2021'x / scale=1;
  styleattrs datacontrastcolors=(green gold red)
                    datasymbols=(ongoing growtht growthnt );
  refline 0 / lineattrs=(pattern=shortdash);
  series x=week y=change / group=subject grouplc=rgroup groupmc=rgroup
             markers markerattrs=(symbol=circlefilled)
            lineattrs=(thickness=2 pattern=solid) name='a';
  scatter x=weekS y=change / group=TGroup markerattrs=(size=16 color=black)
           nomissinggroup name='b';
  keylegend 'a' / title='Response' type=linecolor valueattrs=(size=7)
            location=inside position=topright across=1 opaque;
  keylegend 'b' / valueattrs=(size=7) noborder;
  xaxis label='Week';
run;

There are also examples of Spider plots with negative date values on the x-axis.  These appear to track the disease state before and after start of treatment.  The above code will likely work if the data itself includes values from before start of treatment and the baseline value is 1.0.

Full SAS 9.4M2 SGPLOT code:  spider


Outside-the-box: CONSORT diagram

Over the past few weeks I have heard about the "Consort Diagram".  This was mentioned in a Communities article, and also by a couple of users separately.

This topic was also covered by Anusha Mallavarapu and Dean Shults from Cytel in a poster at PhUSE 2016, as shown on the right.  The authors discuss an automated way to create the diagram using an RTF template.

From speaking with the authors, it appears that the diagram structure is relatively fixed for a set number of arms of the study.  The authors showed a sample diagram for a 4 arm study, shown on the right.

I thought it would be an interesting exercise to see if I could create this diagram fully in SAS, thereby reducing the complexity of using multiple tools.  The diagram I created is meant to essentially mimic the diagram presented by the authors above.  I have not filled in every box, but did one of each to see how we can create such a graph using the SGPLOT procedure.  Other diagram structures could be defined in a similar way.

I have used the following statements with SAS 9.4M3 SGPLOT to create the diagram shown on the right.

  1. A Series plot to draw the links.
  2. A Polygon plot to draw the empty boxes.
  3. A Polygon plot to draw the filled boxes.
  4. A Text plot to draw the center aligned horizontal text.
  5. A Text plot to draw the left aligned horizontal text.
  6. A Text plot to draw the rotated text in the filled boxes.

The full program is linked below.  The diagram is created in a 0-200 vertical and 0-100 horizontal space.  Vertices for the links are defined in a nodes data set, with Node Ids and their (x, y) coordinates based on the shape of the diagram.  A Hash Object is created to hold the node ids and their x and y coordinates.

Links are defined as multi segment lines with the Node Ids as vertices.  Up to 4 nodes can be used to allow for the angled links.  Then, the Node Hash Object is used to get the coordinates of the vertices for the links, and written out as a series plot with multiple legs.  Separate data sets are defined for the empty rectangles, and for the filled rectangles so two Polygon statements can be used to draw these, one for empty and one for filled.
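
The node lookup described above can be sketched as follows. The data set names, variable names and coordinates are made up for illustration. The linked program may organize this differently:

```sas
/* Hypothetical nodes in the 0-100 by 0-200 drawing space */
data nodes;
   input nodeId x y;
   datalines;
1 50 190
2 50 150
3 20 150
4 20 120
;

/* Hypothetical links: up to 4 node ids per link */
data links;
   input linkId n1-n4;
   datalines;
1 1 2 . .
2 2 3 4 .
;

/* Expand each link into one row per vertex, using a hash lookup */
data linkCoords;
   if 0 then set nodes;              /* define nodeId, x, y in the PDV */
   if _n_ = 1 then do;
      declare hash h(dataset: 'nodes');
      h.defineKey('nodeId');
      h.defineData('x', 'y');
      h.defineDone();
   end;
   set links;
   array node[4] n1-n4;
   do i = 1 to 4;
      if not missing(node[i]) then do;
         nodeId = node[i];
         if h.find() = 0 then output;   /* x and y are filled from the hash */
      end;
   end;
   keep linkId x y;
run;

proc sgplot data=linkCoords noautolegend;
   series x=x y=y / group=linkId;
run;
```

Each link becomes a short series of vertices, so a single SERIES statement with GROUP=linkId draws all of the multi-segment lines.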

While I defined the polygon vertices directly for ease of use, this could also be based on the Node Ids from the Hash Object.

Similarly, text is defined in three data sets: one for the rotated text, one for the center aligned text and one for the left aligned text.  These are used with the three Text statements to draw the information.  SplitPolicy=SplitAlways is used with SplitChar="." to arrange the text.  Finally, all the data sets are merged into one data set for use with the procedure.

I defined the text location directly for ease of use, but that could also be associated to the Node Ids and extracted from the Hash Object.

My goal is to show how the data should be arranged and which statements to use to draw the graph.  The authors indicated that often the Consort Diagram structure and textual information (except the numbers) is static, with changing numbers.  In that case, the diagram can be defined once and reused.  The "N" values can likely be held in macro variables and inserted into the right places.
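
As a rough sketch of that last idea, a count can be stored in a macro variable and inserted into the text for a box. Here SASHELP.CLASS merely stands in for a subject-level data set, and the names NRANDOMIZED and BOXTEXT, the coordinates, and the wording of the label are all hypothetical:

```sas
/* SASHELP.CLASS stands in for a hypothetical subject-level data set */
proc sql noprint;
   select count(*) into :nRandomized trimmed from sashelp.class;
quit;

data boxText;
   length text $60;
   x = 50; y = 180;                        /* hypothetical box center  */
   text = "Randomized.(N=&nRandomized)";   /* '.' marks the line break */
run;

proc sgplot data=boxText noautolegend;
   text x=x y=y text=text / splitpolicy=splitalways splitchar='.';
   xaxis display=none min=0 max=100;
   yaxis display=none min=0 max=200;
run;
```

Only the SQL step would need to change when the numbers change. The diagram structure and the rest of the text stay as defined.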

SAS 9.40M3 is necessary as I have used the TEXT plot statement to draw the text in the nodes.  It may be possible to do this using Annotate with earlier versions, likely a bit harder.  I would be interested to hear if this helps in the task, and what other details may need to be addressed.  So, please feel free to chime in.

SAS 9.4 SGPLOT Code:  consort_diagram

 


Advanced ODS Graphics: Title change macro

Have you ever wanted to modify a graph title that is produced by an analytical procedure? You can make a wide variety of changes by modifying the graph template. Modifying the graph template is straightforward. You specify ODS TRACE ON, run the procedure, find the template name, display the template, modify the template, submit it, and rerun the procedure. I will show you a different way.

First, consider running PROC LOGISTIC as follows:

data Neuralgia;
   input Treatment $ Sex $ Age Duration Pain $ @@;
   datalines;
P F 68 1 No B M 74 16 No P F 67 30 No P M 66 26 Yes B F 67 28 No B F 77 16 No A
F 71 12 No B F 72 50 No B F 76 9 Yes A M 71 17 Yes A F 63 27 No A F 69 18 Yes B
F 66 12 No A M 62 42 No P F 64 1 Yes A F 64 17 No P M 74 4 No A F 72 25 No P M
70 1 Yes B M 66 19 No B M 59 29 No A F 64 30 No A M 70 28 No A M 69 1 No B F 78
1 No P M 83 1 Yes B F 69 42 No B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes A M 70
12 No A F 69 12 No B F 65 14 No B M 70 1 No B M 67 23 No A M 76 25 Yes P M 78 12
Yes B M 77 1 Yes B F 69 24 No P M 66 4 Yes P F 65 29 No P M 60 26 Yes A M 78 15
Yes B M 75 21 Yes A F 67 11 No P F 72 27 No P F 70 13 Yes A M 75 6 Yes B F 65 7
No P F 68 27 Yes P M 68 11 Yes P M 67 17 Yes B M 70 22 No A M 65 15 No P F 67 1
Yes A M 67 10 No P F 72 11 Yes A F 74 1 No B M 80 21 Yes A F 69 3 No
;
 
ods trace on;
ods graphics on;
proc logistic data=Neuralgia;
   class Treatment Sex / param=glm;
   model Pain= Treatment|Sex Age;
   lsmeans Treatment / plots=anom;
run;


Here is the trace output for the analysis of means plot:

Name:       AnomPlot
Label:      Treatment ANOM Plot
Template:   Stat.Logistic.Graphics.AnomPlot
Path:       Logistic.LSMeans.AnomPlot
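
For comparison, the traditional approach mentioned at the start would continue from here by displaying the template named in the trace output. A minimal sketch of that step:

```sas
ods trace off;

/* Write the source code of the template to the SAS log */
proc template;
   source Stat.Logistic.Graphics.AnomPlot;
run;
```

You would then copy the template source from the log, edit the ENTRYTITLE statements, and submit the modified template with PROC TEMPLATE.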

You can use a macro to modify the way that the template creates titles as follows:

%grtitle(path=Stat.Logistic.Graphics.AnomPlot)

The macro displays the following:

Stat.Graphics.AnomPlot
   Graphics_AnomPlot
   Graphics_AnomPlot2

The macro first lists the template that it modified. It is not the same template that you specified, because the specified template is a link. The macro follows links and modifies the actual graph template at the end of the link chain. The names of two macro variables follow the template name. The modified template creates titles from the values of the macro variables when those variables exist and uses the original titles when the macro variables do not exist. The names of the macro variables are constructed from the second and last levels of the template name. In most cases, these two levels are the procedure name and the graph name (but not in this case because of the link).

The following step creates the graph using the modified template and the two macro variables.

%let Graphics_AnomPlot = Neuralgia Study;
%let Graphics_AnomPlot2 = Analysis of Means with 95% Confidence Limits;
proc logistic data=Neuralgia;
   ods select anomplot;
   class Treatment Sex / param=glm;
   model Pain= Treatment|Sex Age;
   lsmeans Treatment / plots=anom;
run;

Here is how it works. The original template has two ENTRYTITLE statements:

entrytitle _TITLE;
entrytitle textattrs=GRAPHVALUETEXT _CLSTR;

The macro modifies the template as follows:

mvar Graphics_AnomPlot Graphics_AnomPlot2;
if (EXISTS(GRAPHICS_ANOMPLOT))
   entrytitle GRAPHICS_ANOMPLOT;
else
   entrytitle _TITLE;
endif;
if (EXISTS(GRAPHICS_ANOMPLOT2))
   entrytitle GRAPHICS_ANOMPLOT2;
else
   entrytitle textattrs=GRAPHVALUETEXT _CLSTR;
endif;

If the macro variable exists, your title is used; otherwise the original title is used. If the macro variable exists but has a null value (%LET Graphics_AnomPlot = ;), that title is suppressed. Notice that there are no ampersands. Macro variable values are substituted at the time that PROC LOGISTIC is run, not at the time that the macro runs PROC TEMPLATE.
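
The same MVAR and EXISTS mechanism can be tried in a small stand-alone template. The following is only a sketch. The template name MvarTitleDemo and the macro variable DemoTitle are made up for this illustration and are not produced by the %grtitle macro:

```sas
proc template;
   define statgraph MvarTitleDemo;
      mvar DemoTitle;                 /* resolved when the graph is rendered */
      begingraph;
         if (exists(DemoTitle))
            entrytitle DemoTitle;
         else
            entrytitle 'Weight by Height';
         endif;
         layout overlay;
            scatterplot x=height y=weight;
         endlayout;
      endgraph;
   end;
run;

/* No macro variable yet: the default title is used */
proc sgrender data=sashelp.class template=MvarTitleDemo;
run;

/* Macro variable exists: its value becomes the title */
%let DemoTitle = Title Supplied at Run Time;
proc sgrender data=sashelp.class template=MvarTitleDemo;
run;

%symdel DemoTitle;   /* remove it so the default title returns */
```

The template is compiled once, and the title changes between the two PROC SGRENDER steps only because the macro variable is looked up at run time.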

The modified template is in an item store in the WORK library. You can delete that item store as follows:

%grtitle(options=delete)

You can modify multiple templates at once. The three steps below modify all templates, all STAT templates, and all PROC LOGISTIC templates. Link chains are not followed when multiple templates are modified.

%grtitle;
%grtitle(path=stat)
%grtitle(path=stat.logistic)

The following step lists all of the modified templates that are in the WORK library:

proc template; list / store=work.templat; quit;

The macro header provides additional documentation. The macro does the following:

  • All modified templates in the WORK library are deleted.
  • PROC TEMPLATE writes the graph templates to a file.
  • DATA steps follow the links when there is only one template.
  • A DATA step modifies the template code and submits it to SAS using CALL EXECUTE.

Macro Code: Graph Title Macro

You can use this code as a prototype for other forms of customization. You could just as easily add DRAW statements that add watermarks or logos to every graph. SAS provides all of the tools that you need to easily access all of the templates and systematically modify them.


Getting Started with SGPLOT - Part 1 - Scatter Plot

Last week I had the pleasure of presenting my paper "Graphs are Easy with SAS 9.4" at the Boston SAS Users Group meeting.  The turnout was large, and over 75% of the audience appeared to be using SAS 9.4 back home.  This was good, as my paper focused on the cool, new and useful features released with SAS 9.4.  The most prominent of these (in my opinion) are the AxisTable statements, which make it very easy to add axis-aligned textual information to the graphs.

A mixer was organized on the upper floor of the Microsoft NERD building, which afforded great views of the river.  Here, I got an opportunity to chat with attendees and hear their opinions.  During these conversations I noted that many users were very excited about the new graph features, but were not using these procedures for various reasons.  So while I peddled this blog every chance I got, it became clear to me that we could use some "tutorial" style articles, geared towards the new user.

So, here is the first of such articles focused on the SGPLOT procedure.  The SGPLOT procedure is really a great way to create graphs, from the simplest Scatter Plot to complex Forest Plots.   The SGPLOT procedure supports multiple plot statements like Scatter, Series, Step, Histogram, Density, VBar, HBar, VBox, HBox, HighLow and many many more.  These statements can be used individually to create many basic graphs.  Many of these statements can also be combined to create more complex plots.

In this article, we will explore some of the key features of the Scatter plot, arguably the simplest, most useful and most commonly used plot.  The most basic use case is shown on the right, displaying the weight x height for all the observations in the sashelp.class data set.

The program code is shown below.

title 'Weight by Height';
proc sgplot data=sashelp.class;
  scatter x=height y=weight;
run;

What could be simpler than the code above?  The graph created by the SGPLOT procedure uses predefined style information to render a clean and uncluttered graph using the principles of effective graphics as recommended by thought leaders in the industry.  Axis extents are derived from the data, and ticks on the axis are drawn only when necessary.  Statement options are available to customize the graph.

The graph on the right displays the same data by Gender of each student.  Now, different marker shapes are automatically selected from the Style to represent the male and female persons in the graph.  A legend is automatically displayed in the default location at the bottom of the graph.

title 'Weight by Height by Gender';
proc sgplot data=sashelp.class;
  scatter x=height y=weight / group=sex;
run;

When a group role is in effect, the unique values of the group variable are assigned distinct marker shapes and colors.  For most styles, ATTRPRIORITY=None, and the marker symbol and color are cycled at the same time.  For some styles, such as HTMLBlue, ATTRPRIORITY=Color.  For such styles, the colors are cycled first, and the marker symbol changes only after all 12 color values are used up.  ATTRPRIORITY can be set to 'Color' or 'None' in the ODS GRAPHICS statement of any program to obtain the preferred cycling of attributes.

Group attributes are obtained from the Style that is associated with the destination.  If you want to use custom group colors and/or symbols, you could derive a new style from an existing one and change the color and symbol settings for the GraphData1-12 elements in the style.  This can be done using the TEMPLATE procedure or the %MODSTYLE macro.  An easier way is to set the group data colors and/or symbols in the program code using the STYLEATTRS statement.
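
As a small sketch of that last point, the following statements request that symbols and colors cycle together, whatever the active style prefers, and then restore the ODS Graphics defaults:

```sas
/* Cycle marker symbols and colors together for each group */
ods graphics / attrpriority=none;

title 'Weight by Height by Gender';
proc sgplot data=sashelp.class;
  scatter x=height y=weight / group=sex;
run;

/* Restore the ODS Graphics defaults */
ods graphics / reset;
```

With a style such as HTMLBlue, the two groups should then differ in marker shape as well as in color.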

title 'Weight by Height by Gender';
proc sgplot data=sashelp.class;
  styleattrs datasymbols=(circlefilled trianglefilled)
                   datacontrastcolors=(olive maroon);
  scatter x=height y=weight / group=sex filledoutlinedmarkers
               markerattrs=(size=12) markerfillattrs=(color=white)
               markeroutlineattrs=(thickness=2);
  keylegend / location=inside position=bottomright;
run;

In the graph and code above, we have made the following customizations:

  1.   We have defined the list of symbols to be used for the groups.
  2.   We have defined the list of colors to be used for the groups.
  3.   We have requested the use of "filled and outlined" markers.
  4.   We have moved the legend inside the data area.

Finally, in the graph on the right, we have used custom symbols to represent the "male" and "female" persons in the data.

Here are the steps we have used to create this graph:

  1. We have defined two custom named symbols using the SYMBOLIMAGE statement.  Each symbol uses an image file to define the shape and color.
  2. We have provided these two named symbols in the list of symbols for drawing the graph.
  3. We have disabled the axis lines and ticks and enabled the grid lines.
  4. We have disabled the graph and data area borders.
  5. We have also removed the legend as the shapes are self explanatory.
  6. Also note, we have displayed the names of the students with the extreme weight values.  The names are displayed below the marker.  All names are not displayed to avoid clutter.
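
The LABEL column behind the last step could be prepared as in the following sketch. The full program is only linked, so this is an assumption about one way to do it. It labels just the students with the minimum and maximum weight, while the original may label more:

```sas
/* Copy sashelp.class and add a LABEL column that holds the name */
/* only for the lightest and heaviest students.                  */
proc sql;
   create table class as
   select *,
          case when weight = (select min(weight) from sashelp.class)
                 or weight = (select max(weight) from sashelp.class)
               then name
               else ' '
          end as label length=8
   from sashelp.class;
quit;
```

A blank data label is simply not drawn, which is how the clutter is avoided.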

SGPLOT procedure code is shown below.  See the link at the bottom for the full code.

title 'Weight by Height by Gender';
proc sgplot data=class noborder noautolegend;
  symbolimage name=male image="&fileM";
  symbolimage name=female image="&fileF";
  styleattrs datasymbols=(male female);
  scatter x=height y=weight / group=sex markerattrs=(size=20)
               datalabel=label datalabelpos=bottom;
  xaxis offsetmin=0.05 offsetmax=0.05 display=(noline noticks) grid;
  yaxis offsetmin=0.1 offsetmax=0.05 display=(noline noticks) grid;
run;
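
The step above relies on two macro variables and on a CLASS data set with a LABEL column, none of which is shown here.  One possible way to set them up is sketched below.  The file paths are hypothetical placeholders, and the labeling rule (name only the lightest and heaviest students) is an assumption based on the description above; the linked full code may do this differently.  Run this before the PROC SGPLOT step.

```sas
/* Hypothetical image locations; replace with real image files */
%let fileM = C:\images\male.png;
%let fileF = C:\images\female.png;

/* Store the smallest and largest weights in macro variables */
proc sql noprint;
   select min(weight), max(weight) into :minW, :maxW
   from sashelp.class;
quit;

/* Keep a name only for the students with an extreme weight. */
/* The helper variable Tag is renamed to Label on output.    */
data class;
   set sashelp.class;
   length tag $8;
   if weight = &minW or weight = &maxW then tag = name;
   rename tag = label;
run;
```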

Full SAS 9.4 SGPLOT Code:  getting_started_1_scatterplots 

 


Spark and Summary Plots

In the area of graphical visualization of data, Edward Tufte is a thought leader and has put forth many innovative ideas that enhance the understanding of the information in the graph with minimal distractions and potential for misinterpretation.

One of his ideas has been the use of "Spark" plots.  As I understand it, these are very lightweight graphs that depict the key information in a very small space.  Often such graphs can be included inline with other textual information in a paragraph like this:  spark_3.  In this case, I have generated this graph using the SGPLOT procedure with minimal decorations to depict the trend of the stock prices for Intel from the sashelp.stocks data set.  I display only the series, the last value, and a label.

SGPLOT code for Spark Plot:

proc sgplot data=spark noautolegend noborder nowall;
  series x=date y=adjclose;
  scatter x=date y=lastvalue / markerattrs=(color=blue symbol=circlefilled size=12);
  text x=date y=lastvalue text=lastvalue / position=topright textattrs=(size=20);
  text x=date y=firstvalue text=label / position=left textattrs=(size=20)
         splitpolicy=splitalways splitchar='.';
  xaxis display=none;
  yaxis display=none offsetmin=0 offsetmax=0;
run;
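
The SPARK data set is not shown above.  A minimal sketch of how it could be built from sashelp.stocks follows.  The variable names FirstValue, LastValue, and Label match the SGPLOT code, but the label text and the rule for filling these columns are assumptions.  Run this before the PROC SGPLOT step.

```sas
/* Sort the Intel observations into ascending date order */
proc sort data=sashelp.stocks(where=(stock='Intel')) out=intel;
   by date;
run;

/* FirstValue and the label text are set only on the first     */
/* observation, LastValue only on the last one, so each marker */
/* and each text item is drawn exactly once.                   */
data spark;
   set intel end=eof;
   length tag $12;
   if _n_ = 1 then do;
      firstvalue = adjclose;
      tag = 'Intel';      /* assumed label text */
   end;
   if eof then lastvalue = adjclose;
   rename tag = label;
run;
```

Observations with missing FirstValue or LastValue are simply not drawn by the SCATTER and TEXT statements, so no extra filtering is needed.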

Recently, I received a request from SAS user Benjamin Knisley to create a similar lightweight "Graphical Summary" for visualizing patient data over time.  The graph shown below includes display of the visits and hospitalization over time.  Multiple visits are depicted as dots for easy viewing and the x and y axes are removed.  Some significant information about the patient, clinic and actual start and end dates is added.  See link below for full code.  I believe this depiction of the data is also motivated by Tufte's ideas.

visits_dot_4

One customization needed in the above graph is the use of the VALUES option, since the user wanted a sparse display of the years on the x-axis.  This too can be generalized by using GTL, which provides the INTERVAL and INTERVALMULTIPLIER options in the TIMEOPTS bundle.

SGPLOT code for Graphical Summary Graph:

title j=l 'Family name, Given name' j=r 'County Clinic';
proc sgplot data=dots noautolegend noborder nowall;
  scatter x=date y=y / markerattrs=(symbol=circlefilled size=5);
  xaxistable hospitalized / x=date nomissingchar labelattrs=(size=9 weight=bold)
                     valueattrs=(size=10 weight=bold);
  text x=date y=ylbl text=firstdate / position=right contributeoffsets=none;
  text x=date y=ylbl text=lastdate / position=left contributeoffsets=none;
  xaxis type=time values=('01jan1980'd '01jan1985'd '01jan1990'd '01jan1995'd)
           valueshint display=(nolabel) valuesformat=year. valueattrs=(size=9 weight=bold);
  yaxis display=(noline noticks novalues) labelattrs=(size=9 weight=bold);
run;
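
The GTL alternative mentioned above for a sparse year axis could be sketched as follows.  Because the DOTS data set is not shown here, this example uses the Intel prices from sashelp.stocks instead.  The template name SparseYears and the five-year multiplier are assumptions chosen for illustration.

```sas
/* Sketch: a time axis with one tick every five years */
proc template;
   define statgraph SparseYears;   /* hypothetical template name */
      begingraph;
         layout overlay /
            xaxisopts=(type=time display=(line ticks tickvalues)
                       timeopts=(interval=year intervalmultiplier=5))
            yaxisopts=(display=none);
            seriesplot x=date y=adjclose;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.stocks(where=(stock='Intel'))
              template=SparseYears;
run;
```

With this approach the tick positions follow the data range, so the list of dates does not have to be typed into a VALUES= option.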

Full SAS 9.4 code: graphicalsummary  


Legend order and group attributes

In this blog, I will show you how to control the order of the entries in a legend and explicitly control the correspondence between groups and style elements in PROC SGPLOT. In many cases, the colors that are used to differentiate groups do not matter--the graph simply needs to display different groups using different colors. That is not true for other graphs. It might be confusing if males were displayed using pink markers and lines and if females were displayed using blue markers and lines. For adverse events, you might prefer to use green for mildly adverse events and red for more severe events. Furthermore, you might want to order the events in the legend from mild to severe, and that might not conveniently depend on the order of the events in the data or a sorted order. The easiest way to control both legend order and group to style element correspondence is by using attribute maps. A series of examples provides background and shows other options.

The first graphs show default legend orderings and correspondence. They show that these can change depending on the data and the type of graph that you create. The fifth graph shows how you can use the STYLEATTRS statement in PROC SGPLOT to override components of style elements. The seventh (and last) graph shows how you can use an attribute map to control both the order of the entries in a legend and the correspondence between groups and style elements. With attribute maps, you do not have to know the original order. You can completely control the legend order and assign or override the default style elements. The PROC SGPLOT documentation contains much more information about the STYLEATTRS statement and attribute maps.

All of the graphs use this format in creating the legend:

proc format;
   value $sex 'M' = 'Male' 'F' = 'Female';
run;

This step creates a simple scatter plot with two groups:

proc sgplot data=sashelp.class;
   title '(1) Scatter Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData1) are first in the data set and legend.';
   footnote2 justify=left 'Females (GraphData2) are second in the data set and legend.';
run;

Click on a graph to enlarge.

Order1

The GROUP= option is specified in the SCATTER statement so that males are displayed differently from females. The first observation in the SASHelp.Class data set is a male. Therefore, males are displayed using the GraphData1 style element (blue circles) and females are displayed using the GraphData2 style element (red circles). The legend entries are similarly ordered male and then female.

The following step creates a regression fit plot with two groups:

proc sgplot data=sashelp.class;
   title '(2) Fit Plot of the Class Data Set by Sex';
   reg y=height x=weight / group=sex degree=2 nomarkers;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData2) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend and use GraphData2 '
                          'because the male function was fit second.';
   footnote3 justify=left 'Females (GraphData1) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend and use GraphData1 '
                          'because the female function was fit first.';
run;

Order2

Males are still first in the data set, but now males appear second in the legend and are plotted using GraphData2 (red line), and females appear first in the legend and are plotted using GraphData1 (blue line). This is because the regression code gathers together the females first and then the males ('F' is sorted ahead of 'M'). Therefore, the legend order and the GraphDatan assignment change from the scatter plot.
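
If you want to see the two orderings side by side, one quick check (a sketch, not part of the original examples) is to tabulate the group variable in data order and then in the default sorted order:

```sas
title 'Group Order: Data Order versus Sorted Order';
footnote;

/* Levels listed in the order in which they first appear in the data */
proc freq data=sashelp.class order=data;
   tables sex / nocum nopercent;
run;

/* Levels listed in the default sorted order of the unformatted values */
proc freq data=sashelp.class;
   tables sex / nocum nopercent;
run;
```

The first table corresponds to the order that the SCATTER statement uses, and the second corresponds to the order in which the REG statement fits the groups.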

The following step uses both a SCATTER and a REG statement:

proc sgplot data=sashelp.class;
   title '(3) Fit Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex;
   reg     y=height x=weight / group=sex degree=2 nomarkers;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData2) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend and use GraphData2 '
                          'because the male function was fit second.';
   footnote3 justify=left 'Females (GraphData1) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend and use GraphData1 '
                          'because the female function was fit first.';
run;

Order3

The legend order and the GraphDatan assignment still depend on the order in which the regression analysis is performed for each group.

The next step creates a grouped scatter plot from sorted data:

proc sort data=sashelp.class out=class;
   by sex;
run;
 
proc sgplot data=class;
   title '(4) Scatter Plot of the Sorted Class Data Set by Sex';
   scatter y=height x=weight / group=sex;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData2) are second in the data set and legend.';
   footnote2 justify=left 'Females (GraphData1) are first in the data set and legend.';
run;

Order4

Since females now appear first in the data, they appear first in the legend and are displayed using GraphData1. Males appear second in the legend and are displayed using GraphData2.

The next step relies on the default group order (males then females in this case) and uses the STYLEATTRS statement to set the marker and line colors:

proc sgplot data=sashelp.class;
   styleattrs datacontrastcolors=(Blue cxFFAAAA);
   title '(5) Fit Plot of the Class Data Set by Sex';
   reg y=height x=weight / group=sex degree=2;
   format sex $sex6.;
   footnote1 justify=left 'Males (Blue) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend '
                          'because the male function was fit second.';
   footnote3 justify=left 'Females (Pink) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend '
                          'because the female function was fit first.';
   footnote5 justify=left 'The STYLEATTRS statement sets the colors for '
                          'males then females.';
run;

Order5

The STYLEATTRS statement sets the contrast colors to blue and a shade of pink for GraphData1 and GraphData2. The order of the legend entries is alphabetized, and the colors are consistent with gender identity colors.

Notice that the preceding step uses a REG statement without the NOMARKERS option. With this combination, the assignment of style elements to groups is reversed from the example with the REG statement and the NOMARKERS option. If you cannot anticipate which style element is used with which group, do not worry about it; it will all become easier in the last example. You can use attribute maps to control the order of the legend and override the GraphDatan style elements.

This second-to-last example still relies on knowing the default group assignment. The first step creates an attribute map with females first and then males. Therefore, females will appear first in the legend. In this example, the only attribute that is set is FillColor, which is irrelevant in this graph. Specifying an irrelevant variable like this enables you to use an attribute map to simply control legend order:

data order;
   input Value $;
   retain ID 'A' Show 'AttrMap' FillColor 'Red';
   datalines;
Female
Male
;
 
proc sgplot data=sashelp.class dattrmap=order;
   title '(6) Scatter Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex attrid=A;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData1) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend '
                          'because they are second in the attribute map.';
   footnote3 justify=left 'Females (GraphData2) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend '
                          'because they are first in the attribute map.';
run;

Order6

The last example is more typical, and it does not require you to know the default group order. The attribute map names females first and males second so that the legend entries appear in that order. Furthermore, females are explicitly specified to use all of the components of the GraphData2 style element and males use all of the components of GraphData1.

data order;
   input Value $ n;
   retain ID 'A' Show 'AttrMap';
   FillStyle        = cats('GraphData', n);
   LineStyle        = cats('GraphData', n);
   MarkerStyle      = cats('GraphData', n);
   TextStyleElement = cats('GraphData', n);
   datalines;
Female 2
Male   1
;
 
proc sgplot data=sashelp.class dattrmap=order;
   title '(7) Scatter Plot of the Class Data Set by Sex';
   scatter y=height x=weight / group=sex attrid=A;
   format sex $sex6.;
   footnote1 justify=left 'Males (GraphData1) are first in the data set.';
   footnote2 justify=left 'Males are second in the legend '
                          'because they are second in the attribute map.';
   footnote3 justify=left 'Females (GraphData2) are second in the data set.';
   footnote4 justify=left 'Females are first in the legend '
                          'because they are first in the attribute map.';
   footnote5 justify=left 'Males are explicitly assigned GraphData1.';
   footnote6 justify=left 'Females are explicitly assigned GraphData2.';
run;

Order7

The correspondence between groups of observations and GraphDatan style elements can be confusing. It might depend on the order of the observations in the data set or it might depend on the order in which ODS Graphics does computations. You can use STYLEATTRS to override GraphDatan style elements. Even more powerfully, you can use attribute maps to control the order of the legend and correspondence between groups of observations and GraphDatan style elements. The STYLEATTRS statement and attribute maps are much more powerful than is shown here. See the PROC SGPLOT documentation for more information.
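
As one small illustration of that extra power, the following sketch assigns explicit colors and symbols in the attribute map instead of referring to GraphDatan style elements. The data set name COLORS and the particular color and symbol choices are assumptions made for this example; the $SEX format from the beginning of this blog is assumed to be available.

```sas
/* Attribute map with explicit colors; females are listed first */
data colors;
   length Value $6 MarkerColor LineColor $8 MarkerSymbol $12;
   retain ID 'A' Show 'AttrMap';
   input Value $ MarkerColor $ MarkerSymbol $;
   LineColor = MarkerColor;   /* fit line matches the markers */
   datalines;
Female cxFFAAAA circlefilled
Male   blue     circlefilled
;

proc sgplot data=sashelp.class dattrmap=colors;
   title 'Fit Plot with Explicit Group Colors';
   footnote;
   reg y=height x=weight / group=sex degree=2 attrid=A;
   format sex $sex6.;
run;
```

Because the colors are tied to the group values rather than to the group order, this map gives the same result whether the plot statement is SCATTER or REG.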
