A Macro for Polygon Area and Center

A few weeks back I saw a couple of posts on the Communities page from users wanting to find ways to compute the area of a general polygon and also the center of the area. I felt such features likely existed somewhere in the SAS/GRAPH set of procedures, so I asked our resident expert(s). Initially, there was some miscommunication due to the requirement to compute areas. The %Centroid macro is available among the Annotate macros, but it does not report areas. Also, see another clarification at the bottom of this article.

In the meantime, this piqued my interest, so I took a stab at it and wrote a macro to compute the area and the centroid of a polygon, as described below.

First, I needed a few random polygons, so I wrote a small routine to generate the data in the format shown on the right. Changing the seed values can generate different shapes, some concave. I generated 3 polygons with a random number of nodes, and added one custom polygon to get specific shapes like the right triangle and a concave "L" shape for verification.

Then, I used the POLYGON statement in the SAS 9.4 SGPLOT procedure to plot the polygons to see what I had, as shown on the right. See the code in the link below.

The macro is included in the code. It takes the name of the input data set (DS), the names of the columns that hold the polygon ID and the coordinates (ID, X, and Y), and the global minimum X and Y values of the entire data (XMIN and YMIN). The minimums are easily computed beforehand and placed into a couple of macro variables, which saves a step inside the macro. The result is placed in the output data set (OUT).

%macro polyarea (ds=, xmin=, ymin=, out=, Id=, X=, Y=);

The macro steps around the polygon nodes and incrementally adds the area of each parallelogram segment against both the X and Y axes. The sums of the segment areas against X and against Y should be (and are) equal. Then it takes the first moment of the area of each parallelogram segment, with its center measured from the XMin and YMin axes, to compute the center of the area in each direction. The "X" areas are used for computing the x coordinate, and the "Y" areas for the y coordinate.
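For readers who want to see the arithmetic, here is a minimal sketch of an area and centroid computation in a single DATA step. It uses the standard shoelace (surveyor's) formula rather than the exact segment bookkeeping inside %PolyArea, so treat it as an illustration and not as the macro's source. The data set POLY, its two test shapes, and all of the variable names are assumptions made up for this sketch. The nodes must be listed in order around each polygon, and the data must be sorted by ID.

```sas
/* Hypothetical test data: a 4 x 3 rectangle and a right triangle */
data poly;
   input id x y;
   datalines;
1 0 0
1 4 0
1 4 3
1 0 3
2 0 0
2 6 0
2 0 3
;
run;

/* Shoelace formula: accumulate the cross product of each edge */
data polyArea(keep=id area cx cy);
   set poly;
   by id;
   retain x0 y0 xp yp a sx sy;
   if first.id then do;
      x0 = x; y0 = y;             /* remember the first node       */
      a = 0; sx = 0; sy = 0;      /* reset sums for a new polygon  */
   end;
   else do;
      cross = xp*y - x*yp;        /* edge: previous node to this one */
      a  + cross;
      sx + (xp + x)*cross;
      sy + (yp + y)*cross;
   end;
   xp = x; yp = y;
   if last.id then do;
      cross = x*y0 - x0*y;        /* closing edge: last node to first */
      a  + cross;
      sx + (x + x0)*cross;
      sy + (y + y0)*cross;
      area = abs(a)/2;            /* a is twice the signed area */
      cx = sx/(3*a);
      cy = sy/(3*a);
      output;
   end;
run;

proc print data=polyArea noobs;
run;
```

For the rectangle this gives an area of 12 centered at (2, 1.5), and for the right triangle an area of 9 centered at (2, 1), which match the hand calculations. Simple shapes like these are a handy check before trusting the numbers for random polygons.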

The result is plotted with the SGPLOT POLYGON statement and a TEXT overlay that displays the area and the location (x, y) of the center of area, as shown on the right. The results "look" kind of right to the eye.

I did not attempt to handle polygons with holes.  There is a good chance the algorithm itself will work for a polygon with holes.

I did not attempt to handle polygons with multiple segments, as in map data sets. This macro could be extended to get the area and CG of each segment by giving each segment a unique ID. But getting the area of the composite multi-segment polygon would need some more thought. That exercise is left to the motivated reader (meaning I am ducking that exercise). :-)

If the polygon is highly concave, it is possible the CG will not be within the polygon boundary.  That would be another good exercise to think about.

I mentioned the %Centroid macro earlier.  Another difference between this macro and the %Centroid Macro is that this macro computes the mathematical centroid of each polygon, while the %Centroid macro computes a "good" location for labeling of a polygon, and not the mathematical centroid.  If you want to label a polygon (like state or country name), the %Centroid may be the preferred tool.

Area and Centroid Macro:  AreaCentroidMacro 

 


Annotating multiple panels

In the past few weeks, I have written two blogs on SG annotation and on saving and then modifying the graphs that analytical procedures produce:
  Modifying dynamic variables in ODS Graphics
  Annotating graphs from analytical PROCs

Today, I finish this series with one more blog. This one shows how you can annotate graphs with multiple panels. If you want to fully understand today's blog, you will need to understand my previous two blogs. Those blogs show you how to run an analytical procedure, output the data object that underlies a graph, save the dynamic variables in an ODS document, process either the template or the PROC SGRENDER call to incorporate the dynamic variables, and then modify the graph. In this example, you will learn how to use a macro to add ANNOTATE statements to each LAYOUT OVERLAY code block in a template that an analytical procedure uses. This enables you to send annotations to each panel and use panel-specific drawing spaces. In contrast, my previous blog showed you how to add a single ANNOTATE statement to the template, which enables annotation but does not provide the ability to specify panel-specific drawing spaces. This example also modifies the data object and the graph template.

I hope that everyone cringed after reading the last sentence. Modifies the data object? Really? Yes, there are times when it makes sense. However, you should never change the data that underlie a graph. This example changes the data object in order to change which parts of the graph are labeled; no numbers are changed.

This example works with the standardized coefficients progression plot in PROC GLMSELECT. The following step creates the plot:

ods graphics on;
proc glmselect data=sashelp.baseball plots=coefficients;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor|yrMajor crAtBat|crAtBat crHits|crHits
                     crHome|crHome crRuns|crRuns crRbi|crRbi
                     crBB|crBB league division nOuts nAssts nError /
                     selection=forward(stop=AICC CHOOSE=SBC);
run;


glms1a

The graph shows how the coefficients change as new terms enter the model. PROC GLMSELECT labels some of the series plots. It is common in this graph for several coefficients to have similar values in the final model. PROC GLMSELECT tries to thin labels to avoid conflicts. For example, the first term that enters the model after the intercept is CrRuns. Its label is not displayed since it would conflict with the label for CrHits. In this example, you will learn how to select a different set of labels to display. In particular, you will display labels for the standardized coefficients in the selected model that are outside the range -1 to 1. This requires you to change the data object to change which series plots are labeled. Then you can add annotation to highlight the selected model. In PROC GLMSELECT, the final model does not usually correspond to the end of the progression of the coefficients. In this case, it corresponds to the model that is displayed at the reference line at step 9.

The next graph previews the results as they will be after annotation.

glms1b

You begin by creating a data object and storing the graph along with the dynamic variables in an ODS document:

ods document name=MyDoc (write);
proc glmselect data=sashelp.baseball plots=coefficients;
   ods select CoefficientPanel;
   ods output CoefficientPanel=cp;
   class league division;
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     yrMajor|yrMajor crAtBat|crAtBat crHits|crHits
                     crHome|crHome crRuns|crRuns crRbi|crRbi
                     crBB|crBB league division nOuts nAssts nError /
                     selection=forward(stop=AICC CHOOSE=SBC);
run;
ods document close;

The next step reads the data object, extracts the parameter labels for the coefficients that are greater than 1 or less than -1 in the selected model, and outputs to a macro variable the number of the last step:

data labelthese(keep=parameter rename=(parameter=par));
   set cp end=eof;
   if eof then call symputx('_step', step);
   if step eq 9 and (StandardizedEst gt 1 or StandardizedEst lt -1);
run;

This step relies on knowing that the selected model was found in step 9. If you are writing a general purpose program to do this modification, you can process the __outdynam data set that the macro makes below, output the value of the variable _ChosenValue, and then run the preceding step.

The next step processes the data set that was created from the data object:

data cp2;
   set cp;  
   match = 0;
   if step ne &_step then return;
   do i = 1 to ntolable;
      set labelthese point=i nobs=ntolable;
      match + (par = parameter);
      end;
   if not match then parameter = ' ';
   if nmiss(rhslabelYvalue) then rhslabelYvalue = StandardizedEst;
run;

This data object is typical of the data objects that are used to make graphs. It has several components of different sizes and missing values elsewhere. The last part of the data set contains the coordinates and strings that are needed to label each profile. The preceding step sets the parameter value to blank in the last step (the one that corresponds to the labels) for all but the terms with the most extreme coefficients. When the Y coordinate for a label is missing (because PROC GLMSELECT suppressed it due to collisions), the Y coordinate value is restored.

The next step provides the macro that contains the code that modifies the graph template:

%macro tweak;
   if index(_infile_, 'datalabel=PARAMETER') then 
      _infile_ = tranwrd(_infile_, 'datalabel', 
                          'markercharacterposition=right markercharacter');
   if index(_infile_, 'curvelabel="Selected Step"') then 
      _infile_ = tranwrd(_infile_, 'curvelabel="Selected Step"', ' ');
%mend;

It performs two changes. By default, labels are positioned by using the DATALABEL= option in a SCATTERPLOT statement. This step removes that option and instead specifies the MARKERCHARACTER= option. You can use the MARKERCHARACTER= option to position labels precisely at a point. In contrast, the DATALABEL= option moves labels that conflict. The first IF statement also adds the option MARKERCHARACTERPOSITION=RIGHT so that labels are positioned to the right of the coordinates. This change is based on the idea that sometimes it is better for labels to be precisely positioned, even if they collide. You can additionally modify the label coordinates if minimizing collisions is important. The TRANWRD (translate word) function performs the change, substituting a longer string for a shorter string. The second IF statement removes the curve label. You will later add it back in through SG annotation.
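If you want to see the substitution in isolation, the following sketch applies the same TRANWRD call to a single line of text. The sample SCATTERPLOT line is made up for illustration; it is not copied from the actual template. Note that the variable is given a generous length so that the longer replacement string is not truncated.

```sas
data _null_;
   length line $ 200;   /* room for the longer replacement string */
   /* Hypothetical template line, for illustration only */
   line = 'scatterplot x=STEP y=VALUE / datalabel=PARAMETER;';
   if index(line, 'datalabel=PARAMETER') then
      line = tranwrd(line, 'datalabel',
                     'markercharacterposition=right markercharacter');
   put line=;           /* write the edited line to the log */
run;
```

The log shows the line with markercharacterposition=right markercharacter=PARAMETER in place of datalabel=PARAMETER. Because only the word 'datalabel' is replaced, the '=PARAMETER' part of the original option is reused by the new option.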

The next step creates the SG annotation data set:

data anno;
   length ID $ 3 Function $ 9 Label $ 40;
   retain X1Space Y1Space X2Space Y2Space 'DataPercent' Direction 'In';
   length Anchor $ 10 xc1 xc2 $ 20;
   retain Scale 1e-12 Width 100 WidthUnit 'Data' CornerRadius 0.8 
          TextSize 7 TextWeight 'Bold'
          LineThickness 0.7 DiscreteOffset -0.3 LineColor 'Green';
 
   ID       = 'lo1';            Function  = 'Text';           
   Anchor   = 'Right';          TextColor = 'Green';        
   x1       = 55;               y1        = 94; 
   Label    = 'Coefficients for the Selected Model';
   output;
 
   Function = 'Line';           x1        = .;     
   X1Space  = 'DataValue';      X2Space   = X1Space;
   xc1      = '9+CrBB';         xc2       = '8+CrRuns*CrRuns'; 
   y1       = 94;               y2        = 94;          
   output;
 
   Function = 'Rectangle';      Y1Space   = 'WallPercent';
   Anchor   = 'BottomLeft';     y1        = 10;
   Height   = 80;               Width     = 0.6;
   output;
 
   ID       = 'lo3';            Width     = 100;              
   Function = 'Text ';          Label     = 'Selected Value';
   X1Space  = 'DataPercent';    Y1Space   = X1Space;
   Anchor   = 'Left';           TextColor = 'Blue';
   x1       = 86;               y1        = 84; 
   output;
 
   Function = 'Arrow';          LineColor = 'Blue';
   X1Space  = 'DataValue';      X2Space   = X1Space;
   xc1      = '9+CrBB';         xc2       = '12+CrHits*CrHits'; 
   y1       = 4;                y2        = 83;
   DiscreteOffset = .1;         x1        = .;     
   output;
run;

This step creates a data set with five observations:
   1) the text 'Coefficients for the Selected Model'
   2) a line from the text to the rectangle
   3) a rectangle with rounded corners that surrounds the coefficients for the selected model
   4) the text 'Selected Value'
   5) an arrow pointing from the text to the selected value

This SG annotation data set has many variables and options. More will be said about the SG annotation data set after the graph is displayed. Fully explaining SG annotation is beyond the scope of this blog.

The template processing macro, %ProcAnnoAdv, is next:

%macro procannoadv(data=, template=, anno=anno, document=mydoc, adjust=,
                   overallanno=1);
 
   proc document name=&document;
      ods exclude properties;
      ods output properties=__p(where=(type='Graph'));
      list / levels=all;
   quit;
 
   data _null_;
      set __p;
      call execute("proc document name=&document;");
      call execute("ods exclude dynamics;");
      call execute("ods output dynamics=__outdynam;");
      call execute(catx(' ', "obdynam", path, ';'));
   run;
 
   proc template; 
      source &template / file='temp.tmp';
   quit;
 
   data _null_;
      infile 'temp.tmp';
      input;
      if _n_ = 1 then call execute('proc template;');
      %if &adjust ne %then %do; %&adjust %end;
      call execute(_infile_);
      if &overallanno and _infile_ =:     '   BeginGraph' then bg + 1;
      else if not &overallanno and index(_infile_, '   layout overlay') 
         then lo + 1;
      if bg and index(_infile_, ';') then do;
         bg = 0;
         call execute('annotate;'); 
      end;
      if lo and index(_infile_, ';') then do;
         lo = 0;
         lonum + 1;
         call execute(catt('annotate / id="lo', lonum, '";'));
      end;  
   run;
 
   data _null_;
      set __outdynam(where=(label1 ne '___NOBS___')) end=eof;
      if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
      if _n_ = 1 then do;
         call execute("proc sgrender data=&data");
         if symget('anno') ne ' ' then call execute("sganno=&anno");
         call execute("template=&template;");
         call execute('dynamic');
      end;
      if cvalue1 ne ' ' then 
         call execute(catx(' ', label1, '=',
                      ifc(n(nvalue1), cvalue1, quote(trim(cvalue1)))));
      if eof then call execute('; run;');
   run;
 
   proc template; 
      delete &template;
   quit;
%mend;

This macro is similar to the %ProcAnno macro that I provided and explained in my previous blog. The macro adds the ANNOTATE statements to the template and calls PROC SGRENDER with the appropriate dynamic variables specified. You can specify a macro name in the ADJUST= argument to insert code into the macro to edit the graph template. In this case, you will add the macro %Tweak. You can set the ANNO= option to blank to prevent PROC SGRENDER from specifying the SGANNO= option. By default, when OVERALLANNO=1, a single ANNOTATE statement is added to the template (as in my previous blog). In this example, OVERALLANNO=0 and an ANNOTATE statement is added to each layout overlay. The following statements are added to the template:

   annotate / id="lo1";
   annotate / id="lo2";
   annotate / id="lo3";

You can use the three IDs in your annotation data set to modify each of the three overlays. In this template, the first layout is unconditionally used, and either the second or the third layout is conditionally used. In this example, the first and third layouts are used.

The following step runs the macro and creates the modified graph:

%procannoadv(data=cp2, template=Stat.GLMSELECT.Graphics.CoefficientPanel,
             adjust=tweak, overallanno=0)

glms1b

This SG annotation data set is large. There are many variables, and varying subsets are used for each annotation. The relevant subsets are described next, one observation at a time.

Observation 1 positions text in the LAYOUT OVERLAY labeled 'lo1'. It specifies coordinates based on the percentage of the data area. The string is anchored on the right, next to the line.

Observation 2 draws a line in the LAYOUT OVERLAY labeled 'lo1'. The X coordinates are in the space 'DataValue'. Since the X axis variable is a character variable, the variables xc1 and xc2 are used. When the variables x1 and x2 exist for other observations, they must be set to missing for this observation. The Y coordinates are in the space 'DataPercent', and the variables y1 and y2 provide coordinates. Each pair of X and Y coordinates specifies one end of the line. The discrete offset of -0.3 moves the line 0.3 data units to the left from the coordinates specified in (xc1, y1) and (xc2, y2).

Observation 3 draws a rounded rectangle in the LAYOUT OVERLAY labeled 'lo1'. There is one X and one Y coordinate. The X coordinate is in the data space and the Y coordinate is in the wall percentage space. The rectangle is anchored in the bottom left (that is where drawing starts), then it is drawn with a height of 80% of the wall and a width of 0.6 times the width of a discrete cell. The discrete offset of -0.3 moves the rectangle 0.3 data units to the left from the coordinates specified in (xc1, y1). The CornerRadius variable controls the degree of rounding. The result is a rounded rectangle centered around the reference line for the selected step.

Observation 4 positions text in the LAYOUT OVERLAY labeled 'lo3'. Notice that the layout has changed with this observation. The text is anchored on the left, next to the arrow.

Observation 5 draws an arrow in the LAYOUT OVERLAY labeled 'lo3'. The X coordinates are in the space 'DataValue'. Since the X axis variable is a character variable, the variables xc1 and xc2 are used. When the variables x1 and x2 exist for other observations, they must be set to missing for this observation. The Y coordinates are in the space 'DataPercent', and the variables y1 and y2 provide coordinates. Each pair of X and Y coordinates specifies one end of the arrow. The discrete offset of 0.1 moves the arrow 0.1 data units to the right from the coordinates specified in (xc1, y1) and (xc2, y2). The Scale variable scales the size of the arrowhead. The Direction variable points the arrow in (toward xc1 and y1).
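If you want to look at one of these subsets yourself, a PROC PRINT step with a VAR list is enough. The following sketch assumes that the ANNO data set created earlier is still in the WORK library. The choice of variables in the VAR statement is my own selection for observation 2 (the line); adjust it for the other observations.

```sas
/* Observation 2 (the line): print only the variables that it uses */
proc print data=anno(firstobs=2 obs=2) noobs;
   var ID Function X1Space X2Space Y1Space Y2Space
       xc1 xc2 y1 y2 DiscreteOffset LineColor LineThickness;
run;
```

Changing FIRSTOBS= and OBS= to the same observation number selects a different annotation.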

In summary, this example builds on examples in my previous blogs to show you a small part of the flexibility of ODS Graphics. You can modify graph templates, dynamic variables, and you can use SG annotation to customize the graphs that analytical procedures produce. While not shown here, you can also change styles. You can even (cringe!) modify the data object.


Bar Chart on Interval Axis - SAS 9.40M3

When we first released GTL and SG Procedures back with SAS 9.2, Box Plots and Bar Charts would always treat the category axis as discrete. We realized soon enough that we needed to support box plots on scaled interval axes for many clinical applications, and this was added in SAS 9.3.

The same is now true for Bar Chart. With SAS 9.40M3, a bar chart can now display data on a scaled interval axis like Linear, Time or Log. For this article, I created a simple data set of simulated revenues by region and date. The dates are 01Jan2014, 07Jan2014, 15Jan2014, 01Feb2014, 01Mar2014 and 01Apr2014. A few observations for the data set are shown on the right.

title 'Revenues by Date';
proc sgplot data=Sales noborder cycleattrs;
   vbar date / response=sales nostatlabel dataskin=pressed;
   xaxis type=time display=(nolabel noline);
   yaxis grid;
run;

The graph on the right shows the Revenues by Date, and the x-axis TYPE has been set to "Time". Each bar is now drawn in the correct scaled position along the x-axis, displaying the summarized value for Revenues.

One might ask what benefit this has over a needle plot, which can do something similar.

  • A needle plot does not summarize the data by the category value.
  • The needles are plotted as lines with a thickness of 1 pixel, or as set by user.
  • The bar does a good job of setting the bar width, using the default 85% of the "effective" midpoint spacing, which is determined by the minimum spacing between the values on the x-axis.  This behavior is very similar to the box plot on an interval axis and works well for cluster groups.

Another bar feature that just works in this case is the default stacked and cluster groups. For the graph on the right, the data is summarized by date and region and plotted as stacked bars on the scaled time axis.

By default (without setting the TYPE option), the behavior remains the same as before, and the bar chart will still force a discrete axis.  There will be no change for your existing programs.

Note the legend at the bottom.  I have used the new SAS 9.40M3 Legend options FILLHEIGHT and the FILLASPECT to get larger skinned color swatches.

For cluster groups, the cluster width is determined by the minimum distance between the values on the x-axis. Now the bar widths are correctly sized to fit all four regions in the "effective" midpoint spacing. Since the smallest interval, the 6 days between 01Jan2014 and 07Jan2014, is on the left side of the graph, the available spacing for each cluster is determined by that. To improve the clustering effect, I have set clusterwidth=0.75.
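Here is a rough sketch of a clustered bar chart on a time axis. The full program is in the link at the end of the article; the simulated data set below, the region names, and the seed are stand-ins that I made up so that the step runs on its own.

```sas
/* Hypothetical data: simulated sales for four regions on irregular dates */
data Sales;
   call streaminit(2014);
   format date date9. sales dollar10.;
   do date = '01Jan2014'd, '07Jan2014'd, '15Jan2014'd,
             '01Feb2014'd, '01Mar2014'd, '01Apr2014'd;
      do region = 'North', 'South', 'East', 'West';
         do i = 1 to 3;   /* several rows per cell, so VBAR summarizes */
            sales = round(1000 + 4000*rand('uniform'));
            output;
         end;
      end;
   end;
   drop i;
run;

title 'Revenues by Date and Region';
proc sgplot data=Sales noborder;
   vbar date / response=sales group=region
               groupdisplay=cluster clusterwidth=0.75 nostatlabel;
   xaxis type=time display=(nolabel);   /* TYPE=TIME requests the interval axis */
   yaxis grid;
run;
```

Removing GROUPDISPLAY=CLUSTER and CLUSTERWIDTH= gives the default stacked bars. Removing TYPE=TIME brings back the discrete axis.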

Also, I have customized the legend swatches to a thinner and longer shape to match the thinner bars in the graph.  The new options allow you to do such customization.

The interval axis VBAR can also be used in conjunction with the VLINE to create a BarLine graph on a time axis. Here, the bars show Sales by date for two regions. The line shows the Target by date for the two regions. Both statements use GROUPDISPLAY=CLUSTER so that each colored line joins the bars of the same color.

Recently there was a question on the communities web site from a user who wanted to plot a bar chart with target values. The graph above displays the target values using a VLINE plot. However, a full line overlay may not be desirable, and we may want individual markers over each bar to indicate the target for just that bar. We can extend the above technique to do something like the graph shown below. I would agree the target markers may need more emphasis.

Normally, one cannot overlay a "basic" plot like scatter on a VBAR statement to display the target values. You would have to first summarize the data using PROC MEANS, and then use a combination of the VBarParm overlaid with a scatter or other basic plot.

For this graph, we have overlaid the VBAR with a VLINE, and turned on markers and turned off the display of the line by setting the following option.

vline date / response=target group=region  markers lineattrs=(thickness=0);

This will draw the default marker on each bar, and we can change the marker shape to one of the supported shapes.  However, since there is no "Bar" shape available, we have used the SYMBOLCHAR statement to define a custom marker called "Line" using one of the characters from the Unicode font.

symbolchar name=line char='2012'x / voffset=0.08;

Note the use of VOFFSET=0.08. If you overplot a regular marker on the line marker defined using SYMBOLCHAR, you will see they do not line up exactly. This is because in the character glyph defined for value '2012'x, the line is not exactly in the middle of the glyph bounding box. This can convey the wrong value to the graph consumer. So, I have used the VOFFSET to shift the line up a bit to ensure it is exactly lined up with a regular marker. The VOFFSET feature was added for just such cases.
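Putting the pieces together, the step might look something like the sketch below. The SYMBOLCHAR statement and the VLINE options MARKERS and LINEATTRS=(THICKNESS=0) come from the snippets above. The data set, the MARKERATTRS= settings, and the marker size are my assumptions for illustration, so expect to tune them for your own graph.

```sas
/* Hypothetical data: sales and a target for two regions */
data SalesTarget;
   call streaminit(2014);
   format date date9.;
   do date = '01Jan2014'd, '07Jan2014'd, '15Jan2014'd,
             '01Feb2014'd, '01Mar2014'd, '01Apr2014'd;
      do region = 'East', 'West';
         sales  = round(2000 + 3000*rand('uniform'));
         target = round(2000 + 3000*rand('uniform'));
         output;
      end;
   end;
run;

title 'Sales and Target by Date';
proc sgplot data=SalesTarget noborder;
   /* Custom marker named "line", built from Unicode character 2012 */
   symbolchar name=line char='2012'x / voffset=0.08;
   vbar  date / response=sales  group=region groupdisplay=cluster
                nostatlabel;
   /* Markers only: the connecting line is hidden with thickness=0 */
   vline date / response=target group=region groupdisplay=cluster
                markers markerattrs=(symbol=line size=20)
                lineattrs=(thickness=0) nostatlabel;
   xaxis type=time display=(nolabel);
   yaxis grid;
run;
```

Both statements use the same category variable and the same GROUPDISPLAY= setting so that each target marker sits over its own bar.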

Now, you have one more tool in your tool box to create effective graphs.  The interval bar chart should fill a gap and make it easier to create graphs.

Full SAS 9.40M3 Code:  Interval_Bar

 

 


Annotating graphs from analytical PROCs

Dynam3DropV

There are many ways to modify the graphs that SAS creates. Standard graph customization methods include template modification (which most people use to modify graphs that analytical procedures produce) and SG annotation (which most people use to modify graphs that procedures such as PROC SGPLOT produce). However, you can also use SG annotation to modify graphs that analytical procedures produce. Graphs are constructed from a matrix of information (the ODS data object), layout instructions (a graph template), instructions for the overall appearance (a style template), and dynamic variables. Procedures create dynamic variables to send values (that only become known at procedure run time) to graph templates. These values include statistics, variable names, variable labels, and so on. You cannot fully create, re-create or modify a graph without all four components. On July 31, I wrote about how you can create graphs by using ODS Graphics and then modify the dynamic variables and display the results by using PROC DOCUMENT. Today, I will show you how to capture dynamic variables, modify them, and create a modified graph by using PROC SGRENDER instead of PROC DOCUMENT. This approach enables you to use SG annotation to modify graphs that analytical procedures create.

Let's begin by running PROC REG, displaying the diagnostics panel, and outputting the data object to a SAS data set:

ods graphics on;
proc reg data=sashelp.class;
   ods select diagnosticspanel;
   ods output diagnosticspanel=dp;
   model weight = height;
quit;


Graph

You might consider a naive approach to re-creating the diagnostics panel from the data object and the graph template by using PROC SGRENDER as follows:

proc sgrender data=dp template=Stat.REG.Graphics.DiagnosticsPanel;
run;

Part Missing

For some graphs, this might completely work (if there are no dynamic variables) or it might completely fail (for example, if there is one graph statement and a critical part depends on dynamic variables). The preceding step partially works. In this example, the statistics table is completely missing, part of the title is missing, and some reference lines are missing.

You can run the following step to create the graph, output the data object to a SAS data set, and capture the dynamic variables in an ODS document.

ods document name=MyDoc (write);
proc reg data=sashelp.class;
   ods select diagnosticspanel;
   ods output diagnosticspanel=dp;
   model weight = height;
quit;
ods document close;

You can list the contents of the ODS document as follows:

proc document name=MyDoc;
   list / levels=all;
quit;

You can store the names of the dynamic variables and their values in a SAS data set as follows:

proc document name=MyDoc;
   ods output dynamics=outdynam;
   obdynam \Reg#1\MODEL1#1\ObswiseStats#1\Weight#1\DiagnosticPlots#1\DiagnosticsPanel#1;
quit;

The path on the OBDYNAM statement is copied from the listing of the contents of the ODS document.

The next several steps process both the data set of dynamic variables and the graph template so that a subsequent PROC SGRENDER step can re-create the graph. Before I show you those steps, I need to explain the syntax for dynamic variables. Graph templates that procedures use often have a DYNAMIC statement that lists dynamic variables. Graph templates that you write can use dynamic variables, but they can also get dynamic information through macro variables. You can use an MVAR statement to provide character macro variables, and you can use an NMVAR statement to provide macro variables whose values are processed as numbers. The next steps process the dynamic variables and their values, output them to macro variables, and modify the graph template to use MVAR and NMVAR statements instead of a DYNAMIC statement.
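As a small illustration of the MVAR and NMVAR syntax, here is a sketch of a hand-written template that gets its title and a reference line value from macro variables. The template name MvarDemo and the macro variable names MyTitle and RefValue are invented for this example; they have nothing to do with the PROC REG template.

```sas
proc template;
   define statgraph MvarDemo;
      mvar  MyTitle;     /* character macro variable              */
      nmvar RefValue;    /* macro variable processed as a number  */
      begingraph;
         entrytitle MyTitle;
         layout overlay;
            scatterplot x=height y=weight;
            referenceline y=RefValue;
         endlayout;
      endgraph;
   end;
run;

/* The values are picked up when the graph is rendered */
%let MyTitle  = Weight by Height;
%let RefValue = 100;

proc sgrender data=sashelp.class template=MvarDemo;
run;
```

Because the macro variables are resolved at rendering time, you can change the %LET statements and rerun only the PROC SGRENDER step to get a different title or reference line.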

The following step preprocesses the data set of dynamic variables:

data dynamics;
   length label1 $ 32;
   set outdynam;
   label1 = upcase(label1);
   if label1 ne '___NOBS___';
run;

Variable names are upper cased, and the automatic dynamic variables that contain the number of observations in the data object columns are discarded.

The following step writes the graph template to a file:

proc template; 
   source Stat.REG.Graphics.DiagnosticsPanel / file='temp.tmp';
quit;

If you need to do ad hoc template modifications, you can do them before you perform the preceding step or build them into the subsequent DATA step that processes the template.

The following step reads the file that contains the graph template, identifies the beginning of the DYNAMIC statement, and extracts the names of all of the dynamic variables:

data d(keep=label1);
   infile 'temp.tmp';
   input;
   length label1 $ 32;
   if _infile_ =: '   dynamic ' then do;
      d + 1;
      substr(_infile_, 1, 10) = ' ';
      end;
   if d then do;
      do i = 1 to 128 until(label1 eq ' ');
         label1 = upcase(scan(_infile_, i, ' ;'));
         if label1 ne ' ' then output;
         end;
      end;
   if d and index(_infile_, ';') then stop;
run;

This step stops when it hits the semicolon at the end of the DYNAMIC statement.

The following steps sort the two lists of dynamic variables so that they can be merged:

proc sort data=dynamics; by label1; run;
proc sort data=d;        by label1; run;

The following step merges the two dynamic variable lists and sets missing character values to ordinary blank missing:

data dynamics(drop=label2 cvalue2 nvalue2);
   merge d dynamics;
   by label1;
   if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
run;

The following step reads the template file again and modifies it:

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   if _infile_ =: '   dynamic ' then do;
      substr(_infile_, 1, 10) = '*';
      do i = 1 to ndynam;
         set dynamics point=i nobs=ndynam;
         call execute(catx(' ', ifc(n(nvalue1), 'nmvar', 'mvar'), label1, ';'));
         end;
      end;
   call execute(_infile_);
   if _infile_ =: '   BeginGraph' then bg + 1;
   if bg and index(_infile_, ';') then do;
      bg = 0;
      call execute('annotate;');
   end;
run;

This step uses CALL EXECUTE to submit a PROC TEMPLATE statement, convert the DYNAMIC statement to a comment, submit an unmodified version of every other template statement, add an ANNOTATE statement after the BEGINGRAPH statement (to enable subsequent SG annotation), and submit a series of NMVAR and MVAR statements. There are various ways to annotate by using GTL. This is the simplest, and it enables you to use annotate coordinates in graph percentage units. See the SG Annotation documentation for other options. The following step is not necessary, but it shows the modified template:

proc template; 
   source Stat.REG.Graphics.DiagnosticsPanel;
quit;

The following step creates all of the macro variables that the NMVAR and MVAR statements need:

data _null_;
   set dynamics;
   if label1 = '_SHOWEDF' then cvalue1 = '0';
   call symputx(label1, cvalue1);
run;

This step also modifies one of the dynamic variables. It sets _SHOWEDF to 0 to suppress the display of the error degrees of freedom in the statistics table. (You can instead do this directly in PROC REG.) The following steps create the diagnostics panel from the data set made from the data object, the modified graph template, and all of the dynamic variables (now stored in macro variables):

proc sgrender data=dp template=Stat.REG.Graphics.DiagnosticsPanel;
run;

Dynam3DropV

Now you can use SG annotation to modify the graph. This is illustrated in two simple examples. The first example adds a date to the bottom right corner of the graph:

data anno;
   Function = 'Text'; Label = 'Saturday, July 25, 2015';
   Width = 100;    x1 = 99;    y1 = .1;    Anchor = 'Right';    TextColor = 'Red';
run;
 
proc sgrender data=dp sganno=anno
              template=Stat.REG.Graphics.DiagnosticsPanel;
run;

The second example also adds a watermark across the graph:

data anno;
   length Label $ 40;
   Function = 'Text';     Label     = 'Saturday, July 25, 2015';
   Width    = 100;        x1        = 99;   y1 = .1;        
   Anchor   = 'Right';    TextColor = 'Red';
   output;
 
   Label = 'Confidential - Do Not Distribute';
   Width = 150;           x1        = 50;   y1     =  50;   Anchor = 'Center';
   Transparency = 0.8;    TextSize  = 40;   Rotate = -45;      
   output;
run;
 
proc sgrender data=dp sganno=anno
              template=Stat.REG.Graphics.DiagnosticsPanel;
run;

For more information, see the SG Annotation documentation.

Like most things in SAS, there is more than one way to approach a problem. The following step combines all of the steps above that follow the creation of the OUTDYNAM data set (except the annotate data set creation step). The first step adds the ANNOTATE statement to the template:

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
   call execute(_infile_);
   if _infile_ =: '   BeginGraph' then bg + 1;
   if bg and index(_infile_, ';') then do;
      bg = 0;
      call execute('annotate;');
   end;
run;

Other than that, the template is not modified. The following step generates and runs the PROC SGRENDER step:

data _null_;
   set outdynam(where=(label1 ne '___NOBS___')) end=eof;
   if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
   if _n_ = 1 then do;
      call execute('proc sgrender data=dp sganno=anno');
      call execute('template=Stat.REG.Graphics.DiagnosticsPanel;');
      call execute('dynamic');
   end;
   if label1 = '_SHOWEDF' then cvalue1 = '0';
   if cvalue1 ne ' ' then do;
      call execute(catx(' ', label1, '=',
                   ifc(n(nvalue1), cvalue1, quote(trim(cvalue1)))));
   end;
   if eof then call execute('; run;');
run;

The results match the previous graph. Instead of processing two lists of dynamic variables, this step runs PROC SGRENDER along with a customized DYNAMIC statement that populates the dynamic variables with values. This approach has the advantage of requiring less code. However, the final PROC SGRENDER step is entangled with the processing of dynamic variables. You might prefer to process the dynamic variables and then have a simple PROC SGRENDER step you can run each time that you want to try a new modification of the graph. Either way, SAS provides you the flexibility that you need to modify a graph.

One final example modifies the graph template as well to provide the same formatting for the R square and the adjusted R square:

data _null_;
   infile 'temp.tmp';
   input;
   if _n_ = 1 then call execute('proc template;');
 
   i = index(_infile_, 'BEST6.');
   if i and (index(_infile_, '_ADJRSQ') or index(_infile_, '_RSQUARE'))
      then substr(_infile_, i, 6) = '6.4';
 
   call execute(_infile_);
   if _infile_ =: '   BeginGraph' then bg + 1;
   if bg and index(_infile_, ';') then do;
      bg = 0;
      call execute('annotate;');
   end;
run;
 
data _null_;
   set outdynam(where=(label1 ne '___NOBS___')) end=eof;
   if nmiss(nvalue1) and cvalue1 = '.' then cvalue1 = ' ';
   if _n_ = 1 then do;
      call execute('proc sgrender data=dp sganno=anno');
      call execute('template=Stat.REG.Graphics.DiagnosticsPanel;');
      call execute('dynamic');
   end;
   if label1 = '_SHOWEDF' then cvalue1 = '0';
   if cvalue1 ne ' ' then 
      call execute(catx(' ', label1, '=',
                   ifc(n(nvalue1), cvalue1, quote(trim(cvalue1)))));
   if eof then call execute('; run;');
run;

The first step uses an IF statement to change the BEST6. format to a 6.4 format for the R square and the adjusted R square. Of course you do not need to modify templates in a DATA step, but this template is so large that it is hard to show other ways to change it.

The following step deletes the modified template:

proc template; 
   delete Stat.REG.Graphics.DiagnosticsPanel;
quit;

In summary, you can capture graphs that analytical procedures create, modify the graph template, modify the dynamic variables, and perform additional modifications by using SG annotation.

Click here for a full example that uses a macro to automate most of the steps.

For more information:
SG Annotation
ANNOTATE Statement
Graph Template Language
CALL EXECUTE

Customize Legend Entries - SAS 9.40M3

We all want to customize our graphs just so, and have our personal preferences.  Over the past few releases SG Procedures and GTL have added options to customize the look and feel of our graphs.  In this article, I will describe new ways in which you can customize your legends.  We will also see some new visual options.

Here is a common graph with the default legend.  The graph displays a quadratic fit for MSRP by Horsepower, using the REG statement with the CLI and CLM options and DEGREE=2.

Note the default legend shows a small color swatch for the 95% confidence band, and two line elements for the Prediction limits and the fit.  The line elements are rather long, and designed to be able to represent any of the line patterns that are supported.

In this case, we have only two line patterns, one solid and one dashed, so it is not necessary to use such long line segments to represent them.  Now, with SAS 9.4M2, the lengths of the line segments in the legend can be controlled as I have done here using the LINELENGTH option:

keylegend / linelength=32;

This makes for a much better legend.  This normally makes sense only when the line patterns are short or when, for a grouped plot, you are using a "Color" priority style like HTMLBlue.  Now that we have addressed the line segment length issue, what about the fill color swatch?

With SAS 9.40M3, new options have been added to the KEYLEGEND statement to provide for more customization of the color swatch.  The default color swatches can be smaller than some of you may want, and when using skins, the small swatch is unable to properly represent the colors in the graph.  This was also brought to our attention by Dr. LeRoy Bessler during Dan's presentation at SGF 2014 in Washington DC.  And then there are always individual preferences that come into play.  To address these cases, SGPLOT provides the new SCALE, FILLHEIGHT and FILLASPECT options.

If all you want is to increase (or decrease) the size of the color swatch, and you don't need a particular size, you can use the SCALE option as shown on the right.  Here I have used the SCALE option to increase the size of the fill swatches:

keylegend / linelength=32 scale=1.2;

Now, the color swatch is a bit bigger.

You can fully customize the size and shape of the color swatches using the FILLHEIGHT and FILLASPECT options.  Here, we have set the height of the swatch to 2.5% of the graph height and the aspect ratio to GOLDEN.  The golden ratio appears in many proportions observed in nature, is the limit of the ratio of consecutive Fibonacci numbers, and is approximately equal to 1.618.

keylegend / linelength=32 fillheight=2.5pct fillaspect=golden;
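The link between the golden ratio mentioned above and the Fibonacci sequence can be checked with a few lines of arithmetic.  The following DATA step is only an illustrative sketch; the variable names are made up for this example and have nothing to do with how FILLASPECT=GOLDEN is implemented:

```sas
/* Sketch: compare the closed-form golden ratio with the ratio of */
/* consecutive Fibonacci numbers.                                  */
data _null_;
   phi = (1 + sqrt(5)) / 2;          /* closed form of the golden ratio */
   a = 1;
   b = 1;
   do i = 1 to 15;                   /* step through the Fibonacci sequence */
      c = a + b;
      a = b;
      b = c;
   end;
   ratio = b / a;                    /* ratio of the last two terms */
   put 'Golden ratio (closed form): ' phi 8.5;
   put 'Fibonacci ratio:            ' ratio 8.5;
run;
```

Both values written to the log agree with the 1.618 quoted above to the displayed precision.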

FILLHEIGHT takes a dimension, so it can be specified in pixels (px), percent (pct), inch, cm or mm.  All values are scaled by DPI.  FILLASPECT accepts a value greater than zero.  If the color swatch becomes too big, the legend will drop out.

The example on the right uses swatches that are 2.5% high with an aspect of 2.5.  The bigger swatches provide more space to render the skinned areas.

As expected, GTL provides the same options in the DISCRETELEGEND statement in the ITEMSIZE options bundle.

Those with a keen eye would have noted a few new visual possibilities.  SGPLOT now allows you to turn off the internal border of the wall, and control the axes lines to cover only the range of the data using the following option:

styleattrs axisextent=data;

Now the x and y axis lines only extend over the data range.  In case of the x axis, the line only goes from the min to max tick mark.  The y axis line stops at y=0 tick and extends to the actual data value on the high side.  In the past, some users have expressed a preference for such treatment of the axes.  Click on any of the fit plots above to see it in more detail.
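For reference, here is a minimal sketch that pulls the options from this article into one step.  It is not the linked program: I am assuming the data comes from SASHELP.CARS (the article only says MSRP by Horsepower), and I am assuming that the NOBORDER option on the PROC SGPLOT statement is what removes the wall border.

```sas
/* Sketch under the assumptions stated above; not the author's linked code */
proc sgplot data=sashelp.cars noborder;            /* assumed: NOBORDER drops the wall border */
  styleattrs axisextent=data;                      /* axis lines span only the data range     */
  reg x=horsepower y=msrp / cli clm degree=2;      /* quadratic fit with both sets of limits  */
  keylegend / linelength=32 fillheight=2.5pct fillaspect=golden;
run;
```

Swapping FILLHEIGHT and FILLASPECT for SCALE=1.2 gives the simpler "just make it a bit bigger" variation described earlier.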

Full SAS 9.40M3 Code:  Legend

Modifying Dynamic Variables in ODS Graphics

If you are familiar with the output delivery system (ODS), then you know that you can modify the tables and graphs that analytical procedures display by modifying table and graph templates. Perhaps less familiar is the fact that you can also modify dynamic variables.

Tables and graphs are constructed from a matrix of information (the ODS data object), layout instructions (a table or graph template), instructions for the overall appearance (a style template), and dynamic variables. Procedures use dynamic variables to control certain details of how tables and graphs are created. If a table has a header that says "95% Confidence Limits", chances are the "95" was set by a dynamic variable. You can set this percentage as a procedure option, so the procedure writer cannot specify the "95" directly in the table template.  If a graph might or might not contain a loess fit, chances are a dynamic variable controls whether the loess fit is displayed or not.  More generally, if some portion of a table or graph is conditionally displayed, it is probably controlled by a dynamic variable. Dynamic variables are listed in the DYNAMIC statement in table and graph templates.

You will need to use the ODS document if you want to modify dynamic variables. The ODS document is a repository of information. You can open an ODS document, run one or more procedures, store all of the output (tables, graphs, notes, titles, footnotes, and so on) in the document, then replay some or all of the output in any order that you choose. For example, SAS/STAT documentation uses the ODS document to capture output from the code displayed in the documentation and then replay subsets of the output. This enables SAS documentation to display output, then add explanatory text, then display more output and more text, and so on.

The following steps capture a dendrogram in an ODS document and then replay it:

ods graphics on;
ods document name=MyDoc (write);
proc cluster data=sashelp.class method=ward pseudo;
   ods select dendrogram;
   id name;
run;
ods document close;
 
proc document name=MyDoc;
   replay cluster\dendrogram;
quit;

Both steps produce the same dendrogram.

The ODS DOCUMENT statement opens an ODS document named MyDoc. Since the WRITE option is specified, a new document is created each time this statement is executed, and any old content is discarded.

The following step lists the contents of the ODS document:

proc document name=MyDoc;
   list / levels=all;
quit;

The document contents are displayed here.

This document contains one directory and a graph. There would be more entries in the document if the ODS SELECT statement had not been specified in PROC CLUSTER. Both the data object and the dynamic variables are stored in the ODS document. (Templates are stored in item stores that SAS provides.) You can store the dynamic variables in a SAS data set and display them as follows:

proc document name=MyDoc;
   ods output dynamics=dynamics;
   obdynam \Cluster#1\Dendrogram#1;
quit;
 
proc print noobs data=dynamics;
run;

The path specified on the OBDYNAM statement was copied from the results produced by the LIST statement. It is important to list the contents of the ODS document and then copy the path from the listing. Some procedures create multiple tables with the same name (for example, the ModelANOVA table in PROC GLM). The ODS document provides the precise path that you need to display each.

The data set is displayed here.

PROC CLUSTER determines the height and width of the dendrogram at run time after evaluating the number of rows in the graph. These sizes are stored as dynamic variables. The dynamic variable DH sets the design height and DW sets the design width. You can modify the values of the dynamic variables and then use them to replay the graph. The following steps create a smaller dendrogram:

data dynamics2;
   set dynamics;
   if label1 = 'DH' then cvalue1 = '400PX';
   if label1 = 'DW' then cvalue1 = '400PX';
run;
 
proc document name=MyDoc;
   replay \Cluster#1\Dendrogram#1 / dynamdata=dynamics2;
quit;

You can additionally modify both the graph and the style templates for a fuller customization of the graph or table. In summary, ODS provides ways to modify every aspect of how a table or graph is displayed.

Unicode in Formatted Data - SAS 9.40M3

SAS 9.4 Maintenance release 3 was released on July 14.  The ODS Graphics procedures include many important, useful and cool features in this release, some that have been requested by you for a while.  In the next few articles, I will cover some of these features.  Last time I covered the new HeatMap statement useful for Big Data Visualization.

One cool and useful new feature is the support for Unicode values in SAS formats.  For a long time, certain parts of the graph could have Unicode text.  These included user-provided text strings for Titles, Footnotes and Entries.  These support Unicode characters, and also commands such as SUP and SUB to make any character string into a subscript or superscript.

Other items like Axis Labels support Unicode strings but not commands like SUB and SUP.  However, there was no way to have data strings (from a data set) displayed as Unicode on the axis, in data labels or in legends.  Till now, that is.  Now, with SAS 9.40M3, you can have data values displayed in the graph as Unicode strings by using user-defined formats.

Here is a simple example of a graph showing the counts of deaths by Age Group and Death Cause for the sashelp.heart data set.  I have created a format to break up the age values into four groups.  Here is the code:

proc format;
  value agegroup
    0 -< 40 = '< 40'
    40 -< 50 = '40 < 50'
    50 -< 60 = '50 < 60'
    60 -< high = '>= 60'
  ;
run;

The code for the graph is shown below.  The graph is shown on the right.  Click on graph to see the full view.  Note, I have added some annotation around the last tick value ">= 60", which is the formatted label for the last age group.  The full code, including the annotation is shown in the link at the bottom.

title 'Counts by Age Group and Death Cause';
proc sgplot data=sashelp.heart(where=(deathcause ne 'Unknown')) sganno=annoAxis;
  format ageatdeath agegroup.;
  vbar ageatdeath / group=deathcause groupdisplay=cluster nooutline
       baselineattrs=(thickness=0) dataskin=pressed filltype=gradient;
  keylegend / location=inside across=1 title='';
  xaxis display=(nolabel noticks);
  yaxis label='Count' grid;
run;

Now, with SAS 9.40M3, I can include a Unicode string in the label for the last age group as shown below.  Here, I have used the unicode value '2265' for the "greater than or equal" symbol.  Note the use of the full default ODS escape character string (*ESC*).  This must be used in the format syntax, and a user defined escape char cannot be used.

proc format;
  value agegroupUnicode
    0 -< 40 = '< 40'
    40 -< 50 = '40 < 50'
    50 -< 60 = '50 < 60'
    60 -< high = "(*ESC*){unicode '2265'x} 60"
 ;
run;

Now, running the same SGPLOT code again with the new format name produces the graph on the right.  Click on the graph to see the full image.  Now, the highlighted tick value uses the Unicode symbol.

This is very convenient, as the only alternative (pre SAS 9.40M3) was to replace the tick value using annotation, which is a messy and non-scalable process.  Now, the value is what you want, and it will automatically adjust to changing data, sort order, graph orientation, etc.

To illustrate this point, the graph on the right switches the category and group roles.  Now, age group is used as a group, so the formatted value for the fourth group is displayed in the legend.  Using this new technique, this happens automatically; no extra work is required.
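A minimal sketch of the role switch described above might look like the following.  It reuses the agegroupUnicode format defined earlier; the statement options are my own simplification and are not copied from the linked program, which also includes appearance options and annotation.

```sas
/* Sketch: death cause is now the category and the formatted age group  */
/* is the group, so the Unicode label appears in the legend.            */
title 'Counts by Death Cause and Age Group';
proc sgplot data=sashelp.heart(where=(deathcause ne 'Unknown'));
  format ageatdeath agegroupUnicode.;
  vbar deathcause / group=ageatdeath groupdisplay=cluster nooutline;
  keylegend / title='Age Group';     /* legend title chosen for this sketch */
  xaxis display=(nolabel);
  yaxis label='Count' grid;
run;
```

Nothing in the format or the data changes; only the roles in the VBAR statement are swapped.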

It is still not possible to send entire long Unicode strings in the data set itself.  However, most of the use cases can be handled by creating a format that includes the unicode value.

Aside:  Personally, I don't like to see grid lines showing through the transparent bars.  I have prevented that in this graph.  Can you see how I did that in the linked code?

I know some of you already have SAS 9.40M3.  Please give this a spin to see how well this works for you and the mileage you get from  it.  You still cannot use the SUP and SUB commands to do something like Alpha ** Beta, but many simple numeric powers and subscripts are available in the Unicode fonts.  Please chime in with your comments.

Full SAS 9.40M3 program:   Unicode 

Big Data Visualization

Big data is a popular topic, with many articles written about its analysis.  Today, "Big Data" is measured in multiples of terabytes, and SAS provides special software for analysis and visualization of Big Data: Visual Analytics.

When data is very big, it may be meaningless, let alone inefficient, to create a scatter plot of such data. This is especially true when the data is on a server, and we want to create an X-Y plot on a local computer.  Bringing all the data down to plot is prohibitive, and the result is not very helpful.

With the release of SAS 9.40M3 this week, the SGPLOT procedure introduces the HEATMAP statement, a plot type suited for visualization of bigger data.  In this case, the data can be analyzed and binned into discrete bins along X and Y axis, and the results displayed using a color gradient.

The graph above shows a heat map of the distribution of the subjects in a study for Diastolic and Systolic blood pressure.  Admittedly, this graph is of a relatively small data set, "sashelp.heart".  This data set has about 5200 observations, which is small from a "Big Data" perspective.  But for our purposes, we can assume we have data like this for millions of subjects or billions of credit card transactions.  The binning of the data is done on a fast server, along with the computation of the regression fit.  Only the "graphical" information for drawing the bins and the curve is sent to the renderer to create this graph.

title 'Distribution of Blood Pressure';
proc sgplot data=sashelp.heart;
  heatmap x=systolic y=diastolic / colormodel=(white green yellow red)
          nxbins=40 nybins=30 name='a';
  reg x=systolic y=diastolic / nomarkers degree=2 legendlabel='Fit';
  gradlegend 'a';
  keylegend / linelength=20 location=inside position=topright noborder;
run;
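The HEATMAP statement above does the binning for you.  To see what that binning amounts to, here is a hand-rolled sketch that counts observations per cell.  The bin widths (10 for systolic, 5 for diastolic) and the data set names are arbitrary choices for illustration; they are not the defaults that the procedure uses.

```sas
/* Sketch: assign each observation to a rectangular bin, then count */
data binned;
   set sashelp.heart(keep=systolic diastolic);
   where systolic is not missing and diastolic is not missing;
   xbin = floor(systolic / 10) * 10;    /* lower edge of the X bin (assumed width 10) */
   ybin = floor(diastolic / 5) * 5;     /* lower edge of the Y bin (assumed width 5)  */
run;

proc freq data=binned noprint;
   tables xbin * ybin / out=bincounts(drop=percent);   /* COUNT = frequency per bin */
run;
```

The resulting data set has one row per occupied bin, which is the kind of summarized information that needs to travel to the renderer instead of the raw observations.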

This graph now allows us to view the blood pressure distribution of the subjects in a study.  The HEATMAP statement works seamlessly with most other statements available in the SGPLOT procedure, so we can overlay a regression fit on the heat map as easily as we did on the scatter plot.  In the graph above, I have set a custom color model for the display of the frequency data, going from white to green to yellow to red, as displayed in the gradient legend on the right.  A discrete legend is displayed identifying the Fit plot.  This results in a nice, clean graph.

We can go a step further, and display the confidence and prediction limits on the heat map as shown on the right.  Once again, the same options are used as would be in the case of a scatter plot.

For both of these graphs, the X and Y axes represent continuous, numeric data.  By default, the data is binned into a number of bins determined by the underlying analytical code.  Bin counts can be controlled using the statement options, as we have done here.

Heat Maps are also useful to view response data for the binned data, as shown in the graph on the right.  Here, we have a heat map of weight by height of the subjects in the study.  However, each bin now shows the mean of the Cholesterol level for all the subjects in the bin.  This shows us the association between Cholesterol and the two analysis variables.

Another interesting use case would be to visualize the credit card balance for all customers of a bank by family income and value of the mortgage.

The SGPLOT heat map supports numeric axes and discrete axes, and any combination of the two.  The graph on the right displays the mean MSRP value of the cars by Type and Make.  Both axes are discrete, and each bin displays the mean value of MSRP for all the observations in the bin.
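The two response heat maps described above can be sketched as follows.  This is my reading of the descriptions, not the linked program: I am assuming the COLORRESPONSE= and COLORSTAT= options of the HEATMAP statement, the SASHELP.HEART and SASHELP.CARS data sets, and a particular assignment of variables to the X and Y axes.

```sas
/* Sketch: mean cholesterol in each weight-by-height bin (numeric axes) */
proc sgplot data=sashelp.heart;
  heatmap x=height y=weight / colorresponse=cholesterol colorstat=mean;
run;

/* Sketch: mean MSRP for each Type and Make combination (discrete axes) */
proc sgplot data=sashelp.cars;
  heatmap x=type y=make / colorresponse=msrp colorstat=mean;
run;
```

Because Type and Make are character variables, each distinct value forms its own bin along the axis.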

Heat Maps have been supported in GTL, and you can find previous articles on GTL Heat Maps and Calendar Heat Maps.

SAS 9.4M3 code for Heat Maps:  HeatMap

Row Lattice Headers

The SGPANEL procedure makes it easy to create graph panels that are classified by one or more classifiers.  The "Panel" layout is the default and it places the classifier values in cell headers at the top of each cell.

When using LAYOUT=Lattice or RowLattice, the row headers are placed at the right side of each row, and the header text is rotated as shown in the example on the right.  The graph shows the distribution of Cholesterol and the panel variable (classifier) is "DeathCause".  Three cells are created and each cell displays the value of "DeathCause" on the right.

There are two obvious problems with this arrangement.

  1. Long text strings in the header are truncated, as for "Coronary Heart Disease" and "Cerebral Vascular Disease".
  2. The text strings are displayed in a vertical orientation that is hard to read.

Users have often complained about this and, admittedly, this is not an ideal arrangement.  The SAS code is included below.  Note the use of OFFSETMIN=0 for ROWAXIS, and the use of SPACING=10 for the cells.

proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel novarname spacing=10;
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0;
  colaxis max=420;
run;

The SAS 9.4M2 release provides a way to improve the arrangement of such a graph.  Here is a variation where I have suppressed the row headers entirely, and used the INSET statement to display the "DeathCause" values inside the cell at the top left.

The variable provided for the INSET statement should hold the value we want in each cell, match-merged with the PANELBY row variable.  In this case we are using the classifier variable itself.  Even though the column has the values repeated multiple times in the data, the value is drawn only once, and from the first observation only.

The NOHEADER option suppresses the row headers.  The INSET statement with column "DeathCause" inserts the text value into the top left of the cell.  In the case of this distribution plot, empty space is often available at the upper corners of the cell.  If not, you can add some offset to the top of the ROWAXIS.

proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel noheader spacing=10;
  inset deathcause / position=topleft nolabel;
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0;
  colaxis max=420;
run;

To draw the eye to the classifier value, the inset can be highlighted by using a background color or a border on the INSET statement as shown below left. Below right we have a 2x3 panel, showing both the row and column classifiers as insets. Note, I have added the "Death Cause" first since it has long textual values. I also added an OFFSETMAX=0.15 to create some space at the top of each cell.
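A minimal sketch of the highlighted inset, based on the step shown earlier, could look like this.  The BORDER and BACKCOLOR= options on the INSET statement and the particular color value are my assumptions about how the highlight was done; the linked program may differ.

```sas
/* Sketch: same row lattice, with a bordered, filled inset and extra */
/* room at the top of each cell.                                     */
proc sgpanel data=heart noautolegend;
  panelby deathcause / layout=rowlattice onepanel noheader spacing=10;
  inset deathcause / position=topleft nolabel
        border backcolor=cxFFFFE0;          /* assumed highlight: light yellow fill */
  histogram cholesterol;
  density cholesterol;
  rowaxis offsetmin=0 offsetmax=0.15;       /* space for the inset */
  colaxis max=420;
run;
```

For the 2x3 lattice, the same idea applies with both classifier variables listed in the INSET statement, the one with the long values first.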

Full SAS 9.4M2 Code: Lattice

Attributes Priority for the Inquiring Mind

When ODS Graphics was first released with SAS 9.2 in 2008, a conscious effort was made to create graphs that were consistent and aesthetically pleasing out of the box.  Features in the graph derive their visual attributes from the active Style.  When Group classifications are in effect, the different classification levels of the group variable are represented on the screen using the attributes from the GraphData1 - GraphData12 elements of the Style.

These attributes were carefully designed so the 12 colors are distinct from each other. The groups use up to 11 line patterns and 7 marker symbols. For each group value, the color, marker symbol and pattern are derived sequentially from these lists of 12 colors, 11 patterns and 7 symbols.  So the first group level gets the first color, first pattern and the first symbol.  The second group level gets the second color, second pattern and the second symbol.  This goes on till we run out of the list of symbols (there are only 7).  So, the eighth group level will get the eighth color, eighth pattern and the first symbol.  This goes on in this manner so we can have 84 distinct colored symbols and 132 distinct colored patterns.
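The sequential assignment described above is simple modular arithmetic.  The following DATA step is only a sketch of that arithmetic, not of how ODS Graphics is implemented; the data set and variable names are made up for illustration, and the list lengths of 12, 11 and 7 are taken from the text.

```sas
/* Sketch: index of the color, pattern and symbol that each group level */
/* receives when all three lists are cycled together.                   */
data cycleTogether;
   do level = 1 to 14;
      color   = mod(level - 1, 12) + 1;   /* 12 colors        */
      pattern = mod(level - 1, 11) + 1;   /* 11 line patterns */
      symbol  = mod(level - 1, 7)  + 1;   /* 7 marker symbols */
      output;
   end;
run;

proc print data=cycleTogether noobs;
run;
```

For level 8 this gives color 8, pattern 8 and symbol 1, as described above.  A color and symbol pair first repeats after 84 levels (the least common multiple of 12 and 7), and a color and pattern pair after 132 levels (the least common multiple of 12 and 11).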

The graph above uses the LISTING style.  The list of marker symbols has been changed to include filled markers.  Here you can see the assignment of colors, line patterns and marker symbols for each of the three group values.  Click on the graph to see a higher resolution image.

ods listing style=listing;
title 'Style=Listing'; 
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug lineattrs=(thickness=2);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers 
          markerfillattrs=(color=white); 
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
  run;

Note the use of FilledOutlinedMarkers in the Scatter plot.  Also, I have used the SAS 9.4 STYLEATTRS feature to change the group symbols to the list of three filled symbols.

Soon it was perceived that it is not always necessary to change all the attributes of the element for each group value.  This was especially true for the line patterns.  When using a color Style, it was felt that it was not necessary to change both line color and pattern, but only the color till all colors from the list are used.

The graph above is created using the SAS 9.3 HTMLBlue style.  In this style, the cycling of the attributes (color, symbol and line pattern) is different from the LISTING style.  As you can see, all three groups have solid line patterns and circle markers.  Only the line color is changed per group.  So, the first 12 group values get the 12 different colors from the Style, along with the first line pattern and first symbol.  The 13th group level will get the 1st color with the 2nd line pattern and 2nd symbol.  Most of the time we only have a handful of group levels, so only a color change is seen.

I recall seeing a presentation where the presenter was baffled as to why he was not seeing different marker symbols for his scatter plot when he ran his SAS code, but was seeing only circle markers with different colors.  This was because, while at home he was using SAS 9.2, the presentation laptop had SAS 9.3 with the default destination of HTML with the HTMLBlue style.  He had to change the style back to LISTING to see the different shaped markers.

This behavior of the different Styles is called Attribute Priority.  The default AttrPriority is NONE, meaning that all the attributes are cycled together as for the LISTING style.  HTMLBlue has AttrPriority=Color.  This means that the color attribute is cycled first, holding the symbol and pattern constant till all 12 colors are used up.  Then, we go to the second symbol and second pattern and cycle through all 12 colors again.
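The color-first cycling can be sketched with the same kind of arithmetic.  Again, this DATA step is only an illustration with made-up names; it assumes the same 12 colors, 11 patterns and 7 symbols and is not the internal implementation.

```sas
/* Sketch: with AttrPriority=COLOR, the pattern and symbol advance only */
/* after a full pass through the 12 colors.                             */
data cycleColorFirst;
   do level = 1 to 26;
      color   = mod(level - 1, 12) + 1;
      pass    = floor((level - 1) / 12);   /* completed passes through the colors */
      pattern = mod(pass, 11) + 1;
      symbol  = mod(pass, 7)  + 1;
      output;
   end;
   drop pass;
run;

proc print data=cycleColorFirst noobs;
run;
```

Levels 1 to 12 all get pattern 1 and symbol 1, and level 13 gets color 1 with pattern 2 and symbol 2, matching the HTMLBlue behavior described above.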

This behavior was first introduced in SAS 9.3, where it was implemented internally.  With SAS 9.3M1, the AttrPriority option was surfaced in the Style.  With SAS 9.4, the AttrPriority option was surfaced in the ODS Graphics statement.

Now, with SAS 9.4, you can make any Style behave in any attribute priority you want by setting the AttrPriority= option in the ODS Graphics statement.  Here is the HTMLBlue style with the AttrPriority set to NONE.  Now, all the visual elements come from the HTMLBlue Style (except the overridden symbols), but all the attributes are cycled together.  So, the 2nd group value now gets a dashed line pattern and the TriangleFilled symbol.

ods listing style=htmlblue;
ods graphics / attrpriority=none;
title 'Style=HTMLBlue (Attrpriority=None)'; 
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug lineattrs=(thickness=2);
  scatter x=date y=val2 / group=drug filledoutlinedmarkers 
          markerfillattrs=(color=white); 
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
  run;

Here is an example of the same graph with the ANALYSIS style with AttrPriority=COLOR.  Note, in this case, both line pattern and marker symbol are held constant while color changes.

Often, one really does want the colors and symbols to change with group level, but not the line pattern.  This could be another value for the AttrPriority option (future).  But currently, we have only provided for AttrPriority of NONE and COLOR.

To create a graph where the colors and symbols for groups change but not the line pattern, you will have to use AttrPriority=NONE and hold the pattern by setting it to SOLID in the series plot.  Sure, this is not as good as having a value for AttrPriority that could do that for you, but that will have to wait till there is a strong demand for it.  Note, in the graph on the right, the color and symbols are changing, but the line pattern is held constant by setting lineattrs=(pattern=SOLID) in the code.
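Putting that together, a sketch of the step might look like the following.  It is adapted from the earlier steps in this article rather than copied from the linked program, so the title text and the exact option list are my assumptions.

```sas
/* Sketch: cycle color and symbol together, but hold the line pattern */
ods listing style=analysis;
ods graphics / attrpriority=none;
title 'Style=Analysis (Attrpriority=None, Solid Lines)';
proc sgplot data=seriesGroup;
  styleattrs datasymbols=(circlefilled trianglefilled squarefilled);
  series x=date y=val / group=drug
         lineattrs=(thickness=2 pattern=solid);   /* pattern fixed for all groups */
  scatter x=date y=val2 / group=drug filledoutlinedmarkers
          markerfillattrs=(color=white);
  keylegend / title='' location=inside position=topright across=1;
  xaxis display=(nolabel);
  yaxis display=(nolabel) integer values=(4 to 20 by 4);
run;
```

The explicit PATTERN=SOLID overrides the pattern that would otherwise be cycled per group, while color and marker symbol continue to change.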

Full SAS 9.4 Code:  AttrPriority
