Report from PharmaSUG 2013

The PharmaSUG 2013 conference in Chicago this week was awesome.  From the perspective of graphics, there was great interest in using SG Procedures, Designer and GTL for building clinical graphs.  It was nice to see many papers by users on how they are using these tools for creating graphs on a daily basis.  One presenter concluded her paper as follows:

"With ODS graphics in Base SAS 9.3, many commonly used graphs can be easily generated without the need to program or convert data for graph production using other software applications. As a result, clinical researchers, including PK scientists and phamacometricians, can focus on data exploration, analysis and interpretation, without struggling with programming languages. In practice, this user-friendly graphical tool offers benefits in a variety of areas, e.g. visual access to data as early in the process as possible; easily integrated in automation tools, etc., and provides great opportunities for a streamlined drug development process."

- Alice Zong, PharmaSUG 2013, Paper SP09, "SAS® 9.3: Better graphs, Easier lives for SAS programmers, PK scientists and pharmacometricians"

I love this sentence:  "As a result, clinical researchers, including PK scientists and phamacometricians, can focus on data exploration, analysis and interpretation, without struggling with programming languages."

Kevin Lee of Cytel said he always uses ODS Graphics Designer to make all his graphs.  It is good to see this interactive tool prove useful not only to the new graph user, but also to experienced graph programmers.

There was great interest in the various presentations on creating or modifying Survival Plots.   Multiple papers were presented on this topic, always a popular one.

I presented the paper "Patient Profile Graphs using SAS".  This was a late entry into the conference as a slot came open in the "Management and Support" track.  I was certainly afraid no one interested in Patient Profile graphs would notice this paper in this track.  So, I was very gratified to see a "standing room only" crowd of attendees.  I also appreciated the multiple suggestions I received from the programmers on how to further improve this graph.

The graphs AE and CM above are from the paper.  Note, the "zero" day value in both graphs is synchronized.   I will post an improved version soon.

Kriss Harris showed off his Venn Diagram poster created using the DRAWOVAL statement in the GTL syntax.  Here is Kriss pointing to some important feature of his graph.

The conference was on the Magnificent Mile on Michigan Ave.  It was nice to see the blooming tulips in the middle of the road, along with all the other nice flower beds all around.

 


Clark Error Grid Graph

The SAS Global Forum conference last week was awesome.  From the perspective of graphics, there were more papers from users on graphics and ODS Graphics than in recent times.  I will post a summary shortly.

One of the interesting papers was "#113-2013 - Creating Clark Error Grid using SAS/GRAPH and Annotate..." by Yongyin Wang and John Shin of Medtronic Diabetes.  Here is the graph included in the paper:

This graph is created using the SAS/GRAPH GPLOT procedure, with annotation to draw the lines representing the zones and the labels in the graph.  So naturally, I wanted to see how far I could get using the SGPLOT procedure without any use of annotation.

Step 1:  Create the data for drawing the zones:

The zone lines (or polygons) can be drawn using the SERIES plot.  Here is the graph.  I used the coordinates from the paper.   The grid lines are drawn using the SERIES plot with a group option.    See full code for the data.  Click on graph to see full resolution image.

SAS 9.3 SGPLOT code:

title 'Clark Error Grid';
proc sgplot data=grid noautolegend dattrmap=attrmap;
  series x=rfbg y=sbg / group=id lineattrs=graphdatadefault(color=gray) nomissinggroup;
  xaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Reference Blood Glucose';
  yaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Sensor Blood Glucose';
  run;

 Step 2:  Generate data and assign zones for each point.

Since I do not have access to the real data, I resorted to some tricks to simulate random data with a distribution that loosely follows the real data in the graph.  I used Yongyin's equations to assign the zone values.  I also like Robert's idea of defining the map polygons, and using PROC GINSIDE to find the points in the polygons.  Here is the graph with the grids and simulated data using default group attributes.  The scatter plot with GROUP=zone is used to plot the data.

SAS 9.3 SGPLOT code:

title 'Clark Error Grid';
proc sgplot data=plotZone noautolegend;
  scatter x=x y=y / group=zone markerattrs=(symbol=circlefilled size=3);
  series x=rfbg y=sbg / group=id lineattrs=graphdatadefault(color=gray) nomissinggroup;
  xaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Reference Blood Glucose';
  yaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Sensor Blood Glucose';
  run;

 

Step 3:   Count the points in each zone, and use same color scheme as Yongyin's graph.

An Attribute Map is used to color the markers in each zone using the color scheme Yongyin's graph uses.  Points in each zone are counted, and the percent values shown in each zone using the SCATTER plot with MARKERCHAR option.

 

SAS 9.3 SGPLOT code:

/*--Define Attributes Map--*/
data attrmap;
  length id $1 value $1 markercolor $10;
  id='A'; value='A'; markercolor='cx00afdf'; linecolor='cx00afdf'; output;
  id='A'; value='B'; markercolor='cx00ef7f'; linecolor='cx00ef7f'; output;
  id='A'; value='C'; markercolor='gray'; linecolor='gray'; output;
  id='A'; value='D'; markercolor='pink'; linecolor='pink'; output;
  id='A'; value='E'; markercolor='red'; linecolor='red'; output;
run;
 
/*--Draw the Graph--*/
title 'Clark Error Grid';
proc sgplot data=plotZoneCount noautolegend dattrmap=attrmap;
  scatter x=x y=y / group=zone attrid=A markerattrs=(symbol=circlefilled size=3);
  series x=rfbg y=sbg / group=id lineattrs=graphdatadefault(color=gray) nomissinggroup;
  scatter x=xl y=yl / markerchar=label attrid=A;
  xaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Reference Blood Glucose';
  yaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Sensor Blood Glucose';
  run;

Step 4:   Draw boxes around the numbers.
We use the HIGHLOW plot statement with TYPE=BAR to draw white boxes with outline around each label so each label can be seen clearly.

SAS 9.3 SGPLOT code:

/*--Draw the Full Graph with text background--*/
ods graphics / reset antialiasmax=5700 width=6in height=4in imagename='ClarkErrorGrid_4';
title 'Clark Error Grid';
proc sgplot data=plotZoneCount noautolegend dattrmap=attrmap;
  scatter x=x y=y / group=zone attrid=A markerattrs=(symbol=circlefilled size=3);
  series x=rfbg y=sbg / group=id lineattrs=graphdatadefault(color=gray) nomissinggroup;
  highlow y=yl low=low high=high / group=zone type=bar outline fill fillattrs=(color=white)
          lineattrs=(pattern=solid thickness=1 color=black) ;
  scatter x=xl y=yl / markerchar=label group=zone attrid=A;
  xaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Reference Blood Glucose';
  yaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Sensor Blood Glucose';
  run;

If needed, the label backgrounds can be colored by zone.  As we can see, this entire graph can be created using the SGPLOT procedure, without the need for any annotation.  Different plot statements can be used to achieve the results you need.  Of course, the SGANNO feature is available, but I try to avoid it as much as possible.

As usual, we learned something from this exercise.  We need a way to make the text labels in the graph clearly legible, without hiding any of the data (as far as possible).  Drawing a white box is one option, but doing this automatically will be useful.

Full SAS 9.3 Code:  Clark_Error_Grid


Are you ready for SGF 2013?

The 2013 SAS Global Forum is around the corner in San Francisco and the anticipation is building. Early indications are that attendee registration is up from last year, and we are looking forward to a great conference starting Sunday, April 28.

It is great to see the large and diverse offering of papers on graphics from users.  This includes topics on graphs and reports using SG Procedures, GTL and SAS/GRAPH.  I also noticed that the SGF 2013 online proceedings are already posted.  Here are some of the titles I noticed and plan to attend.  You can already view the contents here.

Graph papers by users:

Posters:

Graph papers presented by SAS authors:

Super Demos on new SAS 9.4 features:

  • #SD501-2013 Auto Charts with ODS Graphics Designer.
  • #SD502-2013 Clinical Graphs Using SG Procedures.
  • #SD503-2013 What's New with SAS® 9.4 SG Procedures.
  • #SD504-2013 New Features in SAS® 9.4 Graph Template Language.

Pre-Conference Course (Sunday):

  • Creating Statistical Graphics in SAS

Wow!  Now that's a solid lineup to keep any graphics programmer fully subscribed for the entire conference.  If I left out something, please chime in with the title.

SAS Demo Area:   In addition, come visit us in the SAS demo hall.  You can directly ask the developers your questions, vent your frustrations or simply stop by and chat with us over a drink.

 


Attributes Map - 3 Range Attribute Map

In the previous two articles we discussed Discrete Attribute Maps, and how these can be used to ensure that group attributes like color are consistently mapped to group values regardless of their position in the data.

Now, let us take a look at the attributes map that allows you to do something similar with numeric ranges, the Range Attributes Map.

Normally, when you assign a color to a numeric value for a heat map or contour plot, the default three color model from the active style is used to map the colors to the values.  The lowest numeric value of the variable is always mapped to the StartColor and the highest numeric value is always mapped to the EndColor.  The mean value is mapped to the NeutralColor.  For a range of values, here is what you will get:

This graph uses the HEATMAPPARM statement in SAS 9.3.  The temperature value is assigned as the color response, resulting in this graph.  Here, the smallest value in the data is mapped to a shade of blue and the highest to a shade of red.  The color white represents the mean value, whatever that may be.

When you run this graph again with different data, the color mapping will change to map to the new data range of the value column.  So, the question is:

  • How can we ensure that the color mapping stays consistent across different data ranges?
  • How can we ensure the value 32 is always discernible?

Here is where the Range Attribute Map comes into the picture.  Similar to the Discrete Attributes Map, it allows you to map a specific color (or color range) to a specific range of data values.  Using such a map for this data, here is what you will get:

Note the ranges of the response data are shown in the continuous color legend on the right.  In the range definition, we have set the color for 32 to white.  The adjacent ranges don't quite go all the way to white.  This seems to provide just enough of a hint to the eye, and the white color stands out in the graph.  In the legend it is labeled anyway.

Here is the GTL code snippet for the Range Attribute Map definition:

  rangeattrmap name='map';
    range -100 -< -60 / rangecolormodel=(purple purple);
    range -60 -< -32 / rangecolormodel=(purple blue);
    range -32 -< 0 / rangecolormodel=(blue lightblue);
    range 0 -< 32 / rangecolormodel=(lightblue cxf7f7f7);
    range 32 - 32 / rangecolor=white;
    range 32 -< 72 / rangecolormodel=(cxf7f7f7 gold);
    range 72 -< 100 / rangecolormodel=(gold red);
    range 100 -< 120 / rangecolormodel=(red darkred);
    range 120 -< 160 / rangecolormodel=(darkred darkred);
  endrangeattrmap;

Note, the option on the right side can be a single color for the numeric range, or a color model that can have more than one color.
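
For context, here is a minimal sketch of how such a map might be wired into a complete template.  The data set TEMPS, its variables X, Y and TEMP, the template name and the shortened list of ranges are all assumptions made up for illustration, not the program used for the graphs above.  The map is tied to the data through a RANGEATTRVAR statement, and the resulting attribute variable is used as the color response.

```sas
/*--Hypothetical grid of temperatures from -100 to 160--*/
data temps;
  do y=1 to 12;
    do x=1 to 24;
      temp=-100 + 260*(24*(y-1) + x - 1)/287;
      output;
    end;
  end;
run;

proc template;
  define statgraph RangeMapSketch;
    begingraph;
      entrytitle 'Temperature';
      /*--Shortened version of the map; colors are fixed per range--*/
      rangeattrmap name='map';
        range -100 -< 0 / rangecolormodel=(purple lightblue);
        range 0 -< 32 / rangecolormodel=(lightblue cxf7f7f7);
        range 32 - 32 / rangecolor=white;
        range 32 -< 100 / rangecolormodel=(cxf7f7f7 red);
        range 100 - 160 / rangecolormodel=(red darkred);
      endrangeattrmap;
      /*--Associate the map with the data column--*/
      rangeattrvar attrvar=TempColor var=temp attrmap='map';
      layout overlay;
        heatmapparm x=x y=y colorresponse=TempColor / name='heat';
        continuouslegend 'heat';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=temps template=RangeMapSketch;
run;
```

Because the colors are tied to the ranges and not to the data extremes, rendering a different data set with the same template should keep the same value-to-color mapping.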

This same graph can also be made using the bubble plot.

Some differences from the previous plot are:

  • A format is used to display 32 as 'Freezing'.
  • Note the difference in the axes and the markings on the continuous legend.

Discrete Attribute Maps can be used with SG Procedures using the DATTRMAP option.  However, SG Procedures do not as yet provide a way to set range attribute maps.  So, you have to use GTL for this feature.  If you are a user of SG Procedures and need this feature added to the procedures, please chime in with your use cases.

Full SAS 9.3 Program:  RangeAttrMap


Attributes Map - 2

Last week I wrote about how you can use the Discrete Attributes Map to ensure that group values with specific names are represented in the graph with specific colors or other visual  attributes such as marker symbol or line pattern.

This attributes map also supports a special keyword "OTHER" which can be used to set the visual attributes for all other values that may be encountered in the data.  This is a very useful feature that can be used to highlight just the groups of interest in a large crowd.

In the example below, I have simulated a set of data for a large number of molecules, called "Drug-1" to "Drug-100", along with three popular fictitious drug names "Astrin", "Typenol" and "Mosarin".   We want to see the response for these three, in relation to all the others.  We don't care about the names of the other molecules.  Here is the graph.

In this graph we have used GTL to define a discrete attributes map to assign specific visual characteristics to the three named drugs.  We have also defined a value OTHER that is used for all other names.

GTL Code Fragment:

discreteattrmap name='AttrMap';
  value 'Astrin' / lineattrs=(color=blue pattern=solid);
  value 'Typenol' / lineattrs=(color=red pattern=solid);
  value 'Mosarin' / lineattrs=(color=green pattern=solid);
  value OTHER / lineattrs=(color=cxe7e7e7 pattern=solid);
enddiscreteattrmap;

We have also populated the discrete legend directly from the discrete attributes map, so all defined values are displayed.

This exercise also exposed a few holes to us, which we will address in future releases.  Note, the legend has 1 pixel lines for the curves, even though each series has higher line thickness.  This is because the values come from the attributes map, not from the series plot.  We'll have to address that (somehow).  Also, it would be nice if the VALUE statement accepted transparency.  Then, the OTHER curves can be made transparent, while the curves of interest can be kept fully opaque, thus avoiding the need to draw them last (see tip below).

Since the key is based on color, one way around is to use a fill color legend instead of a line color legend.  We can add FILLATTRS bundle to each VALUE statement, then use a DISCRETELEGEND with TYPE=FILL, as shown in the graph below.

 Tip:  You can do this trick when using the attributes map in the legend.  You cannot do this if the legend refers to the series plot.

Tip:  Be sure that the values of interest are drawn last, so the curves are rendered on top of the grey cloud.
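
Here is a rough sketch of the fill legend approach and the draw order tip together.  The data set DRUGS, its variables and the simulated values are my own assumptions for illustration (only 20 "other" molecules are generated to keep it short); the full program linked below is the real one.  Each VALUE statement carries a FILLATTRS bundle, the legend refers to the attributes map with TYPE=FILL, and the named drugs are written to the data last so their curves are drawn on top.

```sas
/*--Hypothetical data: 20 unnamed molecules first, named drugs last--*/
data drugs;
  length drug $8;
  call streaminit(2013);
  do n=1 to 23;
    if n le 20 then drug=cats('Drug-', n);
    else drug=scan('Astrin Typenol Mosarin', n-20);
    slope=rand('uniform');
    do day=1 to 30;
      response=slope*day + rand('normal');
      output;
    end;
  end;
run;

proc template;
  define statgraph FillLegendSketch;
    begingraph;
      entrytitle 'Response by Drug';
      discreteattrmap name='AttrMap';
        value 'Astrin' / lineattrs=(color=blue pattern=solid) fillattrs=(color=blue);
        value 'Typenol' / lineattrs=(color=red pattern=solid) fillattrs=(color=red);
        value 'Mosarin' / lineattrs=(color=green pattern=solid) fillattrs=(color=green);
        value OTHER / lineattrs=(color=cxe7e7e7 pattern=solid) fillattrs=(color=cxe7e7e7);
      enddiscreteattrmap;
      discreteattrvar attrvar=AttrDrug var=drug attrmap='AttrMap';
      layout overlay;
        seriesplot x=day y=response / group=AttrDrug lineattrs=(thickness=2);
        /*--Legend entries come from the map, drawn as fill swatches--*/
        discretelegend 'AttrMap' / type=fill;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=drugs template=FillLegendSketch;
run;
```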

Full SAS 9.3 GTL Code: DiscreteAttrMap

 


Attribute Maps - 1

You created a graph of Response over Time by Severity where Severity has three levels, "Severe", "Moderate" and "Mild".  How do you ensure that "Severe" is always red in your graph, regardless of the data order?

Normally, when creating any graph with a GROUP role, the distinct group values are assigned the style elements GRAPHDATA1 - GRAPHDATA12 in the order the values are encountered in the data.  The first value gets GRAPHDATA1 and so on.

Here is what I get using a simple series plot:

SGPLOT code:

title 'Response over Time by Treatment';
proc sgplot data=SeriesGroup1;
  series x=date y=val / group=Severity lineattrs=(thickness=3);
  scatter x=date y=val2 / group=Severity markerattrs=(symbol=circlefilled size=11);
  xaxis display=(nolabel);
  yaxis label='Response';
  run;

In the data set, val2 has values only in every 30th observation, so we get scatter markers occasionally.

The data has observations with group="Mild" first , so it gets the first graph data element with the blue color for the default style.  Moderate is next, so it gets graph data 2 (red), and Severe gets graph data 3 (green).  These colors may not be ideal for the levels and we can derive a custom style.  But let us go with this for now.

Now, let us say the data for a different drug test only has Mild and Severe data.  Here is what you get:

In this graph, the series plot for Mild is still blue, but now the series plot for Severe is red instead of  green as it was in the first graph.  So, the colors used depend on the order of the data, and not on the value of the data.  More often than not, you would always want Severe to be the same color across all your graphs.  This can be fixed, but it requires some coding tricks.

The SAS 9.3 Discrete Attribute Map is the solution for this and other related issues.  You can use a discrete attributes map to define the attributes for a certain VALUE of the group variable.  So, you can say, for group=SEVERE, use linecolor=red.  By setting the visual attributes you want based on the value of the group variable, you can ensure that anytime "Severe" occurs in a group, it will be displayed using the line, fill or marker attributes you have defined.

With SG Procedures, you can define the discrete attributes map in a SAS data set by using specific column names and values to define the map.  Here is what a map looks like for our application:

The required columns are ID and VALUE.  Then, you need to provide the appropriate columns for the attribute you want to control.  We associate this attribute map with our plot to get this graph:

SAS 9.3 Code:

ods graphics / reset width=5in height=3in antialiasmax=1100 imagename='SeriesMap1';
title 'Response over Time by Treatment';
proc sgplot data=SeriesGroup1 dattrmap=AttrMap;
  series x=date y=val / group=Severity lineattrs=(thickness=3) attrid=Severity;
  scatter x=date y=val2 / group=Severity markerattrs=(symbol=circlefilled size=11)
        attrid=Severity ;
  xaxis display=(nolabel);
  yaxis label='Response';
  run;

Note, we have created a SAS data set called AttrMap, and provided that to the SGPLOT procedure using the DATTRMAP option.  Also, for the Series and Scatter plots, we have specified the ATTRID of "Severity" to associate the values with the map.  The ATTRID allows us to store multiple map definitions in the same data set.
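
A data step along the following lines could build such a map.  This is only a sketch: the column lengths are my choice, and the colors are assumed to be the same green, orange and red used in the GTL version later in this article.

```sas
/*--Sketch of a discrete attributes map data set for Severity--*/
data AttrMap;
  length id $8 value $8 linecolor $8 markercolor $8;
  id='Severity'; value='Mild';     linecolor='green';  markercolor='green';  output;
  id='Severity'; value='Moderate'; linecolor='orange'; markercolor='orange'; output;
  id='Severity'; value='Severe';   linecolor='red';    markercolor='red';    output;
run;
```

The ID column holds the name referenced by ATTRID, and VALUE holds the group values exactly as they appear in the data.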

In this graph, each value for Severity has been rendered with the color we specified.  Now, let us render this same graph with the data that has only Mild and Severe events.  Here is the graph:

Now, we only have plots for Mild and Severe, but they are rendered with the correct colors that we have specified.  Order of the data, or absence of some data does not have an adverse impact on our graph.  AttrMap also allows for value="OTHER".  Using that, you only need to define the values of interest to you.

Now, it would be nice if we could include all values from the Attributes Map into the legend.  Even though we have only Mild and Severe events, it would be nice to know that we can also have Moderate events.

This feature is under development for SG Procedures.  So, at this time to do this with SGPLOT, you will have to force all values to be present in the data.  However, GTL does have this feature, and here is the GTL graph and code to do this.

SAS 9.3 GTL Code:

proc template;
  define statgraph seriesMap;
    begingraph;
      entrytitle 'Response over Time by Treatment';
      discreteattrmap name='AttrMap';
        value 'Mild' / lineattrs=(color=green) markerattrs=(color=green) fillattrs=(color=green);
        value 'Moderate' / lineattrs=(color=orange) markerattrs=(color=orange) fillattrs=(color=orange);
        value 'Severe' / lineattrs=(color=red) markerattrs=(color=red) fillattrs=(color=red);
      enddiscreteattrmap;
      discreteattrvar attrvar=AttrSeverity var=Severity attrmap='AttrMap';
      layout overlay;
        seriesplot x=date y=val / group=AttrSeverity lineattrs=(thickness=3);
        scatterplot x=date y=val2 /  markerattrs=(symbol=circlefilled size=11)
            group=AttrSeverity;
        discretelegend 'AttrMap' / title='Severity:' type=fill;
      endlayout;
    endgraph;
  end;
run;
proc sgrender data=SeriesGroup2 template=seriesMap;
run;

Note, in GTL, the discrete attributes map is defined inside the GTL code itself.   Support for an external data set based attributes map is coming soon.

SAS 9.4 Sneak Preview:  A cool new feature that applies to rendering of all curves and bar charts is the high resolution rendering done by setting the option SUBPIXEL=ON on the BEGINGRAPH statement.  Here is the graph rendered using SAS 9.4.  Note the smooth rendering of the plots.

In the next installment, we will discuss Range Attribute Maps.

 Full SAS 9.3 code:  DiscreteAttrMap


Custom Box Plots

A frequent question we get from users is how to create a box plot with custom whisker lengths.  Some want to plot the 10th and 90th percentiles, while others want the 5th and 95th percentiles.  The VBOX statement in the SGPLOT procedure does not provide for custom whiskers.  Also, unlike GTL, there is no parametric box plot statement, where you can provide your own statistics.

Here is a standard VBOX of mileage by Type grouped by Origin using the SGPLOT procedure.

SGPLOT Code:

proc sgplot data=sashelp.cars(where=(type ne 'Hybrid'));
  vbox mpg_city / category=type group=origin grouporder=ascending;
  yaxis grid;
  xaxis display=(nolabel);
  run;

How can we create a custom box plot with 10th and 90th percentile whiskers?  With SAS 9.3, we have a way to create a parametric box plot using the new HIGHLOW plot statement.

First we have to run the MEANS procedure to obtain the necessary statistics for mileage by Type and Origin as follows:

proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint;
  class type origin;
  var mpg_city;
  output out=CarsMeanMileage
         mean=Mean
         median=Median
         q1=Q1
         q3=Q3
         p10=P10
         p90=P90;
run;
 
data CarsMeanMileage;
  set CarsMeanMileage(where=(_type_ eq 3));
  drop _type_ _freq_;
run;

The HIGHLOW plot statement comes in two flavors:  TYPE=LINE (default) and TYPE=BAR.  The first creates a floating line from low to high, and the second creates a floating bar from low to high.  We will use a combination of these to create the graph:

SAS 9.3 SGPLOT Program:

proc sgplot data=CarsMeanMileage nocycleattrs;
  highlow x=type high=p90 low=p10 / group=origin groupdisplay=cluster
      clusterwidth=0.7;
  highlow x=type high=q3 low=median / group=origin type=bar
      groupdisplay=cluster grouporder=ascending clusterwidth=0.7
      barwidth=0.7 name='a';
  highlow x=type high=median low=q1 / group=origin type=bar
      groupdisplay=cluster grouporder=ascending clusterwidth=0.7
      barwidth=0.7;
  scatter x=type y=mean / group=origin groupdisplay=cluster
      grouporder=ascending clusterwidth=0.7 markerattrs=(size=9);
  keylegend 'a';
  yaxis grid;
  xaxis display=(nolabel);
  run;

Here are the details of this program:

  • The first high low plot of type=line (default) plots the whisker from P10 to P90.
  • The second high low plot of type=bar draws the upper quartile.
  • The third high low plot of type=bar draws the lower quartile.
  • The scatter plot draws the mean marker.
  • This graph looks very similar to the standard VBOX except for the whiskers and outliers.

Since this graph is made up of all "Basic" plots, we can overlay any other basic plot we may want to display other features.  In this example, we have added the display of the mean value above each mean marker.

In this example, we lightened the fill color by making it 50% transparent.  Since the whisker line would then show through the boxes, we have to use two highlow line plots, one from P90 to Q3 and one from Q1 to P10.  Then, we added a label to show the value of the mean in each box.  The code is shown in the program file attached.
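
For readers without the attachment, here is a sketch of how this variant might be coded.  It is my reconstruction from the description above, not necessarily the attached program: the data set name BoxStats, the use of NWAY in place of the _TYPE_ filter, the TRANSPARENCY=0.5 setting and the 4.1 format for the mean labels are all assumptions.

```sas
/*--Summary statistics by Type and Origin (NWAY keeps only the full crossing)--*/
proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint nway;
  class type origin;
  var mpg_city;
  output out=BoxStats mean=Mean median=Median q1=Q1 q3=Q3 p10=P10 p90=P90;
run;

proc sgplot data=BoxStats nocycleattrs;
  /*--Upper and lower whiskers drawn separately, outside the box--*/
  highlow x=type high=p90 low=q3 / group=origin groupdisplay=cluster
      grouporder=ascending clusterwidth=0.7;
  highlow x=type high=q1 low=p10 / group=origin groupdisplay=cluster
      grouporder=ascending clusterwidth=0.7;
  /*--Semi-transparent boxes for the upper and lower quartiles--*/
  highlow x=type high=q3 low=median / group=origin type=bar
      groupdisplay=cluster grouporder=ascending clusterwidth=0.7
      barwidth=0.7 transparency=0.5 name='a';
  highlow x=type high=median low=q1 / group=origin type=bar
      groupdisplay=cluster grouporder=ascending clusterwidth=0.7
      barwidth=0.7 transparency=0.5;
  /*--Mean marker with its value as the data label--*/
  scatter x=type y=mean / group=origin groupdisplay=cluster
      grouporder=ascending clusterwidth=0.7 markerattrs=(size=9)
      datalabel=mean;
  format mean 4.1;
  keylegend 'a';
  yaxis grid;
  xaxis display=(nolabel);
  run;
```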

Finally, here is another sneak preview of a SAS 9.4 feature: Jittering.  We have received many requests on this topic  so jittering will be supported with SAS 9.4.  In the example below, I have created a custom box plot using the technique above, and then added display of all the values using jittering.  To do this, I have to merge the summary data with the original data.  I will write up a detailed article with the code once SAS 9.4 is released.

Markers are jittered on the category axis (in this case horizontal) when their Y value is within the tolerance level.  Darker regions indicate more markers.  The "Mean" value is shown with a square marker.

Full SAS 9.3 Code: BoxParm

 

 

 


Controlling Point Labels on Series Plot

SG procedures and GTL use a collision avoidance algorithm to position data labels for a scatter or series plot.  This is enabled by default.  The label is preferably placed at the top right corner of the marker.  The label is moved to one of the eight locations around the marker to avoid collision with other markers or labels.  If a collision is still detected, then the label is moved a bit away from the marker.

While this works for sparsely populated graphs, it does not work very well for dense plots, as the labels start drifting away from their data points, and soon it becomes hard to associate the label with the marker.

This gets worse when used with a series plot, as the data labels try to avoid other labels and the series plot itself.  Also, for a busy series (real data), the data points are close to each other anyway, even if there is a lot of space elsewhere in the graph.  The full code is in the attached file.

Here is a graph of a sparse series plot with data labels.  Here we have only plotted 18 observations (every 5th one) from the original series plot that has 91 observations.

This case works well.  Every vertex in the series plot is labeled.  Some labels are moved to avoid collisions.

Here is the graph for the original series plot with all 91 data points:

title 'Open Defects by Date';
proc sgplot data=series;
  series x=date y=count / datalabel=count lineattrs=(thickness=2);
  xaxis grid display=(nolabel);
  yaxis grid;
run;

As we can see here, this series plot has too many data labels to position without collisions.  The labels are moved away from their data points, and soon the graph becomes pretty useless.

Disable Label Collisions:  There is a way to disable the label collision entirely by using the LABELMAX option on the ODS Graphics statement.  The option name is a little misleading.  This option does not set the maximum number of labels to be displayed, but rather the point at which collision avoidance is switched off.

ods graphics / reset labelmax=0;
title 'Open Defects by Date';
proc sgplot data=series;
  series x=date y=count / datalabel=count lineattrs=(thickness=2);
  xaxis grid display=(nolabel);
  yaxis grid;
  run;

In the graph above, the label collision is completely disabled by setting LABELMAX=0.  But the resulting graph is not very useful as the labels overwrite, creating a mess.

In my use cases with such data, what I need is really a way to label the local extreme points on the curve, and not necessarily all the points.  One could use a simplistic approach and only label every 5th point, but that would be less than satisfactory.

So, what I do is use a LOESS fit behind the scenes to compute a fit line and a confidence band.  Then, if the distance of the data point from the predicted fit value is larger than a tunable factor of the band width, I label that point.  Else I skip it (set the label to missing).  I used a factor of 0.4, but you can adjust that.  The full code is included in the attached file.  Here is the graph with reduced labels.

Here, only some of the extreme points are labeled.  I also draw a scatter marker at each labeled point, so we know which one is labeled.

In the graph above, there are regions where the curve is less jaggy, and so no points are labeled.  Same would be the case if we have a smooth sine curve.  To ensure at least some points are labeled, I added code to label a point if the previous 10 points did not get a label.  Here is the graph:

title 'Open Defects by Date';
proc sgplot data=loessLabels noautolegend;
  series x=date y=depvar / datalabel=loessPluslabel lineattrs=(thickness=2)
         datalabelattrs=(size=8);
  scatter x=date y=loessPluslabel / markerattrs=(symbol=circlefilled size=9);
  scatter x=date y=loessPluslabel / markerattrs=(symbol=circlefilled size=5
          color=white);
  xaxis grid display=(nolabel);
  yaxis grid label='Defects';
  run;

Now, this looks pretty good to me.   All extreme points are labeled, and some points along the smooth curve are also labeled.  Label collision avoidance is enabled, so if the points get close, like 312 and 319, they are moved.  I also increased the size of the data labels.  I am sure you can macrofy the code for more efficient usage.
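
For readers who do not have the attachment, the labeling rule can be sketched roughly as follows.  This is my reconstruction from the description above, not the attached program.  The simulated SERIES data is made up, and I am assuming the OutputStatistics table from PROC LOESS carries the columns DepVar, Pred, LowerCL and UpperCL when CLM is requested, so check the column names in your release before relying on this.

```sas
/*--Hypothetical stand-in for the 91-day defects series--*/
data series;
  call streaminit(91);
  format date date9.;
  count=300;
  do date='01JAN2013'd to '01APR2013'd;
    count=max(0, count + round(10*rand('normal')));
    output;
  end;
run;

/*--Fit the LOESS curve and capture fit values with confidence limits--*/
proc loess data=series;
  model count=date / clm details(outputstatistics);
  ods output OutputStatistics=loessOut;
run;

data loessLabels;
  set loessOut;
  factor=0.4;                        /* tunable fraction of the band width */
  bandWidth=UpperCL - LowerCL;
  /*--Label only points that sit far from the fit--*/
  if abs(DepVar - Pred) > factor*bandWidth then loessLabel=DepVar;
  else loessLabel=.;
  /*--Also label a point if the previous 10 points got no label--*/
  loessPlusLabel=loessLabel;
  if missing(loessLabel) and sinceLabel >= 10 then loessPlusLabel=DepVar;
  if missing(loessPlusLabel) then sinceLabel + 1;
  else sinceLabel=0;
run;
```

The resulting LOESSLABELS data set has the DATE, DEPVAR and LOESSPLUSLABEL columns that the SGPLOT step above expects.  The date format may need to be reapplied on the output table.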

Note:  An improvement to the collision avoidance algorithm will be released with SAS 9.4.  A new "Simulated Annealing" based alternative algorithm is also in the works.

Full SAS 9.2 Program: Vertex_Labels

Excel Data File: Series

 


Box with Scatter Overlay

A request we often hear is to display the distribution of data as a box plot, along with some detailed information overlaid.  For example, you may have ratings data for all the hospitals in a region by specialty, and you want to view this distribution by specialty, and also overlay the actual data points for some specific hospitals of interest.

Here, I have simulated such data for multiple hospitals (by ID) for six specialties such as Pediatrics, Nephrology, etc., with a rating from 0.0 - 1.0.  Two of the hospitals (ID=1 and 2) are of interest to us, and I have named them "County" and "Memorial".

Here is the graph, showing the box plots of Rating by Specialty, overlaid with the specific values for the two hospitals.  A horizontal box plot is used due to the long category value names.  Click on the graph for a larger view:

Here is the data snippet.  A format is defined that sets the hospital names for id=1 & 2 and missing for other ids.  The actual data has 20 different hospitals.  Rating2 is same as Rating for id=1 & 2.
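
Since the snippet is an image, here is a sketch of how data with this structure could be simulated.  The format name HOSP, the random number seed and three of the six specialty names are my own placeholders; the column names CAT, RATING, RATING2 and NAME match the ones used in the template below.

```sas
/*--Names for the two hospitals of interest, blank for the rest--*/
proc format;
  value hosp 1='County' 2='Memorial' other=' ';
run;

data rating;
  length cat $20 name $10;
  call streaminit(1234);
  do cat='Pediatrics', 'Nephrology', 'Cardiology', 'Oncology',
         'Orthopedics', 'Gastro-Enterology';
    do id=1 to 20;
      rating=round(rand('uniform'), 0.01);   /* rating from 0.0 to 1.0 */
      name=put(id, hosp.);                   /* missing for ids above 2 */
      if id le 2 then rating2=rating;        /* overlay values for ids 1 and 2 only */
      else rating2=.;
      output;
    end;
  end;
run;
```

With RATING2 and NAME missing for the other hospitals, only the two hospitals of interest show up in the scatter overlay and its legend.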

Users of the SGPLOT procedure will know that you cannot overlay a basic plot like SCATTER on a HBOX.  So, to do this, we have to use GTL.  The code is quite straight forward, and SGPLOT users will recognize the similarities:

/*--Distribution of Hospital Ratings with Specific Overlay--*/
proc template;
  define statgraph Rating_Overlay;
    begingraph;
      entrytitle 'Hospital Rating by Specialty';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(display=(ticks tickvalues)) ;
        boxplot x=cat y=rating / orient=horizontal;
        scatterplot y=cat x=rating2 / group=name name='a'
                      includemissinggroup=false;
        discretelegend 'a';
      endlayout;
    endgraph;
  end;
run;
 
/*--Create graph--*/
proc sgrender data=rating template=Rating_Overlay;
run;

The GTL code is verbose, but it is well structured.  We always need the PROC TEMPLATE step, with a DEFINE STATGRAPH statement to create the named template.  The template has a LAYOUT OVERLAY container that contains the BOXPLOT and the SCATTERPLOT statements with the usual options.  Then, we need the PROC SGRENDER step to bind the data with the template to create the graph.

Another way would be to actually label the hospital names on the graph itself, and avoid the legend.  The DATALABEL option for the scatter plot is used to place the labels.  The labels are automatically moved a bit to avoid collision.  The code is included in the attached program file.

Finally, here is a sneak preview of the same graph using SAS 9.4 (to be released soon).

Note the long category value "Gastro-Enterology" has been split onto two lines using a split character.  This is a new feature in the SAS 9.4 release, allowing us to use a vertical box plot instead.  Also, we have used optional "Skins" to render the boxes and the scatter markers.  The label positions have been fixed to "Top".  I will show you the SAS 9.4 code as soon as it is released.

SAS 9.4 also includes a nifty new "Jitter" feature to place a large number of coincident markers.  More on this soon.

Full SAS 9.3 Code:  Rating

Parametric Bar Charts

A parametric bar chart in SG Procedure and GTL parlance is a simplified version of the regular bar chart, where the data is assumed to be summarized before it is used in the SG procedures or GTL.  So, multiple occurrences of the same category and / or group combination are not expected.  If this does occur, multiple "bar" elements are overdrawn in the same location.

The benefits of this are many, especially with the SG procedures.  There, the VBAR statement is very useful because it summarizes the data when multiple observations are encountered for the classifiers.  But this also restricts the ways in which it can be combined with other plot statements.  A VBAR statement can only be combined with other VBAR or VLINE statements that have exactly the same combination of category and group variables.  The same applies to the HBAR statement.

Let us use the data shown below for Sales, Cost, Profits and Units Sold by Product and Qtr.  The data set has multiple observations, one for each combination of Product and Qtr.

Here is a bar chart of Sales, Cost and Profit by Product.  The VBAR and VLINE statements are used, which will summarize the data for each product.

SAS 9.3 SGPLOT code:

proc sgplot data=revenue;
  vbar product / response=sales dataskin=gloss nostatlabel fillattrs=graphdatadefault;
  vbar product / response=cost dataskin=gloss nostatlabel barwidth=0.6 fillattrs=graphdata1;
  vline product / response=profit nostatlabel lineattrs=graphdata2(thickness=5 pattern=solid);
  yaxis offsetmin=0 display=(nolabel) grid;
  run;

The VBAR and VLINE statements summarize the data by Product, so the multiple observations for each quarter are handled by the statement.

If you want to be creative in the way the data is represented as bars, lines, markers, etc., you cannot use the VBAR statement, as it does not allow combinations with basic plot statements like SCATTER.  However, you can use the VBARPARM statement, as long as you summarize the data yourself using the MEANS procedure or in some other way.  Once summarized, this same data can be represented more creatively using the VBARPARM statement.

This graph shows the information as a Bar Chart and Series Plot with Outlined Markers.  Bar labels display the Units Shipped at the top of each bar:

SAS 9.3 SGPLOT code:

/*--Summarize the data using Proc MEANS--*/
proc means data=Revenue sum noprint;
  class product;
  var Sales Cost Profit Units;
  output out=revenueSum
         sum(Sales Cost Profit Units) =  Sales_sum Cost_sum Profit_sum Units_sum;
  run;
 
/*--Build the label for Units Shipped--*/
data revenueSum2;
  label sales_sum='Sales' cost_sum='Costs' profit_sum='Profit' units_sum='Units';
  set revenueSum(where=(_type_ eq 1)) end=last;
  retain ymax 0;
  UnitLabel=cat('Units =', put(units_sum, comma8.0));
  ymax=max(ymax, sales_sum);
  if last then call symputx("YMAX", ymax);
  run;
 
/*--Create graph--*/
proc sgplot data=revenueSum2;
  title 'Sales, Costs and Profits with Units Sold';
  vbarparm category=product response=sales_sum / dataskin=gloss datalabel=UnitLabel
            datalabelattrs=(size=8) fillattrs=graphdatadefault name='s';
  vbarparm category=product response=cost_sum / dataskin=gloss barwidth=0.6
           fillattrs=graphdata1 name='c';
  series x=product y=profit_sum / lineattrs=graphdata3(thickness=9 pattern=solid)
           transparency=0.4 name='p';
  scatter x=product y=profit_sum / markerattrs=graphdata3(size=15 symbol=circlefilled);
  scatter x=product y=profit_sum / markerattrs=(size=9 symbol=circlefilled color=white);
  yaxis offsetmin=0 display=(nolabel) grid values=(0 to 400000 by 100000);
  keylegend 's' 'c' 'p';
  run;

Once the data is summarized, you can use any combination of the parametric bar statement with other basic plots to create the graph.  Here is the same data displayed as a horizontal bar chart.  The units shipped are shown at the right side of the graph using the SCATTER statement.  The code is included in the file attached at the bottom:
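
A minimal sketch of such a horizontal version is shown here; the attached file has the actual code.  It reuses the revenueSum2 data set and the YMAX macro variable created above.  The data set name revenueSum3, the variable xpos and the 1.1 factor are assumptions, chosen only to place the unit values to the right of the longest bar.

/*--Sketch only: horizontal bars with units at the right (xpos is hypothetical)--*/
data revenueSum3;
  set revenueSum2;
  xpos=1.1*&YMAX;
  run;

proc sgplot data=revenueSum3;
  title 'Sales and Costs with Units Sold';
  hbarparm category=product response=sales_sum / dataskin=gloss
           fillattrs=graphdatadefault name='s';
  hbarparm category=product response=cost_sum / dataskin=gloss barwidth=0.6
           fillattrs=graphdata1 name='c';
  /*--MARKERCHAR draws the value of units_sum in place of a marker--*/
  scatter y=product x=xpos / markerchar=units_sum;
  xaxis offsetmin=0 display=(nolabel) grid;
  keylegend 's' 'c';
  run;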

In the graph below, we show the same data, this time showing the Units Shipped as another series plot mapped to the Y2 axis on the right hand side.  The color of the series plot and the Y2 axis are made the same to reinforce the association.  The code is included in the file attached at the bottom:

Note, the scaling of the Y (left) and Y2 (right) axes is independent of each other.  The Y axis has bar charts associated with it, and hence has a baseline of zero.  This is not the case for the Y2 axis.  However, it would be nice to set the values for the Y2 axis such that it does have the same zero baseline, and the tick values match the grid lines.  We can do that by setting the VALUES option appropriately.  The new graph is shown below and the code is included in the SAS program file attached below.
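
The idea can be sketched as follows, again using revenueSum2.  The Y2 range of 0 to 4000 and the color are assumptions; pick a Y2 range that covers your unit totals and has the same number of steps as the Y axis (here, five tick values on each side).  The same offsets are set on both axes so the tick values line up.

/*--Sketch only: Y2 tick values chosen to match the Y grid lines (ranges are assumed)--*/
proc sgplot data=revenueSum2;
  vbarparm category=product response=sales_sum / dataskin=gloss
           fillattrs=graphdatadefault name='s';
  vbarparm category=product response=cost_sum / dataskin=gloss barwidth=0.6
           fillattrs=graphdata1 name='c';
  series x=product y=units_sum / y2axis name='u'
         lineattrs=(color=darkgreen thickness=5 pattern=solid);
  yaxis  offsetmin=0 offsetmax=0.05 display=(nolabel) grid
         values=(0 to 400000 by 100000);
  y2axis offsetmin=0 offsetmax=0.05 values=(0 to 4000 by 1000)
         labelattrs=(color=darkgreen) valueattrs=(color=darkgreen);
  keylegend 's' 'c' 'u';
  run;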

Conclusion:  If you summarize your data yourself by the required classifiers, you can use the VBARPARM and HBARPARM statements in combination with all basic plot statements to build creative displays of your data.

Full SAS 9.3 Code: VBarParm