Proportional Euler Diagram

The topic of Venn diagrams came up a while ago.  At that time, I thought it might be interesting to build a proportional Venn diagram.  But, reading up on Venn diagrams, I learned that they represent all intersections of N sets, regardless of whether there are actually any observations in each region.  So, there did not seem to be any purpose in making a proportional Venn diagram, and maybe the term itself is an oxymoron.

I was interested in a graphical representation of the number of different types of subjects in a study, say subjects with Diabetes, or Hypertension, or both.  It turns out that Euler Diagrams represent the real world data, and not all theoretical combinations.  So, it would make sense to draw a Proportional Euler Diagram.

I started with the simple 2-Set case, as it seems achievable.  The results are shown on the right.  The values for N1, N2 and NI are also shown in the footnote, along with the value of the convergence error.  The two special cases are shown on the right, and are straightforward.  Click on the graphs for a higher resolution image.

The two cases with intersecting circles are shown below.  For the first one, the numbers are such that the intersection point of the two circles lies in-between the centers of the two circles.   For the second case, the intersection lies to the right of the smaller circle.

In all cases, the radius of the larger circle is set to 10 (arbitrary), and I compute the area of the smaller circle proportional to the number of observations in the circles.

Here are the details of my program:

  • N1, N2 and NI are the number of observations in Set 1 ONLY, Set 2 ONLY and the intersection ONLY.
  • So, N1+NI is the first circle, N2+NI is the 2nd circle, and NI is the intersection.
  • N1 >= N2.
  • Special case #1 -> NI=0.  This means the two circles are non-overlapping.
  • Special case #2 -> N2=0.  This means circle 2 is fully inside circle 1.
  • Case #3 -> the intersecting vertical line is between center 1 and center 2.
  • Case #4 -> the intersecting vertical line is to the right of center 2.

Here is the algorithm:

  • First, I set v, the height of the intersection above the centerline, to 1.
  • Compute the three different areas.
  • Compute the area per observation in each section.
  • Then, based on the ratio of ANI / AN1, I adjust v by the error ratio.  v is kept < r2.
  • I repeat this while the error is > 0.001 and the number of iterations is < N.
  • Now, if the error is still > 0.001, convergence is not reached and the intersection is to the right of center 2.
  • Now, set v=0.99999*r2 and repeat the same computations above, with reducing v.

I assume convergence is reached, and based on this value of v, I compute the horizontal distance from center of each circle to the intersection, d1 and d2 and other numbers needed to plot the details.
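For readers who want to experiment, here is a minimal sketch of the two-circle geometry.  It is not the iteration on v described above: it assumes N2 > 0 and NI > 0 (the intersecting cases) and uses plain bisection on the distance d between the two centers, together with the standard formula for the overlap area of two circles.  The names d, a, target, lo and hi are my own for illustration and do not come from the macro.

```sas
/* Sketch only: bisection on the center distance d, not the macro's iteration on v */
data _null_;
  n1=30; n2=20; ni=10;             /* Set 1 only, Set 2 only, intersection only       */
  pi=constant('pi');
  r1=10;                            /* radius of the larger circle (arbitrary)        */
  r2=r1*sqrt((n2+ni)/(n1+ni));      /* circle areas proportional to N1+NI and N2+NI   */
  target=pi*r1*r1*ni/(n1+ni);       /* overlap area that corresponds to NI            */
  lo=r1-r2; hi=r1+r2;               /* overlap shrinks as d grows from r1-r2 to r1+r2 */
  do iter=1 to 60;
    d=(lo+hi)/2;
    a=r1*r1*arcos((d*d+r1*r1-r2*r2)/(2*d*r1))
     +r2*r2*arcos((d*d+r2*r2-r1*r1)/(2*d*r2))
     -0.5*sqrt((-d+r1+r2)*(d+r1-r2)*(d-r1+r2)*(d+r1+r2));
    if a > target then lo=d;        /* overlap too large: move the centers apart      */
    else hi=d;                      /* overlap too small: move the centers together   */
  end;
  d1=(d*d+r1*r1-r2*r2)/(2*d);       /* center 1 to the intersecting vertical line     */
  d2=d-d1;                          /* center 2 to that line; negative in Case #4     */
  v=sqrt(r1*r1-d1*d1);              /* height of the intersection above centerline    */
  put r2= d= d1= d2= v= a= target=;
run;
```

Because the overlap area decreases steadily as the centers move apart, bisection cannot overshoot, and a negative d2 simply signals Case #4 without a separate pass.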

I can use the ELLIPSEPARM or BUBBLE (RelativeScale=False)  statement to draw the plot.  However, SGPLOT procedure does not support these statements (not in the 80-20 range for simple plots).  So, I used GTL, with the BubblePlot because I wanted to use skins.

I made it into a macro, with three parameters N1, N2 and NI.  Skin is optional.  If you have a need for Proportional Euler Diagrams in your work, please chime in and let me know if this is useful to you.  Maybe you have made one of your own and I would love to hear how you went about solving for the intersection areas.

Venn diagram shapes for 2, 3, 4 and more sets are available on the web.  It would be possible to make these using the EllipseParm statement for both circles and ellipses.

I plan to tackle the case of the 3 set Proportional Euler Diagram.  This same algorithm may not extend to this case.  I would love to hear your ideas.

Full GTL Macro program:  Euler_Bubble_Macro


Graphs are easy in SAS University Edition

By now you have heard all about the SAS(R) STUDIO software that provides access to the power of SAS analytics in a Web browser.  The SAS(R) University Edition is also available free for higher education teaching, learning and  research.

This software includes ODS Graphics software for creating graphs.  You can use the familiar program window to write your own SAS data step and procedures.  An example of running your own SGPLOT program is shown in Robert's recent article on How to create a Histogram using the SAS University Edition.

Making graphs gets even easier in SAS Studio by using the graph tasks that are included with the software.  When you first launch the software, you will see the user interface shown on the right.  Click on the "Tasks" button on the left, and you will see a list of tasks by category.

Here I have highlighted the Tasks folder and the Graph subfolder under it.

Multiple graph tasks are available including Bar Chart, Bar-Line Chart, Box Plot, Histogram, Line Chart, Pie Chart, Scatter Plot, Series Plot and more.

Each of these tasks presents you with a form to set the data and various options as shown on the right.  Here, I have launched the "Histogram" task, shown highlighted in blue.

Each task presents you with an easy to use visual interface to set the parameters and options necessary to make the graph.  These are all collected under two tabs - The Data tab and the Options Tab.  In the image on the right, the Data tab for the Histogram task is highlighted in yellow.

Each graph task allows you to provide the name of the data set and the required variables to create the graph.  In case of Histogram, you need to provide only one numeric variable for the Analysis Role.  In the example above, we have selected the SASHELP.CARS data set and the MPG_CITY column for graphing.

Once the required parameters are provided, you can submit the task by pressing the run button or 'F3'.  The task will render the histogram using the default settings for styles and present the graph to you in the Results window as shown on the right.

Each task also supports optional settings which are included under the "Options" tab.  These options can be used to customize your graph, including setting of titles, footnotes and  graph size.  In this case, I have set the graph size to 4" x 3" to fit the small region.

Each graph task generates the required SGPLOT code needed to render the graph.  This code is available in the Code window under the "Code / Results" tab.  This code is built and updated as you apply the settings in the Data and Options panels.  So, this is a good way to get started with learning the SGPLOT procedure.

The tasks cover many of the features available in the SGPLOT procedure, but not all.  So, you can cut and paste the code into the program window and customize it to your own needs.
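As an illustration only, the code behind the Histogram example would look roughly like the sketch below.  This is hand-written, not copied from the task, so the code the task actually generates will likely differ in its details.  The 4in by 3in size is the setting mentioned earlier, and the title is a made-up example.

```sas
/* Hand-written approximation of the Histogram task result, not its literal output */
ods graphics / reset width=4in height=3in;
title 'Distribution of City Mileage';   /* hypothetical title */
proc sgplot data=sashelp.cars;
  histogram mpg_city;
run;
title;
```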

You can learn more about creating graphs using SG procedures right here in this blog.  Learn all about the procedures themselves in the book on Statistical Graphics Procedures by Example.

 

 

 


Swimmer plot

At PharmaSUG 2014 in San Diego, I had the pleasure of attending "Swimmer Plot: Tell a Graphical Story of Your Time to Response Data Using PROC SGPLOT", by Stacey Phillips.  In this paper, Stacey presented an interesting graph showing the effects of a study drug on patients' tumor size.

Stacey says in her paper that often "investigators prefer to dig deeper and look at an individual subject’s pattern of response. A swimmer plot is a graphical way of showing multiple pieces of a subject’s response 'story' in one glance."  The final graph includes a bar showing the length of treatment duration for each patient, classified by the disease stage at baseline, one for each patient in the study.  The graph also includes indicators for the start and end of each response episode, classified by complete or partial response, and an indicator showing whether the patient is a "Durable responder".

Stacey uses a combination of HBarParm, Scatter and annotations to create this graph.  The annotation is used for adding the "Continued response" arrow, and for the display of the inner legend for decoding of the various symbols in the graph.

Along with many of the attendees of the presentation, I was impressed and intrigued by this visual.  I was curious if its creation could be simplified using some of the new features released with SAS 9.3.  In particular, I wanted to see if I could make this graph without any annotation.

SAS 9.3 includes some versatile plot statements and features to create graphs.  Two of these are the HIGHLOW plot and the Discrete Attributes Map for controlling the color of the group values.

The data used to create the graph is eyeballed from Stacey's graph and shown above.  The updated graph is shown below.  Click on the graph for a higher resolution image.

Here are the features of this graph:

  1. This graph uses the High Low plot to draw the bar representing the duration of the response for each subject.
  2. The bar has an arrow on the right side to indicate a continued response.  This is explained in the 1st footnote.
  3. Each response episode is represented by start and end events joined by a line classified by the type of response - Complete or Partial.  Connecting the start and end event and using a common classification color groups these together as one event, and is easier for the eye to consume.  Continuing response does not have an end event on the right.
  4. All event lines and markers are included in the inner legend.

It is also possible to place the indicator for continued event into the key legend using a "TriangleRightFilled" marker in the graph.  This marker is drawn outside the plot region, but is included in the legend.  Some items in the legend are shown in grey, to indicate the meaning of the shape since the actual marker will have different colors in the graph based on other criteria.

The graph on the right uses SAS 9.4 with a few aesthetic features for bar skins and filled, outlined markers.  Note the shorter line segments in the legend.

Note, the marker for the right arrow is intentionally made bigger to match the right arrows of the HighCap of the HighLow plot.

SAS 9.3 Code:

footnote  J=l h=0.8 'Each bar represents one subject in the study.';
footnote2 J=l h=0.8 'A durable responder is a subject who has confirmed response for at least 183 days (6 months).';
proc sgplot data= swimmer dattrmap=attrmap nocycleattrs;
  highlow y=item low=low high=high / highcap=highcap type=bar group=stage fill nooutline
          lineattrs=(color=black) name='stage' nomissinggroup transparency=0.3;
  highlow y=item low=startline high=endline / group=status lineattrs=(thickness=2 pattern=solid) 
          name='status' nomissinggroup attrid=status;
  scatter y=item x=start / markerattrs=(symbol=trianglefilled size=8 color=darkgray) name='s' legendlabel='Response start';
  scatter y=item x=end / markerattrs=(symbol=circlefilled size=8 color=darkgray) name='e' legendlabel='Response end';
  scatter y=ymin x=low / markerattrs=(symbol=trianglerightfilled size=14 color=darkgray) name='x' legendlabel='Continued response ';
  scatter y=item x=durable / markerattrs=(symbol=squarefilled size=6 color=black) name='d' legendlabel='Durable responder';
  scatter y=item x=start / markerattrs=(symbol=trianglefilled size=8) group=status attrid=status;
  scatter y=item x=end / markerattrs=(symbol=circlefilled size=8) group=status attrid=status;
  xaxis label='Months' values=(0 to 20 by 1) valueshint;
  yaxis reverse display=(noticks novalues noline) label='Subjects Received Study Drug' min=1;
  keylegend 'stage' / title='Disease Stage';
  keylegend 'status' 'd' 's' 'e'  'x' / noborder location=inside position=bottomright across=1;
  run;

The part that I believe makes this version easier to consume is the continuity of the response events.  Joining the start and end events with a line segment, all having the same color as per the event classification allows the eye to see each event and its duration clearly.
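The program refers to an attribute map through DATTRMAP=attrmap and ATTRID=status, but that data set is not listed here.  A minimal sketch of what such a map could look like follows.  The VALUE strings and the colors are assumptions for illustration; the values must match the formatted values of the STATUS column in your own data exactly, including case.

```sas
/* Hypothetical attribute map: the VALUE strings and colors are assumptions */
data attrmap;
  length id $8 value $20 linecolor markercolor $12;
  id='status';
  value='Complete response'; linecolor='red';  markercolor='red';  output;
  value='Partial response';  linecolor='blue'; markercolor='blue'; output;
run;
```

Using the same color for LINECOLOR and MARKERCOLOR is what ties the start marker, the end marker and the connecting line together as one event.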

The part I like best is the graph uses no annotation.

Full SAS 9.3 Code:  Swimmer_93

 

 

 


Grouped Timeline

Recently, a user posed a question on how to plot stacked frequencies on a time axis.  The data included frequencies of different viruses by week.  The data is modified to preserve confidentiality and is shown below.

The user's first instinct was to use a bar chart with stacked groups.  This automatically computes the frequencies by week and group, and also stacks the group values.  However, the x axis is made discrete and the bars are only drawn where data exists.  The user wants to see all weeks positioned correctly on the x axis, with gaps where there is no data for some weeks.  The data starts in April 2013 and goes to March 2014, so plotting by week number displays the data out of order.

Here is the graph, created using the bar chart.  The graph shows the frequencies for the two viruses by week, using stacked groups.  The data for week numbers 1-14 are listed first even though these are actually for 2014.  The weeks are drawn as discrete values, and there are no gaps for weeks that are missing because the bar chart treats the Category axis as discrete.  However, the VBAR statement makes it easy to see the stacked frequencies.

Virus_BarChart

To get this kind of graph on a scaled time axis, one would need to use a Needle plot or a HighLow plot.  However, neither of these will automatically compute the frequencies by date and group for a stacked display.

So, I used the MEANS procedure to compute the frequencies by week and virus.  Then, I ran a data step by year and week to compute the low and high values for each virus in a given week.  I also computed a "date" value for each week of the year.  Here is the data set:
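Since the original data cannot be shared, here is a rough sketch of how a data set of this shape could be built.  The input data set CASES (one row per case, with numeric YEAR and WEEK and a character VIRUS) and the COUNT column are hypothetical names, and the way the week number is turned into a date is an assumption that should be adapted to whatever week convention your data uses.

```sas
/* Hypothetical input: CASES has one row per case with YEAR, WEEK and VIRUS.   */
/* With no analysis variables left after KEEP=, the output has one row per     */
/* year, week and virus, with the count in _FREQ_.                             */
proc means data=cases(keep=year week virus) noprint nway;
  class year week virus;
  output out=freq(drop=_type_ rename=(_freq_=count));
run;

/* Stack the counts within each week: each segment starts where the last ended */
data stacked;
  set freq;                      /* NWAY output is ordered by year, week, virus */
  by year week;
  retain high 0;
  if first.week then high=0;
  low=high;
  high=low+count;
  /* Assumption: week 1 is the week that contains January 1 */
  dateOfWeek=intnx('week', mdy(1, 1, year), week-1);
  format dateOfWeek date9.;
run;
```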

Now, I use the HighLow plot to draw the bar segments for each virus value by date.  The low and high values for each group segment are already computed.

HighLow_Timeline

proc sgplot data=stacked dattrmap=attrmap;
  format week 2.; 
  highlow x=dateOfWeek low=low high=high / group=virus name='a' type=bar
          lineattrs=graphdatadefault attrid=virus; 
  yaxis display=(nolabel) offsetmin=0 grid;
  xaxis display=(nolabel);
  keylegend 'a' / title='Virus' location=inside position=topright across=1;
run;

As you can see, the SGPLOT code is very simple:

  • We use a HighLow plot by dateOfWeek and GROUP=VIRUS.
  • We used the previously defined discrete attributes map for each virus name.
  • We set other details like legend and axis properties.

The user wanted to see the week values displayed, which can be easily done using the LOWLABEL option of the HighLow plot.
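A sketch of that change is shown here, assuming the same STACKED data set and attribute map as in the program above.  Note that LOWLABEL=week as written would label every segment of a stack.  If only one label per bar is wanted, one option is a separate column that is populated only on the first segment of each week; that column is not part of the data shown in this post.

```sas
proc sgplot data=stacked dattrmap=attrmap;
  format week 2.;
  /* LOWLABEL= displays the week number at the low end of each segment */
  highlow x=dateOfWeek low=low high=high / group=virus name='a' type=bar
          lineattrs=graphdatadefault attrid=virus lowlabel=week;
  yaxis display=(nolabel) offsetmin=0 grid;
  xaxis display=(nolabel);
  keylegend 'a' / title='Virus' location=inside position=topright across=1;
run;
```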

HighLow_Timeline_Label

The full SAS code is shown below; however, I cannot share the data as it is confidential.  You can see the structure of the data above, and if you simulate similar data, you can run the code.

Full SAS 9.3 program (not including data): HighLow_Timeline


Lab Values Panel

It was almost two weeks ago that I got started making a display of lab tests for a subject, based on a graph I saw on the web for an article on this blog.

This graph is a part of a larger panel display of the lab values for a subject.  The panel includes the display of multiple lab values, including a gradient range of the percentile values for the general population.  The lab value for the subject is shown in the box on the left and also in the gradient range.  The graph is shown on the left.

While working on this article, I ran into a few issues including the minor issue of a long planned vacation to Hawaii that included a cruise around the islands.  Suffice it to say, the islands are fabulous, and the cruise lived up to all the expectations one can imagine.  Here is a picture I took of the boat, when anchored off Kona on the Big Island.

Then, it was time for PharmaSUG in San Diego.  The conference was a resounding success, and I had the opportunity to meet with SAS users interested in creating graphs using ODS Graphics.  The presentations were excellent, with users much more likely to be persuaded by the experiences of fellow SAS users rather than hearing from SAS staff.

Back from these two diversions, I finally got back to this project.  Here is the step by step progression to making this graph.

First, on the right is the data I gleaned from the web image; with Rick's help, I created this data set of the values in the graph.  Now, the expectation is that when you make such a graph, you have all the pertinent data in hand.  Note that the values V1, V2, etc. are the 0th, 25th, 50th, 75th and 100th percentiles of the data.  Note that for all tests, the "better" numbers are on the left, and the "worse" numbers are on the right.  I use the column "Rev" to indicate the ranges are reversed, with higher numbers to the left.

This graph uses SAS 9.4, but all significant features of the graph can be created using SAS 9.3.  Here is the simple graph showing the test, values and the percentile ranges.

For each test, the percentile values for the larger population are shown on the right, with the percentile values above the box, and actual test values inside the box.  The actual test value for the subject is also shown at the correct percentile location on each bar.

 

title 'Lipid Panel for Subject XXX-XX-XXXX';
proc sgplot data=Lipid noautolegend nowall noborder;
  highlow y=test low=low high=high / type=bar outline nofill barwidth=0.5 ;
  hbarparm category=test response=vn / barwidth=0.2  dataskin=gloss 
           fillattrs=(color=gray) nooutline baselineattrs=(thickness=0);

  scatter y=test x=vn2 / markerchar=v2 markercharattrs=(color=lightgray);
  scatter y=test x=vn3 / markerchar=v3 markercharattrs=(color=lightgray);
  scatter y=test x=vn4 / markerchar=v4 markercharattrs=(color=lightgray);

  scatter y=test x=vnl / markerchar=lvn1 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vn2 / markerchar=lvn2 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vn3 / markerchar=lvn3 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vn4 / markerchar=lvn4 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vnh / markerchar=lvn5 markercharattrs=(color=gray) discreteoffset=-0.35;

  scatter y=test x=vn / markerattrs=(symbol=trianglefilled size=12) discreteoffset=0.2
          filledoutlinedmarkers markerfillattrs=(color=white) dataskin=gloss;
  scatter y=test x=vn / markerchar=value discreteoffset=0.4 
          markercharattrs=(size=8 weight=bold);

  xaxis display=none offsetmin=0 offsetmax=0;
  yaxis display=(nolabel noticks noline);
  run;

The program above uses a HighLow plot to draw the box of ranges, and a scatter plot with markerchar option to display the percentile values above the box and the actual values in the middle.  An offset triangle marker is used to denote the percentile location of the actual value, and the value itself is displayed below the marker.

The test names in the original graph are left aligned, and the values are displayed in a box next to the test name along with the units of the values.  I added this information using additional HighLow plots with HighLabel option to display the test name, the test value and the units.

The only unit that needs improvement is the "muMol/L", where it would be better to use the Greek symbol for "mu".

title 'Lipid Panel for Subject XXX-XX-XXXX';
proc sgplot data=Lipid noautolegend nowall noborder;
  highlow y=test low=boxL high=boxH / type=bar nofill outline lineattrs=(color=black) barwidth=0.6;
  scatter y=test x=boxM / markerchar=value discreteoffset=0 markercharattrs=(size=8 weight=bold);
  scatter y=test x=boxM / markerchar=units discreteoffset=-0.4 markercharattrs=(size=7 color=gray);
 
  highlow y=test low=nameL high=nameH / type=bar nooutline barwidth=0.6 fillattrs=(transparency=1);
  scatter y=test x=nameL / datalabel=test datalabelattrs=(size=8 weight=normal) datalabelpos=right
          markerattrs=(size=0);
 
  highlow y=test low=low high=high / type=bar outline nofill barwidth=0.5 ;
  hbarparm category=test response=vn / barwidth=0.2  dataskin=gloss 
           fillattrs=(color=gray) nooutline baselineattrs=(thickness=0);
 
  scatter y=test x=vn2 / markerchar=v2 markercharattrs=(color=lightgray size=7);
  scatter y=test x=vn3 / markerchar=v3 markercharattrs=(color=lightgray size=7);
  scatter y=test x=vn4 / markerchar=v4 markercharattrs=(color=lightgray size=7);
 
  scatter y=test x=vnl / markerchar=lvn1 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn2 / markerchar=lvn2 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn3 / markerchar=lvn3 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn4 / markerchar=lvn4 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vnh / markerchar=lvn5 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
 
  scatter y=test x=vn / markerattrs=(symbol=trianglefilled size=12) discreteoffset=0.2
          filledoutlinedmarkers markerfillattrs=(color=white) dataskin=gloss;
  scatter y=test x=vn / markerchar=value discreteoffset=0.4 markercharattrs=(size=8 weight=bold);
 
  xaxis display=none offsetmin=0 offsetmax=0;
  yaxis display=none;
  run;

Now, let us get to the display of the gradient green-yellow-red ranges in the display.  There is no plot statement in SG or GTL that can draw a gradient color across three colors.  Some plot statements support a Color Response option, but essentially the entire entity is rendered with the color derived from the color gradient.

Once again, we resort to using the versatile HighLow plot to draw the gradient.  HighLow plot does not support a color gradient option, but does support a GROUP option that colors each segment with the group color from the style, or a Discrete Attributes Map.  Here, we will use the DAttrMap option of the SGPLOT procedure to draw the ramp.

We create Low and High columns for 100 HighLow segments for each test name.  Each segment is 1 unit, in a do loop from 0 to 99 by 1.  Each segment has an id - the loop variable.

We also create a DAttrMap data set, such that each value 1-99 has a corresponding color that transitions from green to yellow to red.  See the code in the full program attached at the bottom.  The result is the gradient ranges as shown in the graph above.
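To give a feel for the idea without opening the attachment, here is a simplified sketch.  It is not the attached program: the data set GRAD, the columns gLow, gHigh and SEG, and the map id 'grad' are names made up for this illustration, and it assumes LIPID has one row per test.  The segment id is stored as a two-character string so that the group values and the VALUE column of the map match exactly.

```sas
/* 100 one-unit segments per test; SEG is a character id shared with the map */
data grad;
  set lipid(keep=test);
  length seg $2;
  do i=0 to 99;
    gLow=i; gHigh=i+1;
    seg=put(i, z2.);
    output;
  end;
  drop i;
run;

/* One map row per segment id; the color runs green -> yellow -> red */
data attrmap;
  length id $4 value $2 fillcolor $8;
  id='grad';
  do i=0 to 99;
    t=i/99;
    if t <= 0.5 then do; r=round(510*t); g=255; end;   /* green to yellow */
    else do; r=255; g=round(510*(1-t)); end;           /* yellow to red   */
    value=put(i, z2.);
    fillcolor=cats('CX', put(r, hex2.), put(g, hex2.), '00');
    output;
  end;
  keep id value fillcolor;
run;

proc sgplot data=grad dattrmap=attrmap noautolegend;
  highlow y=test low=gLow high=gHigh / group=seg attrid=grad
          type=bar nooutline barwidth=0.5;
run;
```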

Finally, we use some simple annotations to add the information at the top of the graph.  Five observations in the SGANNO data set describe the way to draw the four text strings and the arrow object.

Once again, this exercise has exposed the need for some more features that will make this task easier such as support of ColorResponse for bar charts and Highlow plot.  We will look into adding such options in a future release.

The technique for creating such non-standard and complex graphs using SG or GTL is to analyze the graph and break it down into its component parts.  Then use the appropriate plot statements "creatively" to build the graph one layer at a time.  Some details that cannot be done using plot statements can be handled by annotation.

Full SAS9.4 Code without the Gradients:  Lipid_Dashboard

Full SAS9.4 Code with Gradients: Lipid_Dashboard_Gradient

Full SAS9.3 Code with Gradients:  Lipid_Dashboard_Gradient_93


Report from PharmaSUG 2014

Just getting back from PharmaSUG 2014 in San Diego.  The conference was great, both inside and outside.  The organizers ordered up some great weather for the Padres game and also for dinner on the flight deck of the Midway Carrier.

Our focus here being on graphics, we were all extremely gratified by the presentations in the Data Presentation section.  Amos Shu got us started with Adverse Event timeline graphs and panels in his paper Techniques of Preparing Datasets for Visualizing Clinical Adverse Events.

Wu, Dai and Gau presented a Graphical Representation of Patient Profile for Efficacy Analyses in Oncology  with Efficacy Patient Profile graphs using GPLOT and ANNOTATE:

DG02_Efficacy

Mayur Uttarwar and Murali Kanakenahalli proposed Developing Graphical Standards: A Collaborative, Cross-Functional Approach to ensure the correct list of Symbols and colors for the plots in the graph.

Stacey Phillips presented Swimmer Plot: Tell a Graphical Story of Your Time to Response Data Using PROC SGPLOT, displaying disease stages for each subject with additional information on the events.

Kriss Harris presented Napoleon Plot for PharmaSUG and I Am Legend for PharmaSUG, presenting displays for assessing treatment safety, and ways to create just a legend when there are too many legend entries to include in one graph.

Jeffery Meyers presented Kaplan-Meier Survival Plotting Macro %NEWSURV which used the GTL layouts in creative ways to display loads of information in one plot or panel.

BB13_NewSurv

Warren Kuhfeld presented ways to customize the popular Survival Plot created by the LIFETEST procedure in SAS/STAT 13.1.  The approach uses a combination of %ProvideSurvivalMacros, customization macros and %CompileSurvivalTemplates to create the customized templates, and then runs the LIFETEST procedure to produce the customized graph output.

 

Finally, I presented my paper from SGF 2014, Up Your Game with Graph Template Language Layouts, on using GTL layouts to create complex custom graphs.  This paper will get you started using the GTL layouts to go beyond the graphs you can create using the SGPLOT procedure.

As usual, PharmaSUG lived up to its reputation of taking care of its attendees by providing fabulous food for breakfast, lunch and dinners.  In addition to all the knowledge, I feel like I also gained 5 pounds.

For me, the highlight is always meeting and interacting with SAS users, who bring so much enthusiasm to the conference.  One quote that I took back to my team from a presentation was "Making graphs with SAS is FUN".  It is nice to get validation of our efforts to provide you the tools you need to easily create beautiful and effective graphs with SAS.


The HIGHLOW Plot

SG Procedures and GTL provide you with a large set of plot statements, such as BarChart, ScatterPlot, BoxPlot and more.  You can use them for the intended purpose, and all is well and good.  However, the real fun starts when you leverage a plot to do something that was not obvious.  One such plot that is designed to be used in creative and flexible ways is the HIGHLOW plot statement.

Let us start with the MEANS procedure to compute mean mileage values by Origin and Type.  We have added a few columns  as shown in the table above.

proc means data=sashelp.cars(where=(type ne 'Hybrid'));
  class origin type;
  var mpg_city mpg_highway;
  output out=carmeans(where=(_type_ > 2))
    mean(mpg_city mpg_highway) = City Highway;
  run;

We now have what we call "Chart Ready" data.  We can create a bar chart using the SGPLOT procedure.  I used the VBARPARM statement since the data is already summarized.  The graph shows the mean highway mileage by Origin and Type.  The mileage value of each bar is displayed on top.  Click on the graph for a high resolution image.

 

SGPLOT using VBARPARM:

proc sgplot data=carmeans;
  vbarparm category=origin response=highway / group=type datalabel 
           outlineattrs=graphoutlines dataskin=matte;
  xaxis display=(nolabel noticks);
  run;

Exactly the same graph can be created using the HIGHLOW plot statement, introduced in both SGPLOT and GTL in SAS 9.3.  This statement can be used with one of these two syntax specifications for the SGPLOT procedure.

HIGHLOW X=<var> High=<num-var> Low=<num-var> / <options>;

HIGHLOW Y=<var> High=<num-var> Low=<num-var> / <options>;

In the first case, vertical bar segments are drawn from the Low value to the High value for each value of X.  In the second case, horizontal bar segments are drawn from the Low value to the High value for each value of Y.

SGPLOT using HIGHLOW Plot:

proc sgplot data=carmeans;
  highlow x=origin high=highway low=zero / group=type type=bar 
          groupdisplay=cluster highlabel=highway lineattrs=graphoutlines
          dataskin=matte;
  xaxis display=(nolabel noticks);
  yaxis offsetmin=0;
  run;

Note that in the program above, we have used the following parameters and options:

HighLow X=Origin High=Highway Low=Zero / Type=Bar GroupDisplay=Cluster 
        highlabel=highway;

Setting LOW=zero makes sure all bar segments are drawn to the baseline in the graph shown above.  Using Low=City and High=Highway allows us to draw floating bar segments depicting the mileage range by Origin and Type.  We can use both the LowLabel and HighLabel to display both the low (City) and high (Highway) value for each bar.
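As a sketch, the floating bar version only needs the Low column changed and the LowLabel option added to the earlier program.  The remaining options are simply carried over, so the appearance may differ from the graph shown.

```sas
proc sgplot data=carmeans;
  /* Bars now float from City (low) to Highway (high), labeled at both ends */
  highlow x=origin high=highway low=city / group=type type=bar
          groupdisplay=cluster lowlabel=city highlabel=highway
          lineattrs=graphoutlines dataskin=matte;
  xaxis display=(nolabel noticks);
  run;
```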

 

 

 

HighLowBarHorz

Using the parameter Y instead of X, allows us to create horizontal bar segments as shown on the right.  In this case, we have enabled the drawing of horizontal bands to help visually cluster the groups within each category.

Another useful feature of the HighLow plot is the display of caps at the end of each bar.  This can be useful to indicate certain characteristics for each bar such as direction (increasing or decreasing) or a continuation of an event in either direction.

HighLowBarHorzCaps

The graph on the right displays a horizontal HighLow plot with caps.  The columns LOWCAP and HIGHCAP from the data set are used to display the caps.  The values are set in the columns when certain conditions are met.

In this case, a low cap is drawn for observations where City mileage is < 18 and a high cap is drawn when Highway mileage > 27.
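A sketch of how such cap columns could be set up is shown here.  The data set name CARCAPS and the choice of FilledArrow as the cap shape are assumptions for illustration; the attached program may use a different shape.  Rows where the condition is not met keep a blank value, so no cap is drawn for them.

```sas
/* Cap columns are filled in only when the condition is met */
data carcaps;
  set carmeans;
  length lowcap highcap $12;
  if city < 18 then lowcap='FilledArrow';
  if highway > 27 then highcap='FilledArrow';
run;

proc sgplot data=carcaps;
  highlow y=origin high=highway low=city / group=type type=bar
          groupdisplay=cluster lowcap=lowcap highcap=highcap
          lineattrs=graphoutlines dataskin=matte;
  xaxis display=(nolabel);
  yaxis display=(nolabel noticks);
  run;
```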

 

HighLowBarHorzClipCaps

Another interesting feature is drawing of a "Clip Cap".  This feature automatically draws a cap to indicate the bar is clipped when the bar value exceeds the min or max value of the axis.

In the example on the right, x axis Max is set to 28.  We have used the option CLIPCAP which draws a special cap at the end of any bar segment that is clipped by the axis min or max value.  Here, we have drawn a reference line at x=28 to display the max setting.

SGPLOT code with Clip Caps:

proc sgplot data=carmeans;
  highlow y=origin high=highway low=city / group=type type=bar groupdisplay=cluster
          lowlabel=city highlabel=highway clipcap
          barwidth=1 clusterwidth=0.8 lineattrs=graphoutlines dataskin=matte;
  refline 28 / axis=x lineattrs=(pattern=dash);
  xaxis display=(nolabel) max=28 values=(12 to 30 by 4);
  yaxis display=(nolabel noticks) colorbands=even colorbandsattrs=(transparency=0.5);
  run;

 

So far we have seen the use of the HIGHLOW plot with TYPE=BAR.  This plot statement also supports TYPE=LINE (default).  This plot type is useful to display a stock plot and can also be used to display four response values per line.  The example on the right displays monthly stock values, showing the high, low, open and close values.
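A minimal sketch of such a stock plot is shown here, using the SASHELP.STOCKS sample data set.  The choice of stock and date range is arbitrary and may not match the data used for the graph above.

```sas
proc sgplot data=sashelp.stocks(where=(stock='IBM' and date >= '01jan2004'd));
  /* TYPE=LINE is the default; OPEN= and CLOSE= add the tick marks on each line */
  highlow x=date high=high low=low / open=open close=close;
  xaxis display=(nolabel);
  yaxis label='Price' grid;
  run;
```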

 

The HighLow plot can be used wherever you want to display events of a certain duration, such as a Schedule Plot or an Adverse Event Plot.  These examples are shown below.

Schedule_94_3

Combined_AE_CM_94

Full SAS 9.4 Program for High Low Plots:  HighLow


Multi-Group Series Plots

The series plot is a popular way to visualize response data over a continuous axis like date with a group variable like treatment.   Here is some data I made up of a response value by date, treatment, classification and company that makes the drug.  The data is simulated as shown in the attached program (see bottom of article).

[Figure: Data]

The data includes the columns VALUE, DATE, DRUG, CLASS and COMPANY.  The columns LABEL and VALUEL are computed at every 5th observation per drug for labeling.
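The labeling columns could be derived along these lines.  This is only a sketch: it assumes the data set is named SeriesGroup (as in the SGRENDER step later in this article), that DRUG is a short character variable suitable as a label, and the names SeriesSorted and SeriesLabeled are hypothetical.  The attached program may do this differently.

```sas
/*--Sketch: populate LABEL and VALUEL at every 5th observation per drug--*/
proc sort data=SeriesGroup out=SeriesSorted;
  by drug date;
run;

data SeriesLabeled;
  set SeriesSorted;
  by drug;
  if first.drug then n=0;   /* restart the counter for each drug */
  n+1;                      /* sum statement, so N is retained   */
  if mod(n, 5)=0 then do;
    label=drug;             /* text to draw along the curve      */
    valueL=value;           /* y position of the text            */
  end;
  else do;
    label=' ';
    valueL=.;
  end;
  drop n;
run;
```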

 

[Figure: Series_94]

We can use the GTL SERIESPLOT to display Value by Date and Drug as shown on the right.  Click on the graph to see a higher resolution graph.  The drug name for each curve is displayed at the right end of the curve, and also in the legend below.  We could turn off the legend if needed.

If a GROUP variable is not provided, the entire data is plotted as one series.  When a GROUP variable is provided, the data is plotted as one curve for each group value.  Each curve gets the display attributes such as color and line pattern from one of the GraphData01 - 12 style elements, in the order the group values are encountered in the data.  Alternatively, one can also use a Discrete Attributes Map to assign specific color and line pattern values by group value.
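For the Discrete Attributes Map alternative, a minimal GTL sketch might look like this.  The template name, the map name 'drugmap', and the group values 'A' and 'B' with their colors are assumptions for illustration; group values not listed in the map fall back to the style defaults.

```sas
/*--Sketch: assign line attributes by group value with a discrete attr map--*/
proc template;
  define statgraph SeriesAttrMap;
    begingraph;
      /* Hypothetical map: drug values 'A' and 'B' are assumed */
      discreteattrmap name='drugmap';
        value 'A' / lineattrs=(color=blue pattern=solid);
        value 'B' / lineattrs=(color=red pattern=dash);
      enddiscreteattrmap;
      /* Bind the map to the DRUG column */
      discreteattrvar attrvar=drugvar var=drug attrmap='drugmap';
      layout overlay;
        seriesplot x=date y=value / group=drugvar name='a';
        discretelegend 'a' / title='Drug:';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=SeriesGroup template=SeriesAttrMap;
run;
```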

Sample code shown here is for SAS 9.4.  While some new options may not work, the basic ideas discussed below also work with SAS 9.3 or earlier.

SAS 9.4 GTL code for series with group:

proc template;
  define statgraph Series;
    begingraph / subpixel=on;
      entrytitle 'Values by Date and Treatment';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2)
                   smoothconnect=true;
        discretelegend 'a' / title='Drug:';
      endlayout;
    endgraph;
  end;
run;
 
proc sgrender data=SeriesGroup template=Series;
run;

[Figure: SeriesLabel_94]

Note, the curve labels drawn at the end can get cluttered, as happened above for groups B and C.  To improve this situation, we can label each curve along its length at frequent intervals.  We do this by using the columns LABEL and VALUEL, which have non-missing values at every 5th observation per group.

We can use these columns to overlay a scatter plot with the marker character option.  To reduce clutter of overlaid text, we add a white marker behind each letter.  We discussed such ideas in Labeled Curves.

SAS 9.4 GTL code for curves with inline labels.

proc template;
  define statgraph SeriesLabel;
    begingraph / subpixel=on;
      entrytitle 'Values by Date and Treatment';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2)
                   smoothconnect=true;
        scatterplot x=date y=valueL / group=drug
                    markerattrs=(symbol=circlefilled color=white size=10);
        scatterplot x=date y=valueL / group=drug markercharacter=label;
        discretelegend 'a' / title='Drug:' itemsize=(linelength=15px)
                    location=inside across=1 halign=right valign=top;
      endlayout;
    endgraph;
  end;
run;

In the program above, we have also used ATTRPRIORITY=COLOR on the ODS GRAPHICS statement to delay the use of patterns till after all colors are exhausted.  See attached full program.  This option makes all regular styles behave like the HTMLBLUE style.  Each group is rendered by a different color from the style, using a total of four colors.
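The ODS GRAPHICS setting mentioned above is not part of the template itself.  A minimal sketch of how it might be applied before rendering, assuming the SeriesGroup data set and the SeriesLabel template from this article:

```sas
/*--Sketch: cycle through colors first, then line patterns--*/
ods graphics / reset attrpriority=color;

proc sgrender data=SeriesGroup template=SeriesLabel;
run;
```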

[Figure: SeriesLineColorGroup_94]

Now, we want to be able to group the curves for each drug by another grouping variable like the drug class.  I assigned two classes "NSAID" and "Opioid".  Since each curve is labeled by the name of the drug, we want to use the color to depict the class of the drug.  We can do this by using a secondary group role called LINECOLORGROUP.  The graph is shown on the right where each curve is now colored either blue or red based on the drug class.  The legend contains a color swatch with its value.

SAS 9.4 GTL code for line color by a group:

proc template;
  define statgraph SeriesLineColorGroup;
    begingraph / subpixel=on;
      entrytitle 'Values by Date, Treatment and Class';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2) 
                   smoothconnect=true linecolorgroup=class;
        scatterplot x=date y=valueL / group=drug 
                  markerattrs=(symbol=circlefilled color=white size=10);
        scatterplot x=date y=valueL / group=drug markercharacter=label
                  markercharacterattrs=graphdatatext; 
        discretelegend 'a' / title='Drug Class:' type=linecolor location=inside 
                  across=1 halign=right valign=top;
      endlayout;
    endgraph;
  end;
run;

Note the features of the graph above:

  • We have labeled each treatment curve by its own label, so no need for a legend for this case.
  • We have assigned the color for each curve by a secondary group variable CLASS.
  • We have used a Discrete Legend of TYPE=LINECOLOR.  This displays only color swatches.
  • The only requirement here is that the GROUP variable must be the lowest grouping factor for each curve.  The LINECOLORGROUP value must remain the same for all obs with same GROUP value.

The good news here is that LINECOLORGROUP has been available in GTL SERIESPLOT all along since SAS 9.2.  It is used by the POWER procedures, but the feature was tested only for the POWER procedures' use cases.  Hence, we did not feel confident we could document this feature as ready for general use.  Now, after hearing multiple users express the need for such use cases, we felt it was necessary to release this as production.  Now this feature has been well tested, and no problems have been found.  So, we feel the risk-to-reward ratio is in favor of exposing this feature to you.

In addition to LINECOLORGROUP, you can also use LINEPATTERNGROUP, MARKERCOLORGROUP and MARKERSYMBOLGROUP.  Each one can be used along with the GROUP variable, and its value should not change within a GROUP value.

[Figure: SeriesLineColorPatternGroup_94]

In the graph on the right, I have used COMPANY as the LINEPATTERNGROUP.  Now, each drug is colored by its CLASS and patterned by the COMPANY.  I have also added a discrete legend of TYPE=LINEPATTERN.  Both these legends are wrapped inside a LAYOUT GRIDDED and placed at the top right of the cell.

SAS 9.4 GTL code for series with line color and line pattern groups:

proc template;
  define statgraph SeriesLineColorPatternGroup;
    begingraph / subpixel=on;
      entrytitle 'Values by Date, Treatment, Class and Company';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        seriesplot x=date y=value / group=drug name='a' lineattrs=(thickness=2)
                  smoothconnect=true linecolorgroup=class linepatterngroup=company;
        scatterplot x=date y=valueL / group=drug
                  markerattrs=(symbol=circlefilled color=white size=10);
        scatterplot x=date y=valueL / group=drug markercharacter=label
                  markercharacterattrs=graphdatatext;
        layout gridded / halign=right valign=top columns=2 columngutter=5;
          discretelegend 'a' / title='Drug Class' type=linecolor location=inside
                        across=1 halign=right valign=top;
          discretelegend 'a' / title='Company' type=linepattern location=inside
                        across=1 halign=right valign=top itemsize=(linelength=30);
        endlayout;
      endlayout;
    endgraph;
  end;
run;

Note the features of the graph above:

  • We have labeled each treatment curve by its own label, so no need for a legend for this case.
  • We have assigned the color for each curve by a secondary group variable CLASS.
  • We have assigned the pattern for each curve by a secondary group variable COMPANY.
  • We have used a Discrete Legend of TYPE=LINECOLOR.  This displays only color swatches.
  • We have used a Discrete Legend of TYPE=LINEPATTERN.  This displays patterns without color.
  • The only requirement here is that the GROUP variable must be the lowest grouping factor for each curve.  The LINECOLORGROUP and LINEPATTERNGROUP variables must remain the same for all obs with same GROUP value.

While you can display many different classifications in the graph at the same time, the graph can become complex very quickly.  You can also turn on the display of the markers for the series plot, and then control the visual attributes of the markers using MARKERCOLORGROUP and MARKERSYMBOLGROUP.
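As a rough sketch of the marker-based variant, the template below turns on the markers with DISPLAY=ALL and assigns their color and symbol from CLASS and COMPANY.  The template name SeriesMarkerGroups is hypothetical, and the mapping of CLASS to color and COMPANY to symbol is just one possible choice.

```sas
/*--Sketch: series markers colored by CLASS, symbol by COMPANY--*/
proc template;
  define statgraph SeriesMarkerGroups;
    begingraph / subpixel=on;
      entrytitle 'Values by Date, Treatment, Class and Company';
      layout overlay / xaxisopts=(display=(ticks tickvalues))
                       yaxisopts=(griddisplay=on);
        /* DISPLAY=ALL draws both the line and the markers */
        seriesplot x=date y=value / group=drug name='a' display=all
                   lineattrs=(thickness=2)
                   markercolorgroup=class markersymbolgroup=company;
        discretelegend 'a' / title='Drug:';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=SeriesGroup template=SeriesMarkerGroups;
run;
```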

In the process of making the graphs for this article I noticed the lack of a way to make the scatter markercharacter color by group, to match the color of the drug names to the line when using LINECOLORGROUP.  There is no matching MARKERCOLORGROUP in the SCATTERPLOT.  I will see what we can do about that.  Please chime in with your comments and observations.

I certainly look forward to seeing the ways in which you can leverage these features.

Full SAS 9.4 Code:  MultiGroup_94

Full SAS 9.3 Code:  MultiGroup_93


Labeled curves

Often, the topic of an article is motivated by a question from a user.  A satisfactory resolution of the situation is usually a good indication of a topic that may be of interest to other users.  One such question was posed to me by a user this weekend.  He wanted to display fit curves in a graph by group, with each curve labeled all along its length by a one-letter identifier.

[Figure: CurveReg]

This seems like a useful way to label a curve, as sometimes placing a label at just the end of the curve can be less than optimal.  When using a simple series plot, this is straightforward, and a short, one or two character label can be placed at intervals along the series.

But, what if the curves are fit plots, say a PBSpline or a regression?  Now, the plotted data is not the same as the original data.   Here is an example of curves of Mileage by Horsepower by Type.  We have used DEGREE=3 just for illustration.  The legend provides the decoding information, but it is less than ideal to refer back and forth to the legend.  Click on the graph for a higher resolution image.

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=cars_type_label_hp;
  reg x=horsepower y=mpg_city / degree=3 nomarkers group=type name='s';
  keylegend 's' / title='';
  run;

[Figure: CurveRegScatter]

Now, we want to add a short code representing the vehicle type at equal intervals along the curves.  First, we extract the label to be displayed as a 2 character abbreviation of the type.  Then we use the Scatter plot with the marker character option to display the short label at each observation:

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=cars_type_label_all_hp;
  reg x=horsepower y=mpg_city / degree=3  nomarkers group=type name='s';
  scatter x=horsepower y=mpg_city /  group=type markerchar=label;
  keylegend 's' / title='';
  run;
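The 2 character label used by the SCATTER statement above could be derived with a simple data step.  This sketch assumes the input is the SASHELP.CARS sample data and writes to a hypothetical data set name; the data sets used in this article may have been prepared with additional filtering.

```sas
/*--Sketch: abbreviate vehicle type to a 2 character label--*/
data cars_labeled;            /* hypothetical output name */
  set sashelp.cars(keep=type horsepower mpg_city);
  length label $2;
  label = substr(type, 1, 2); /* e.g. 'Sedan' becomes 'Se' */
run;
```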

Clearly, this is not acceptable, as every original observation is labeled, creating a cloud of labels around each fit plot.  What we need are the observations that are used to draw the fit curves, and not the original observations used to create the fit curves.  Now, the points to draw the fit lines are internally generated by the procedure, and not directly available to us.  How to do this?

To do this, we have to use a two-pass process.  First, we run the SGPLOT procedure to draw the fit curves, and also request output of the generated data using the ODS OUTPUT data set as follows:

SAS 9.3 SGPLOT Code:

ods output sgplot=RegData;
title 'Mileage by Horsepower and Type';
proc sgplot data=cars_type_label_all_hp;
  reg x=horsepower y=mpg_city / degree=3 nomarkers group=type name='s';
  scatter x=horsepower y=mpg_city / group=type markerchar=label;
  keylegend 's' / title='';
  run;

Note the use of the statement ODS OUTPUT SGPLOT=RegData.  This statement writes the generated data to the data set named RegData.  This data set has the generated fit data points in addition to the original data.  The generated variable names are often long and convoluted, such as "Regression_Horsepower_mpg_cit__x".  This ensures the generated names do not collide with the original column names, and lets the renderer know which generated columns to use to plot the data.  See the generated data set for the generated variable names and values.
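One quick way to see the generated names is to list the columns of the output data set and print a few rows.  This is a small sketch using the RegData data set created above.

```sas
/*--Sketch: inspect the generated column names and a few rows--*/
proc contents data=RegData varnum;
run;

proc print data=RegData(obs=5);
run;
```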

[Figure: CurveRegLabel]

For ease of use, we rename these generated column names to something simple like X, Y and Group.  Now, we know the data used to plot the curves, so we can use this data to display the 2 character code along the curve.  We ensure the data is sorted by Group and Horsepower, and create a 2 character code for every 30th observation.  Then, we use the scatter plot with marker character to plot the labels.
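The renaming and labeling steps might be sketched as below.  Only the "__x" name is quoted in this article; the "__y" and "_gp" names are assumptions and should be checked against the actual RegData columns.  The intermediate data set RegFit is hypothetical, the group column is assumed to be character, and the attached program may differ in detail.

```sas
/*--Sketch: keep the generated fit points under simple names--*/
/*--Assumption: the __y and _gp generated names are guesses--*/
data RegFit;
  set RegData(rename=(Regression_Horsepower_mpg_cit__x=x
                      Regression_Horsepower_mpg_cit__y=y
                      Regression_Horsepower_mpg_cit_gp=group));
  if not missing(x) and not missing(y);  /* drop rows without fit points */
  keep x y group;
run;

proc sort data=RegFit;
  by group x;
run;

/*--Create the label and its y position at every 30th point per group--*/
data RegCurves;
  set RegFit;
  by group;
  length label2 $2;
  if first.group then n=0;
  n+1;
  if mod(n, 30)=0 then do;
    label2=substr(group, 1, 2);
    y2=y;
  end;
  drop n;
run;
```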

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=RegCurves;
  reg x=x y=y / degree=3 nomarkers group=group name='s';
  scatter x=x y=y / group=group markerchar=label2;
  keylegend 's' / title='';
  run;

[Figure: CurveRegLabel_2]

While we have achieved what the user wanted, the overlaid curve labels look a bit cluttered.  One way to improve the appearance would be to add small scatter markers at each location, and draw the label inside it, as shown below.

Now each abbreviated label is clearly visible, and the graph does not look cluttered.

SAS 9.3 SGPLOT Code:

title 'Mileage by Horsepower and Type';
proc sgplot data=RegCurves;
  reg x=x y=y / degree=3 nomarkers group=group name='s';
  scatter x=x y=y2 / group=group markerattrs=(size=14 symbol=circlefilled)
                     filledoutlinedmarkers markerfillattrs=(color=white);
  scatter x=x y=y2 / group=group markerchar=label2 
                     markercharattrs=(size=5 weight=bold);
  keylegend 's' / title='';
  run;

Full SAS 9.3 SGPLOT code: CurveLabels


The BLOCK Plot

When you hear of a Scatter Plot or a Series Plot, you have a picture in your mind what we are talking about.  But one of the plot statements available in GTL, and soon with SGPLOT, is the BLOCK plot.  I am sure this leaves many users scratching their heads, wondering what in heaven's name is a BLOCK plot?  So, in this article we will shed some light on this unique and useful plot.

[Figure: BlockDataMiss]

The block plot is a one-dimensional plot with the syntax as shown below:

BLOCKPLOT X=var BLOCK=var < / options>;

For the data set shown on the right, we will use X=DATE and BLOCK=WINDOWS.  The plot will create contiguous horizontal rectangular "blocks" along the x axis while the block variable value is the same.  So, in this case, a rectangular block will be created from '01Jun1990' to '01Sep1995' with the block value of "3.0".  Then, when the new block value of "95" is encountered, a new block will be created while the block value stays as "95" till '01Jul1998', when it changes to the new value.

Thus, the BlockPlot statement creates such horizontal blocks and displays them in the plot.  Now, for convenience, a missing value can be considered to be a continuation of the previous block value.  So, the column  "WINMISS" would have a similar result.  As you can imagine, the plot needs the X variable to be sorted.
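To illustrate the idea of a missing value continuing the previous block, a sparse column like WINMISS could be derived from a fully populated block column as sketched below.  The data set names WindowsSorted and WindowsMiss are hypothetical, WINDOWS is assumed to be a character variable, and the length of 8 is an assumption.

```sas
/*--Sketch: keep the block value only where it changes--*/
proc sort data=Windows(keep=date windows) out=WindowsSorted;
  by date;   /* the block plot needs X in sorted order */
run;

data WindowsMiss;
  set WindowsSorted;
  length winmiss $8;
  prev = lag(windows);                        /* value on the previous row   */
  if windows ne prev then winmiss = windows;  /* otherwise WINMISS stays blank */
  drop prev;
run;
```

Plotting BLOCK=WINMISS with EXTENDBLOCKONMISSING=TRUE should then give the same blocks as BLOCK=WINDOWS.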

[Figure: Block]

The graph shown on the right is created using this data.  For each contiguous range along the x axis where the block variable is the same, a block is displayed in the graph.  Each successive block gets the attributes from the GraphData1 to GraphData12 style elements.  The GTL code for this is shown below.

SAS 9.3 GTL code:

/*--Basic BlockPlot--*/
proc template;
  define statgraph Block;
    dynamic _display _type;
    begingraph;
      entrytitle 'Windows OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true));
        blockplot x=date block=windows / display=_display 
                  valuehalign=center filltype=_type;
      endlayout;
    endgraph;
  end;
run;
 
/*--Basic BlockPlot--*/
proc sgrender data=Windows template=Block;
run;

Note, in the program above, we have provided for a couple of dynamics "_DISPLAY" and "_TYPE" that are not defined in the SGRENDER step so far.  So, these options are ignored, as if they are not even coded.  This results in the most basic of block plot outputs.

[Figure: BlockValuesAlt]

Clearly, displaying the "Block" values in each block is often useful, and this can be enabled by specifying "Values" in the DISPLAY option.  Additionally, the plot supports a different fill type called "Alternate".  In this case, instead of using a unique color per block, the blocks are drawn using alternating colors as shown in the graph on the right.  Here is the code for this graph, using the same template we have defined above.

SAS 9.3 GTL code:

/*--Block Values Alternate colors--*/
proc sgrender data=Windows template=Block;
  dynamic _display='fill values' _type='Alternate';
run;

In the use case above, we have set _DISPLAY='fill values' and _TYPE to 'Alternate'.  ValueHAlign is set to CENTER and ValueVAlign is CENTER by default.  So the block values are displayed at the center of each block.  The blocks now get alternating colors.

[Figure: BlockValuesAttrsTrans]

In the alternating color band case, the attributes of the bands can be set using the FillAttrs and the AltFillAttrs options.  As for all fill attributes, transparency can be used inside the fill attribute.  In the example on the right, we have used a pink color in the FillAttrs, and set the AltFillAttrs to fully transparent, so whatever is behind it (in this case, the wall) will show through.  We have also moved the value labels to the top of the block using the ValueVAlign option.
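The settings described above might be coded as in the sketch below.  The template name BlockAttrsSketch is hypothetical, and the exact shade of pink used for the graph is not stated, so COLOR=PINK is an assumption.

```sas
/*--Sketch: pink bands alternating with fully transparent bands--*/
proc template;
  define statgraph BlockAttrsSketch;
    begingraph;
      entrytitle 'Windows OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true));
        blockplot x=date block=windows / display=(fill values)
                  valuehalign=center valuevalign=top
                  filltype=alternate fillattrs=(color=pink)
                  altfillattrs=(transparency=1);
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=Windows template=BlockAttrsSketch;
run;
```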

Normally, the Block Plot is used in conjunction with some other plot statement.  In the example below, we have used this same data along with a SERIES plot of the monthly closing value for the Microsoft stock price from the SASHELP.STOCKS data set.  The data in the stocks data set only goes up to about April 2005, so I have restricted the data to that range.

[Figure: BlockStock]

First, we extract the data for STOCK='Microsoft' from the SASHELP.STOCKS data set and sort it by Date.  Then, we merge the block data with the stock data by date.  See the attached program for full details.  Now, we add the SERIESPLOT statement to the GTL template to create the graph on the right.
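The extract and merge steps could look roughly like this.  It is a sketch only: the intermediate data set names Msft and WindowsByDate are hypothetical, the merged data set is named Series to match the SGRENDER step below, and the attached program may handle the details differently.

```sas
/*--Sketch: extract Microsoft closing prices, sorted by date--*/
proc sort data=sashelp.stocks(where=(stock='Microsoft'))
          out=Msft(keep=date close);
  by date;
run;

/*--The block data must also be sorted by date before merging--*/
proc sort data=Windows out=WindowsByDate;
  by date;
run;

/*--Dates with no release get a missing block value,     --*/
/*--which EXTENDBLOCKONMISSING treats as a continuation  --*/
data Series;
  merge WindowsByDate Msft;
  by date;
run;
```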

Note the use of EXTENDBLOCKONMISSING to allow missing values to be used as a continuation of the previous block value.

SAS 9.3 GTL code for Stock Plot with Blocks:

/*--Block and Series Overlay plot--*/
proc template;
  define statgraph BlockStock;
    begingraph;
      entrytitle 'Microsoft Stock Price with Windows OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true) 
                                  display=(ticks tickvalues));
        blockplot x=date block=windows / display=(fill values) 
                  valuehalign=center valuevalign=top 
                  filltype=alternate altfillattrs=(transparency=1)
                  extendblockonmissing=true;
        seriesplot x=date y=close / lineattrs=graphfit;
      endlayout;
    endgraph;
  end;
run;
 
/*--Block and Stock--*/
proc sgrender data=Series (where=(date < '01apr2005'd)) template=BlockStock;
run;

As we saw in the previous examples, when a block plot is placed in a LAYOUT OVERLAY, the plot fills the entire height of the overlay region inside the axes.  The width of each block is determined by contiguous values of the BLOCK role.  That is why we call this a one-dimensional plot.

If multiple block plots are overlaid, each will fill the full height, and the last one will overwrite the previous ones.  Using transparency for the FillAttrs can help in such cases.  However, this plot is one of the few that can also be placed in the INNERMARGIN of a layout overlay.  The INNERMARGIN is a region at the bottom of each overlay container.

[Figure: BlockInner]

When placed in the inner margin, this plot occupies only the height needed to accommodate the value of the block, about the height of the font.  So, each block plot occupies only a small part of the wall, and multiple block plots are STACKED and not overlaid.  This allows you to see all the blocks as shown in the graph on the right.  Here, we are displaying the release dates of the OS for both Windows and Mac.  Note the class values displayed on the left.  The GTL program for this graph is shown below.

SAS 9.3 GTL code for Block Plot in Inner Margin:

/*--Block plot with Inner Margin--*/
proc template;
  define statgraph BlockInner;
    dynamic _color;
    begingraph;
      entrytitle 'Windows and Mac OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true));
        innermargin;
          blockplot x=date block=windows / display=(fill values label) valuehalign=center
                  valuevalign=top filltype=alternate altfillattrs=(color=_color)
                  fillattrs=graphdata1 extendblockonmissing=true valueattrs=(size=7);
          blockplot x=date block=mac / display=(fill values label) valuehalign=center
                  valuevalign=top filltype=alternate altfillattrs=(color=_color)
                  fillattrs=graphdata2 extendblockonmissing=true valueattrs=(size=7);
        endinnermargin;
      endlayout;
    endgraph;
  end;
run;

[Figure: BlockClassSeries]

Finally, block plots also support the CLASS role.  This is similar to the GROUP role, except in this case a separate strip of block plot is created for each class value and stacked on the previous one, not overlaid.  This behavior applies both in the INNERMARGIN and in the layout itself.

In the graph on the right, we have changed the multi-variable data for Windows and Mac into a grouped data structure using the variable GROUP that has "Windows" or "Mac" and a REL variable that contains the release name, such as "WIN7" or "OSX".  Then, we merged this data with the stock data as before, and created this graph using CLASS=GROUP.
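The restructuring from one column per OS to a grouped layout might be sketched as follows.  The input data set name OSReleases and the output name OSGroup are hypothetical, the WINDOWS and MAC columns are assumed to be character, and the lengths are assumptions.  The result can then be merged with the stock data by date as before.

```sas
/*--Sketch: turn WINDOWS and MAC columns into GROUP and REL rows--*/
data OSGroup;
  set OSReleases;            /* hypothetical: DATE, WINDOWS, MAC */
  length group $8 rel $12;
  if not missing(windows) then do;
    group='Windows'; rel=windows; output;
  end;
  if not missing(mac) then do;
    group='Mac'; rel=mac; output;
  end;
  keep date group rel;
run;

proc sort data=OSGroup;
  by date;
run;
```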

SAS 9.3 GTL code for Block Plot with CLASS:

/*--Group data with series--*/
proc template;
  define statgraph BlockClassSeries;
    dynamic  _color _trans;
    begingraph;
      entrytitle 'Microsoft Stock Price with OS Releases';
      layout overlay / xaxisopts=(timeopts=(minorticks=true) 
                                 display=(ticks tickvalues));
        blockplot x=date block=rel / class=group
                  display=(fill values outline) valuehalign=center valuevalign=top 
                  includemissingclass=false filltype=alternate 
                  altfillattrs=(color=_color) outlineattrs=(color=gray) 
                  extendblockonmissing=true valueattrs=(size=7);
        seriesplot x=date y=close / lineattrs=graphfit;
      endlayout;
    endgraph;
  end;
run;

Just as we did with the stock plot data, the block plot can be used to display the number of subjects at risk for a survival plot, or some other clinical graphs where it is important to display textual data that is axis aligned with the plot.  While the ability to draw axis aligned text is now available using the new AXISTABLE (SAS 9.4), the Block Plot can be effectively used to display segments over a linear or time axis, such as severity of an adverse event or more.  We look forward to seeing creative usage of this unique plot in your graphs.

Full SAS 9.3 code for all the examples:  BlockPlot
