Legend Order in SGPLOT Procedure

This article is by guest contributor Lelia McConnell, SAS Tech Support.

Several users have called recently to ask the question, “Can I reorder the legend entries on the bar chart that I created with PROC SGPLOT?”

Although there is no option that does this directly in PROC SGPLOT, the answer to this question is “YES, you can define the order of your legend entries.”

In this post, I present an example that illustrates the syntax that you would use to define the order of your legend entries.  For this example, we begin by subsetting the data set SASHELP.CARS to include only observations in which TYPE is not equal to 'Hybrid'.  To do this, we use the WHERE= data set option in the SET statement, thus creating the data set CARS.

By bringing the CARS data set into PROC SGPLOT, I can create a vertical bar chart of the values of ORIGIN, where the height of the bars is based on the mean values of the variable MPG_CITY and the group variable is TYPE.

data cars;
  set sashelp.cars(where=(type ne 'Hybrid'));
run;

proc sgplot data=cars;
  vbar origin / response=mpg_city group=type
                groupdisplay=cluster stat=mean;
  xaxis display=(nolabel);
  title 'Mileage by Origin and Type';
run;

Instead of the default order of SUV, Sedan, Sports, Truck, and Wagon, I want the order of the legend entries to be Wagon, Sports, SUV, Truck, and Sedan.  To make this change, I create a numeric variable, MYTYPE, that contains the values 1-5, based on the order in which I want my vehicle types to be displayed.

I need to create a format in order to display the values of TYPE in the legend instead of 1-5.  The most efficient way to do this is to create a control data set that I can use with PROC FORMAT.  Since I need only one observation for each value of MYTYPE in this data set, I will sort the data by MYTYPE so that I can use FIRST. logic in the DATA step that follows.  In the DATA step, I need to create the columns FMTNAME, START, and LABEL.  These are used to define the format name, original value, and formatted value, respectively.  In a control data set, a variable named TYPE indicates whether the format is numeric or character.  Because our data already contains a variable named TYPE that holds the vehicle type, we need to include the statement DROP TYPE to remove it from the control data set.

The CNTLIN option in the PROC FORMAT statement specifies the SAS data set from which PROC FORMAT builds the format.

Now I can resubmit my original PROC SGPLOT code along with the FORMAT statement to create the legend in the correct order and the KEYLEGEND statement with the TITLE option in order to keep the original title in my legend.

data newcars;
  set cars;
  if type='Wagon' then mytype=1;
  else if type='Sports' then mytype=2;
  else if type='SUV' then mytype=3;
  else if type='Truck' then mytype=4;
  else if type='Sedan' then mytype=5;
run;

proc sort data=newcars out=sortcars;
  by mytype;
run;

/* Control data set: one observation per value of MYTYPE */
data myfmt;
  set sortcars;
  by mytype;
  if first.mytype then do;
    fmtname='typefmt';   /* format name used in the FORMAT statement below */
    start=mytype;        /* original value */
    label=type;          /* value displayed in the legend */
    output;
  end;
  drop type;
run;

proc format cntlin=myfmt;
run;

proc sgplot data=newcars;
   vbar origin / response=mpg_city group=mytype
                 groupdisplay=cluster stat=mean;
   xaxis display=(nolabel);
   keylegend / title='Type';
   format mytype typefmt.;
   title 'Mileage by Origin and Type';
run;

Full SAS 9.3 SGPLOT code:  Legend_93


Overlay Bar Charts

A couple of days back, Rick Wicklin forwarded me a link to an article on the BadHessian Blog on creating a Bar Chart using six different freeware packages in R, Python and Julia.   The target bar chart was one produced by the Jetpack stat module with WordPress.  The graph is shown below.


The unique feature of this graph that had caught the eye of the author was the overlaying of two bar charts, one within the other.  The author's goal was to investigate the capabilities of other graphics packages to create a similar graph, such as the R base graphics package, GGPLOT2, Python - Matplotlib, Python - Seaborn, Julia - Gadfly and Julia - Plot.ly.

As users of SAS SG Procedures and GTL are aware, such graphs are very easy with the SGPLOT procedure, and examples of such graphs have been shown in this blog and in other places.  Here is the same graph created using the SGPLOT procedure.


SAS 9.3 SGPLOT program:

proc sgplot data=visits nowall noborder;
  styleattrs datacolors=(%rgbhex(140, 185, 202) %rgbhex(19, 85, 137));
  vbar month / response=views nostatlabel nooutline;
  vbar month / response=visitors nostatlabel barwidth=0.5 nooutline;
  keylegend / location=outside position=topright noborder valueattrs=(size=5);
  xaxis fitpolicy=thin display=(nolabel noticks) valueattrs=(size=6 color=gray);
  yaxis grid display=(noline noticks nolabel) valueattrs=(size=6  color=gray);
run;

The %RGBHEX macro is supplied by Perry Watts, and converts an RGB value to a CX color value.  It is included in the attached full code.  Many options used here are needed to make the graph visually similar to the original, and are not necessary if one were to accept the default settings for the procedure.  That would reduce the code by a large fraction.

The author of the post has set the X axis spacing to 5 months.  The reason for this is not clear; maybe it is to allow different months to be displayed.   For a discrete axis, SGPLOT will try to show all the values on the axis, unless they don't fit cleanly.  Then, as in this case, the values are thinned symmetrically.  If the axis were numeric with a time format, you would get thinned axis tick values.

The author mentions a preference for the outer Y grid line (for Y=10000), and has made an extra effort to include this in the graphs.  For SGPLOT, the preferred default is to include a tick value outside the data range only if the extreme data point goes beyond 30% of the tick interval with inner ticks.  In this case, since the data does not seem to go very much past 8000, the tick value at 10000 is not shown by default.  This prevents wasteful white space outside the data.  Of course this can be changed to produce an outer tick value if a user really wants it using the Threshold option.
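As a minimal sketch of that last point, the step below uses SASHELP.CARS rather than the visits data so that it can be run as is.  It assumes a release in which the YAXIS statement supports the THRESHOLDMAX= option; setting it to 1 asks for a tick value at or beyond the largest data value.

```sas
proc sgplot data=sashelp.cars;
  vbar origin / response=mpg_city stat=mean;
  /* THRESHOLDMAX=1 requests the outer tick value; 0.30 is the bias described above */
  yaxis grid thresholdmax=1;
run;
```

Values between 0 and 1 give intermediate behavior, so the option can be tuned instead of simply switched on.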

SGPLOT has a way to customize the tick values one wants to see on the discrete axis using TickValueList and TickDisplayList.  However it is clear we could use a simpler option to do this.  This can be useful when the discrete data has sequential numeric, time or some other predictable values.

Another noteworthy item in the SGPLOT graph is the outline on the color swatches in the legend.  This is done to allow swatches of very light color to be visible.  However, a case could be made to provide an option to suppress the outline to match the bar.

Users looking for a bit more aesthetic rendering can use skins and gradients without distorting the data as shown below.


For graphs with a smaller amount of data, it may be desirable (based on individual preference) to offset the two bars by a small amount to show overlapped bars.  This too is easily done with SGPLOT procedure by using the DiscreteOffset option as shown in the graph below.
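Here is a short sketch of the offset idea.  The VISITS data below is simulated only to make the step runnable, and the offset and bar width values are illustrative choices, not the ones used for the graph in this article.

```sas
/* Simulated data, for illustration only */
data visits;
  input month $ views visitors;
  datalines;
Jan 5200 2100
Feb 6100 2500
Mar 7400 3000
;
run;

proc sgplot data=visits;
  /* Shift the two bars in opposite directions so they overlap only partly */
  vbar month / response=views    nostatlabel barwidth=0.5 discreteoffset=-0.1;
  vbar month / response=visitors nostatlabel barwidth=0.5 discreteoffset=0.1;
  xaxis display=(nolabel);
run;
```

DISCRETEOFFSET is expressed as a fraction of the midpoint spacing, so small values like these keep each pair of bars visually attached to its category.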


Full SAS 9.3 Program:  Bars


Group order in GTL

This post could be titled something like "Everything you wanted to know about Group Order in GTL - and more."   The group ordering shows up in three different ways in your graph.

  1. Assignment of attributes (color, marker symbol) to group values.
  2. Position of group values in the graph.
  3. Display of the group values in the legend.

Unique group values are assigned their visual attributes (color, marker symbol) from the GraphData1 - GraphDataN elements defined in the active style.  Most SAS shipped styles like LISTING or HTMLBLUE have 12 group elements.

Starting with SAS 9.3M1, assignment of attributes to group values is based only on the order in which these group values are present in the data.  So, if the group value "B" is encountered first in the data, it gets the attributes from the GraphData1 style element, and so on.  This happens even if an observation cannot actually be drawn in the graph due to other reasons.  Also, missing group values get the "GraphMissing" element, and do not impact assignment of the other non-missing values.

I have created all graphs for this article using the GTL code shown below to keep the discussion simple.  The SGPLOT procedure uses the SAS summary object to compute some statistics under the covers.  This can change the order of the data received by the graph renderer.  Here is the GTL program.

proc template;
  define statgraph bar;
    dynamic _resp _footnote _order;
    begingraph;
      entrytitle 'Mileage by Origin and Type';
      entryfootnote halign=left "Data Set = " _footnote;
      layout overlay / xaxisopts=(display=(tickvalues));
        barchart category=Origin response=_resp / group=Type groupdisplay=cluster
                 stat=mean outlineattrs=graphdatadefault name='a';
        discretelegend 'a' / sortorder=_order;
      endlayout;
    endgraph;
  end;
run;

Note, _resp, _footnote and _order are dynamics so we can use the same GTL template to produce all the graphs in this article.  These dynamics are set in the SGRENDER procedure.  If a dynamic value for an option is not set, that option is ignored.

The template itself uses the following:

  • A BARCHART statement, with Category, Response and Group options, using cluster groups and stat=mean.
  • A DiscreteLegend, with the sort option.

In the full data set (minus Hybrids), the first group value is SUV, but the first group value for the category Europe is Sedan.  We use this data set to create the graph using the SGRENDER step shown below.

proc sgrender data=cars template=bar;
  dynamic _resp="mpg_city";
run;

The  graph is shown below.   The group colors are assigned in the order the unique group values are encountered in the data.  "SUV" is the first value for TYPE in the data set, and hence gets GraphData1 as the style element, with the blue fill color.  "Sedan" is the second group value, so gets GraphData2, with the red fill color.  All of the group colors are displayed in the legend in the order they are encountered in the data.  Click on the graph for a higher resolution image.

The order of displaying each group value within each category is unique to each category.  So, for the category "Europe", Sedan is the first group value, and hence is displayed first.  The color used for Sedan is consistent in the whole graph.

Now, instead of letting the GTL BarChart summarize the data, let us summarize the data ourselves.  We will use the MEANS procedure to get the MeanMpg by Origin and Type as follows:

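/* The OUTPUT statement was cut off in the source; MEAN=meanMpg matches the */
/* variable used later, and NWAY is assumed to keep only Origin*Type rows.  */
proc means data=cars noprint nway;
  class origin type;
  var mpg_city;
  output out=carmeans mean=meanMpg;
run;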

Note, in the data shown above, the order of the type values has changed.  Now the first type value for Europe is also SUV.  This is different from the order of the original data.

Now, we use this data to plot the MeanMpg by Origin and Type using the same template and the SGRENDER code below.

proc sgrender data=carmeans template=bar;
  dynamic _resp="meanMpg" _footnote="CarMeans";
run;

Note in the graph above, the order for assigning the group colors has now changed.  Wagon and Truck have now swapped positions and colors.  Also, the position of each type value within each category has changed.  The colors are correct within the graph, but are no longer consistent with the first graph.

The purpose of this exercise was to order the group values within each category by descending value of the response.  Since the order within each category is retained as it was in the data, we can now sort the data however we want, and display the values in our custom order.
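A minimal sketch of that sort, assuming the CARMEANS data set and the BAR template described above already exist.  The output data set name CarMeansByMpg is made up for this example.

```sas
/* Within each Origin, put the Type with the highest mean mileage first */
proc sort data=carmeans out=carmeansByMpg;
  by origin descending meanMpg;
run;

proc sgrender data=carmeansByMpg template=bar;
  dynamic _resp="meanMpg" _footnote="CarMeansByMpg";
run;
```

Because the bar chart keeps the incoming order within each category, the sort alone controls the position of the bars.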

The graph above shows the car types by origin sorted by the mean mileage within each category.  Now, since Sedan is first in the data, it gets the first color, and so on.  But, every time you do this kind of a custom sort (say by car counts), the incoming data order changes, and so does the color assignment.  How can we retain consistent group colors across all graphs?

To ensure the colors are consistent we can use a Discrete Attr Map.  Extract the original order of the group values using the MEANS procedure, and construct an attr map data set so that the colors are specified using the order.   The attr map data set is shown on the right.

The graph created using this attr map is shown on the right, with the legend entries sorted alphabetically.  The colors of each type are exactly as in the original graph, though the positions in the legend are now in alphabetical order.

If we are really picky, and want the colors assigned as per the original order AND get positions in the legend in the same data order, we have to play a little trick.   Instead of building an attr map, we use the extracted original order, and prepend those values to the sorted data, with missing values for the Origin column.   See code in the attached program.

Remember, we said at the top of the article that colors are assigned based on the order of the group values, even if the observations cannot be drawn due to other reasons.  So, prepending the unique group values in the order you want, with other missing values will do this trick.  Now, the color values are assigned in the original data order (or, whatever order we want), and the observations are drawn in the order they are in rest of the data.
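The trick can be sketched as follows.  This is not the attached program; it assumes the CARS and CARMEANS data sets and the BAR template from above, and the names TypeOrder, Sorted and Merged are made up for this example.

```sas
/* One row per Type, in the order the types first appear in CARS */
proc means data=cars noprint nway order=data;
  class type;
  var mpg_city;
  output out=typeOrder(keep=type) n=n;
run;

/* The data to be drawn, in the custom order we want within each Origin */
proc sort data=carmeans out=sorted;
  by origin descending meanMpg;
run;

/* Prepend the group values; these rows have missing Origin and meanMpg, */
/* so they set the color assignment order but cannot be drawn            */
data merged;
  set typeOrder sorted;
run;

proc sgrender data=merged template=bar;
  dynamic _resp="meanMpg" _footnote="Merged";
run;
```

Any other order can be forced in the same way by arranging the rows of the prepended data set as needed.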

SAS 9.4 Code:  Group_Order_94




Spirals are cool.  And useful.  We use them every day without thinking about it.  Every time the road turns from a straight line to a curve, we go through a transition spiral.  Spirals allow us to change curvature in a steady increasing or decreasing fashion.   Without a spiral, this transition would be abrupt.

For our purpose, spirals can also be useful for visualizing data that is cyclical in nature.  If you are visualizing daily high temperature over a 2 year period, you could plot it on a straight X axis as shown on the right.

The cyclical nature of the data is evident in the graph.  But, this same cyclical behavior may be  easier to understand on a spiral with one cycle per year.   So, that is the plan for today.

A visit to the Wikipedia page on spirals reveals there are many kinds of spirals, including logarithmic, hyperbolic and many more.  Let us start with the simple Archimedean Spiral.  This has the simple formula:  R = A + B * theta.

Experimenting with this equation yields these different spirals for various values of A and number of cycles.

A=0, Cycles=1

The x and y points are computed using the spiral equation, for theta values up to N * 360, where N is the number  of cycles.  The curve is plotted using the series statement of the SGPLOT procedure.
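A minimal sketch of such a data step is shown here; it is not the macro from the attached program.  It assumes the radius grows by one unit per cycle (B = 1/360 per degree) and defines the macro variable MAX, which is used for the axis ranges.

```sas
%let a=0.5;                        /* starting radius, A in the formula */
%let n=2;                          /* number of cycles                  */
%let max=%sysevalf(&a + &n);       /* radius at the end of the spiral   */

data spiral;
  b = 1/360;                       /* assumed scale: one unit per cycle */
  do theta = 0 to &n*360 by 5;     /* theta in degrees                  */
    r = &a + b*theta;              /* R = A + B * theta                 */
    x = r*cos(theta*constant('pi')/180);
    y = r*sin(theta*constant('pi')/180);
    output;
  end;
run;
```

A smaller step for theta gives a smoother curve, at the cost of more observations.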

proc sgplot data=spiral nowall noborder pad=0;
  series x=x y=y / smoothconnect;
  xaxis min=-&max max=&max grid display=none;
  yaxis min=-&max max=&max grid display=none;
run;

A=0.5, Cycles=2

The first spiral starts from the center (A=0) and turns through one 360 degree cycle.  The second spiral starts at an offset of 0.5 times the radius and turns through 720 degrees.

Once the spiral is drawn, now we need to map the data on to the spiral so that the time axis is along the spiral and the response values are drawn normal to the spiral, towards the center.

The graph on the right shows the basic idea.  At each point along the time axis, we compute the theta and then find the (x, y) point on the spiral.  The direction vector (cx, cy) for the response (the arrows) is towards the center of the spiral (0, 0).   In the example on the right, all arrow heights are half the spiral spacing.  So, we can compute the (x2, y2) location of the arrow heads from (x, y) and (cx, cy) as shown in the program.
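The computation can be sketched as below.  This is an illustration, not the attached program: it assumes a spiral with A=0.5 and a spacing of one unit per cycle, and it uses the VECTOR statement to draw the arrows.

```sas
data arrows;
  b = 1/360;                       /* assumed spacing: one unit per cycle */
  h = 0.5;                         /* arrow height = half the spacing     */
  do theta = 0 to 3*360 by 15;
    r  = 0.5 + b*theta;            /* A=0.5, so r is never zero           */
    x  = r*cos(theta*constant('pi')/180);
    y  = r*sin(theta*constant('pi')/180);
    cx = -x/r;                     /* unit vector from (x, y) toward (0, 0) */
    cy = -y/r;
    x2 = x + h*cx;                 /* arrow head location                 */
    y2 = y + h*cy;
    output;
  end;
run;

proc sgplot data=arrows noautolegend;
  series x=x y=y;
  vector x=x2 y=y2 / xorigin=x yorigin=y;
  xaxis display=none;
  yaxis display=none;
run;
```

To plot a response instead of a fixed height, H would be replaced by a value scaled from the data at each point.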

I have used some SAS 9.4 features to draw smooth curves and remove background wall and border.   A SAS 9.3 version program is also included.

With real time series data, we would normalize the response over the entire range, and plot the data on one side of the spiral by using the abs() value and a color to signify sign.  Then, scale the vector by response and plot.  A simulated example is shown on the right.

We will cover mapping real world time series data (as shown in the first graph on top) on to the spirals in the next article on Spirals.

SAS 9.4 Program:  Spiral_Macro_94

SAS 9.3 Program:  Spiral_Macro_93


Proportional Euler Diagram

The topic of VENN diagrams had come up a while ago.  At that time, I thought it may be interesting to build a proportional VENN diagram.  But, reading up on VENN Diagrams, I learned that VENN diagrams represent all intersections of N sets, regardless of whether there are actually any observations in one of the regions.  So, there did not seem any purpose to make a proportional VENN diagram, and maybe the term itself is an oxymoron.

I was interested in a graphical representation of the number of different types of subjects in a study, say subjects with Diabetes, or Hypertension or both.   It turns out, Euler Diagrams do represent the real world data, and not all theoretical combinations.  So, it would make sense to draw a Proportional Euler Diagram.

I started with the simple 2-Set case, as it seems achievable.  The results are shown on the right.  The values for N1, N2 and NI are also shown in the footnote, along with the value of the convergence error.  The two special cases are shown on the right, and are straightforward.  Click on the graphs for a higher resolution image.

The two cases with intersecting circles are shown below.  For the first one, the numbers are such that the intersection point of the two circles lies in-between the centers of the two circles.   For the second case, the intersection lies to the right of the smaller circle.

In all cases, the radius of the larger circle is set to 10 (arbitrary), and I compute the area of the smaller circle proportional to the number of observations in the circles.

Here are the details of my program:

  • N1, N2 and NI are the number of observations in Set 1, Set 2 and intersection ONLY.
  • So, N1+NI is first circle, and N2+NI is the 2nd circle, and NI is the intersection.
  • N1 >= N2.
  • Special case #1 -> NI=0.  This means the two circles are non-overlapping.
  • Special case #2 -> N2=0.  This means the circle 2 is fully inside circle 1.
  • Case #3 -> the intersecting vertical line is between center 1 and center 2.
  • Case #4 -> the intersecting vertical line is to the right of center 2.

Here is the algorithm:

  • First, I assign v - height of the intersection above centerline = 1.
  • Compute the three different areas.
  • Compute the area per observation in each section.
  • Then, based on the ratio of ANI / AN1, I adjust v by the error ratio.  V is kept < r2.
  • I repeat this while the error is > 0.001 and number of iterations < N.
  • Now, if the error is still > 0.001, convergence is not reached and the intersection is to the right of the center 2.
  • Now, set v=0.99999*r2 and repeat the same computations above, with reducing v.

I assume convergence is reached, and based on this value of v, I compute the horizontal distance from center of each circle to the intersection, d1 and d2 and other numbers needed to plot the details.
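To make the area step concrete, here is a sketch of the computation for one trial value of v in Case #3, where the common chord lies between the two centers.  It is not the attached macro: the counts and the trial v are example values, and the circular segment formula and the ratio printed at the end are my own reading of the steps above.

```sas
data _null_;
  n1 = 30; n2 = 20; ni = 10;            /* example counts: set 1 only, set 2 only, both */
  pi = constant('pi');
  r1 = 10;                              /* larger circle, arbitrary radius              */
  r2 = r1*sqrt((n2 + ni)/(n1 + ni));    /* circle areas proportional to the counts      */
  v  = 5;                               /* trial height of intersection, v < r2         */

  d1 = sqrt(r1**2 - v**2);              /* horizontal distance, center 1 to the chord   */
  d2 = sqrt(r2**2 - v**2);              /* horizontal distance, center 2 to the chord   */

  /* Area of the circular segment cut off by the chord, for each circle */
  seg1 = r1**2*arsin(v/r1) - v*d1;
  seg2 = r2**2*arsin(v/r2) - v*d2;

  ani = seg1 + seg2;                    /* intersection area (Case #3 only)             */
  an1 = pi*r1**2 - ani;                 /* area of circle 1 outside the intersection    */
  an2 = pi*r2**2 - ani;                 /* area of circle 2 outside the intersection    */

  ratio = (ani/ni)/(an1/n1);            /* 1 when area per observation matches          */
  put r2= d1= d2= ani= an1= an2= ratio=;
run;
```

In Case #3 the distance between the centers is d1 + d2.  For Case #4 the part of circle 2 inside the intersection is the larger segment, so the formulas above would need to change.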

I can use the ELLIPSEPARM or BUBBLE (RelativeScale=False)  statement to draw the plot.  However, SGPLOT procedure does not support these statements (not in the 80-20 range for simple plots).  So, I used GTL, with the BubblePlot because I wanted to use skins.

I made it into a macro, with three parameters N1, N2 and NI.  Skin is optional.  If you have a need for Proportional Euler Diagrams in your work, please chime in and let me know if this is useful to you.  Maybe you have made one of your own and I would love to hear how you went about solving for the intersection areas.

VENN diagram shapes for 2, 3, 4 and more sets are available on the web, and it would be possible to make these using the EllipseParm statement for both circles and ellipses.

I plan to tackle the case of the 3 set Proportional Euler Diagram.  This same algorithm may not extend to this case.  I would love to hear your ideas.

Full GTL Macro program:  Euler_Bubble_Macro


Graphs are easy in SAS University Edition

By now you have heard all about the SAS(R) STUDIO software that provides access to the power of SAS analytics in a Web browser.  The SAS(R) University Edition is also available free for higher education teaching, learning and  research.

This software includes ODS Graphics software for creating graphs.  You can use the familiar program window to write your own SAS data step and procedures.  An example of running your own SGPLOT program is shown in Robert's recent article on How to create a Histogram using the SAS University Edition.

Making graphs gets even easier in SAS Studio by using the graph tasks that are included with the software.  When you first launch the software, you will see the user interface shown on the right.  Click on the "Tasks" button on the left, and you will see a list of tasks by category.

Here I have highlighted the Tasks folder and the Graph subfolder under it.

Multiple graph tasks are available including Bar Chart, Bar-Line Chart, Box Plot, Histogram, Line Chart, Pie Chart, Scatter Plot, Series Plot and more.

Each of these tasks presents you with a form to set the data and various options as shown on the right.  Here, I have launched the "Histogram" task as shown highlighted in blue.  This starts the Histogram task.

Each task presents you with an easy to use visual interface to set the parameters and options necessary to make the graph.  These are all collected under two tabs - The Data tab and the Options Tab.  In the image on the right, the Data tab for the Histogram task is highlighted in yellow.

Each graph task allows you to provide the name of the data set and the required variables to create the graph.  In case of Histogram, you need to provide only one numeric variable for the Analysis Role.  In the example above, we have selected the SASHELP.CARS data set and the MPG_CITY column for graphing.

Once the required parameters are provided, you can submit the task by pressing the run button or 'F3'.  The task will render the histogram using the default settings for styles and present the graph to you in the Results window as shown on the right.

Each task also supports optional settings which are included under the "Options" tab.  These options can be used to customize your graph, including setting of titles, footnotes and  graph size.  In this case, I have set the graph size to 4" x 3" to fit the small region.

Each graph task generates the required SGPLOT code needed to render the graph.  This code is available in the Code window under the "Code / Results" tab.  This code is built and updated as you apply the settings in the Data and Options panels.  So, this is a good way to get started with learning the SGPLOT procedure.
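For the Histogram example above, the kind of program you end up with looks something like the sketch below.  This is hand-written for illustration and is not the exact code that the task generates.

```sas
/* Graph size set to 4" x 3", as in the Options tab example */
ods graphics / reset width=4in height=3in;

proc sgplot data=sashelp.cars;
  histogram mpg_city;
run;

ods graphics / reset;
```

Pasting code like this into the program window is an easy starting point for adding statements that the task does not offer.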

The tasks cover many of the features available in the SGPLOT procedure, but not all.  So, you can cut and paste the code into the program window and customize it to your own needs.

You can learn more about creating graphs using SG procedures right here in this blog.  Learn all about the procedures themselves in the book on Statistical Graphics Procedures by Example.





Swimmer plot

At PharmaSUG 2014 in San Diego, I had the pleasure of attending "Swimmer Plot: Tell a Graphical Story of Your Time to Response Data Using PROC SGPLOT", by Stacey Phillips.  In this paper, Stacey presented an interesting graph showing the effects of a study drug on patients' tumor size.

Stacey says in her paper that often "investigators prefer to dig deeper and look at an individual subject’s pattern of response. A swimmer plot is a graphical way of showing multiple pieces of a subject’s response 'story' in one glance."    The final graph includes a bar showing the length of treatment duration for each patient, classified by the disease stage at baseline, one for each patient in the study.  The graph also includes indicators for the start and end of each response episode, classified by complete or partial response, and an indicator showing whether the patient is a "Durable responder".

Stacey uses a combination of HBarParm, Scatter and annotations to create this graph.  The annotation is used for adding the "Continued response" arrow, and for the display of the inner legend for decoding of the various symbols in the graph.

Along with many of the attendees of the presentation, I was impressed and intrigued by this visual.  I was curious if its creation could be simplified using some of the new features released with SAS 9.3.  In particular, I wanted to see if I could make this graph without any annotation.

SAS 9.3 includes some versatile plot statements and features to create graphs.  Two of these are the HIGHLOW plot and the Discrete Attributes Map for controlling the color of the group values.

The data used to create the graph is eyeballed from Stacey's graph and shown above.  The updated graph is shown below. Click on the graph for a higher resolution image.

Here are the features of this graph:

  1. This graph uses the High Low plot to draw the bar representing the duration of the response for each subject.
  2. The bar has an arrow on the right side to indicate a continued response.  This is explained in the 1st footnote.
  3. Each response episode is represented by start and end events joined by a line classified by the type of response - Complete or Partial.  Connecting the start and end event and using a common classification color groups these together as one event, and is easier for the eye to consume.  Continuing response does not have an end event on the right.
  4. All event lines and markers are included in the inner legend.

It is also possible to place the indicator for continued event into the key legend using a "TriangleRightFilled" marker in the graph.  This marker is drawn outside the plot region, but is included in the legend.  Some items in the legend are shown in grey, to indicate the meaning of the shape since the actual marker will have different colors in the graph based on other criteria.

The graph on the right uses SAS 9.4 with a few aesthetic features for bar skins and filled, outlined markers.  Note the shorter line segments in the legend.

Note, the marker for the right arrow is intentionally made bigger to match the right arrows of the HighCap of the HighLow plot.

SAS 9.3 Code:

footnote  J=l h=0.8 'Each bar represents one subject in the study.';
footnote2 J=l h=0.8 'A durable responder is a subject who has confirmed response for at least 183 days (6 months).';
proc sgplot data= swimmer dattrmap=attrmap nocycleattrs;
  highlow y=item low=low high=high / highcap=highcap type=bar group=stage fill nooutline
          lineattrs=(color=black) name='stage' nomissinggroup transparency=0.3;
  highlow y=item low=startline high=endline / group=status lineattrs=(thickness=2 pattern=solid) 
          name='status' nomissinggroup attrid=status;
  scatter y=item x=start / markerattrs=(symbol=trianglefilled size=8 color=darkgray) name='s' legendlabel='Response start';
  scatter y=item x=end / markerattrs=(symbol=circlefilled size=8 color=darkgray) name='e' legendlabel='Response end';
  scatter y=ymin x=low / markerattrs=(symbol=trianglerightfilled size=14 color=darkgray) name='x' legendlabel='Continued response ';
  scatter y=item x=durable / markerattrs=(symbol=squarefilled size=6 color=black) name='d' legendlabel='Durable responder';
  scatter y=item x=start / markerattrs=(symbol=trianglefilled size=8) group=status attrid=status;
  scatter y=item x=end / markerattrs=(symbol=circlefilled size=8) group=status attrid=status;
  xaxis label='Months' values=(0 to 20 by 1) valueshint;
  yaxis reverse display=(noticks novalues noline) label='Subjects Received Study Drug' min=1;
  keylegend 'stage' / title='Disease Stage';
  keylegend 'status' 'd' 's' 'e'  'x' / noborder location=inside position=bottomright across=1;
run;

The part that I believe makes this version easier to consume is the continuity of the response events.  Joining the start and end events with a line segment, all having the same color as per the event classification allows the eye to see each event and its duration clearly.

The part I like best is the graph uses no annotation.

Full SAS 9.3 Code:  Swimmer_93





Grouped Timeline

Recently, a user posed a question on how to plot stacked frequencies on a time axis.  The data included frequencies of different viruses by week.  The data is modified to preserve confidentiality and is shown below.

The user's first instinct was to use a bar chart with stacked groups.  This works for automatically computing frequencies by week and group, and it also stacks the group values.  However, the x axis is made discrete and the bars are only drawn where data exists.  The user wants to see all weeks positioned correctly on the x axis, with gaps where there is no data for some weeks.  The data starts in April 2013 and goes to March 2014, so plotting by week displays the data out of order.

Here is the graph, created using the bar chart.  The graph shows the frequencies for the two viruses by week, using stacked groups.  The data for week numbers 1-14 are listed first even though these are actually for 2014.  The weeks are drawn as discrete values, and there are no gaps for weeks that are missing because the bar chart treats the Category axis as discrete.  However, the VBAR statement makes it easy to see the stacked frequencies.


To get this kind of graph on a scaled time axis, one would need to use a Needle plot or a HighLow plot.  However, neither of these will automatically compute the frequencies by date and group for a stacked display.

So, I used the MEANS procedure to compute the frequencies by week and virus.  Then, I ran a data step by year and week to compute the low and high values for each virus in a given week.  I also compute a "date" value for each week of the year.  Here is the data set:
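Since the real data cannot be shared, here is a sketch of the stacking step using a few simulated rows.  The data set FREQ and the column COUNT are made-up names, and the date computation is only an approximation of the start of each week.

```sas
/* Simulated frequencies by year, week and virus, for illustration only */
data freq;
  input year week virus $ count;
  datalines;
2013 14 A 3
2013 14 B 2
2013 16 A 1
2014 2 B 4
;
run;

proc sort data=freq;
  by year week virus;
run;

data stacked;
  set freq;
  by year week;
  retain high;
  if first.week then high = 0;     /* restart the stack for each week              */
  low  = high;                     /* segment starts where the previous one ended  */
  high = low + count;
  /* Approximate date: start of the week that is (week-1) weeks after Jan 1 */
  dateOfWeek = intnx('week', mdy(1, 1, year), week - 1);
  format dateOfWeek date9.;
run;
```

With two viruses in week 14 of 2013, the first segment runs from 0 to 3 and the second from 3 to 5, which is what the HighLow plot needs to draw the stack.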

Now, I use the HighLow plot to draw the bar segments for each virus value by date.  The low and high values for each group segment are already computed.


proc sgplot data=stacked dattrmap=attrmap;
  format week 2.; 
  highlow x=dateOfWeek low=low high=high / group=virus name='a' type=bar
          lineattrs=graphdatadefault attrid=virus; 
  yaxis display=(nolabel) offsetmin=0 grid;
  xaxis display=(nolabel);
  keylegend 'a' / title='Virus' location=inside position=topright across=1;
run;

As you can see, the SGPLOT code is very simple:

  • We use a HighLow plot by dateOfWeek and GROUP=VIRUS.
  • We used the previously defined discrete attributes map for each virus name.
  • We set other details like legend and axis properties.

The user wanted to see the week values displayed, which can be easily done using the LOWLABEL option of the HighLow plot.


The full SAS code is shown below; however, I cannot share the data as it is confidential.  You can see the structure of the data above and if you simulate similar data, you can run the code.

Full SAS 9.3 program (not including data): HighLow_Timeline


Lab Values Panel

It was almost two weeks ago that I got started making a display for lab tests for a subject, based on a graph I saw on the web for an article on this blog.

This graph is a part of a larger panel display of the lab values for a subject.  The panel includes the display of multiple lab values, including a gradient range of the percentile values for the general population.  The lab value for the subject is shown in the box on the left and also in the gradient range.  The graph is shown on the left.

While working on this article, I ran into a few issues including the minor issue of a long planned vacation to Hawaii that included a cruise around the islands.  Suffice it to say, the islands are fabulous, and the cruise lived up to all the expectations one can imagine.  Here is a picture I took of the boat, when anchored off Kona on the Big Island.

Then, it was time for PharmaSUG in San Diego.  The conference was a resounding success, and I had the opportunity to meet with SAS users interested in creating graphs using ODS Graphics.  The presentations were excellent, with users much more likely to be persuaded by the experiences of fellow SAS users than by hearing from SAS staff.

Back from these two diversions, I finally got back to this project.  Here is the step-by-step progression to making this graph.

First, on the right is the data I gleaned from the web image; with Rick's help, I created this data set of the values in the graph.  Normally, the expectation is that when you make such a graph, you have all the pertinent data in hand.  Note that the values V1, V2, etc. are the 0th, 25th, 50th, 75th, and 100th percentiles of the data.  Note also that for all tests, the "better" numbers are on the left and the "worse" numbers are on the right.  I use the column "Rev" to indicate that the ranges are reversed, with higher numbers to the left.
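To make the data layout concrete, here is a minimal sketch of a data set with this shape, along with one possible way to place the subject's value on a common 0-100 percentile scale.  The test names and numbers are made up for illustration, and the linear interpolation between adjacent percentiles is my assumption for this sketch; the variable name VN mirrors the plot code, but the derivation used for the actual graph may differ.

```sas
/* Hypothetical tests and values, for illustration only */
data lipid_sketch;
  input test $ 1-12 v1-v5 value;
  array v[5] v1-v5;

  /* Ranges are reversed when the higher numbers are on the left */
  rev = (v1 > v5);

  /* Find the pair of percentiles that brackets the value, then   */
  /* interpolate linearly; each interval spans 25 percentile units */
  vn = .;
  do k = 1 to 4 while (vn = .);
    lo = min(v[k], v[k+1]);
    hi = max(v[k], v[k+1]);
    if lo <= value <= hi then
      vn = 25*(k - 1) + 25*(value - v[k]) / (v[k+1] - v[k]);
  end;
  drop k lo hi;
  datalines;
Test A       100 130 160 190 220 145
Test B        90  70  55  40  25  60
;
run;

proc print data=lipid_sketch noobs;
run;
```

Because the interpolation uses the signed difference between adjacent percentiles, the same formula works for both normal and reversed ranges.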

This graph uses SAS 9.4, but all significant features of the graph can be created using SAS 9.3.  Here is the simple graph showing the tests, values, and percentile ranges.

For each test, the percentile values for the larger population are shown on the right, with the percentile values above the box, and actual test values inside the box.  The actual test value for the subject is also shown at the correct percentile location on each bar.


title 'Lipid Panel for Subject XXX-XX-XXXX';
proc sgplot data=Lipid noautolegend nowall noborder;
  highlow y=test low=low high=high / type=bar outline nofill barwidth=0.5 ;
  hbarparm category=test response=vn / barwidth=0.2  dataskin=gloss 
           fillattrs=(color=gray) nooutline baselineattrs=(thickness=0);

  scatter y=test x=vn2 / markerchar=v2 markercharattrs=(color=lightgray);
  scatter y=test x=vn3 / markerchar=v3 markercharattrs=(color=lightgray);
  scatter y=test x=vn4 / markerchar=v4 markercharattrs=(color=lightgray);

  scatter y=test x=vnl / markerchar=lvn1 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vn2 / markerchar=lvn2 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vn3 / markerchar=lvn3 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vn4 / markerchar=lvn4 markercharattrs=(color=gray) discreteoffset=-0.35;
  scatter y=test x=vnh / markerchar=lvn5 markercharattrs=(color=gray) discreteoffset=-0.35;

  scatter y=test x=vn / markerattrs=(symbol=trianglefilled size=12) discreteoffset=0.2
          filledoutlinedmarkers markerfillattrs=(color=white) dataskin=gloss;
  scatter y=test x=vn / markerchar=value discreteoffset=0.4 
          markercharattrs=(size=8 weight=bold);

  xaxis display=none offsetmin=0 offsetmax=0;
  yaxis display=(nolabel noticks noline);
run;

The program above uses a HighLow plot to draw the box of ranges, and scatter plots with the MARKERCHAR option to display the percentile values above the box and the actual values in the middle.  An offset triangle marker is used to denote the percentile location of the actual value, and the value itself is displayed below the marker.

The test names in the original graph are left aligned, and the values are displayed in a box next to the test name along with the units of the values.  I added this information using additional HighLow plots with the HighLabel option to display the test name, the test value and the units.

The only unit that needs improvement is the "muMol/L", where it would be better to use the Greek symbol for "mu".

title 'Lipid Panel for Subject XXX-XX-XXXX';
proc sgplot data=Lipid noautolegend nowall noborder;
  highlow y=test low=boxL high=boxH / type=bar nofill outline lineattrs=(color=black) barwidth=0.6;
  scatter y=test x=boxM / markerchar=value discreteoffset=0 markercharattrs=(size=8 weight=bold);
  scatter y=test x=boxM / markerchar=units discreteoffset=-0.4 markercharattrs=(size=7 color=gray);
  highlow y=test low=nameL high=nameH / type=bar nooutline barwidth=0.6 fillattrs=(transparency=1);
  scatter y=test x=nameL / datalabel=test datalabelattrs=(size=8 weight=normal) datalabelpos=right;
  highlow y=test low=low high=high / type=bar outline nofill barwidth=0.5 ;
  hbarparm category=test response=vn / barwidth=0.2  dataskin=gloss 
           fillattrs=(color=gray) nooutline baselineattrs=(thickness=0);
  scatter y=test x=vn2 / markerchar=v2 markercharattrs=(color=lightgray size=7);
  scatter y=test x=vn3 / markerchar=v3 markercharattrs=(color=lightgray size=7);
  scatter y=test x=vn4 / markerchar=v4 markercharattrs=(color=lightgray size=7);
  scatter y=test x=vnl / markerchar=lvn1 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn2 / markerchar=lvn2 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn3 / markerchar=lvn3 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn4 / markerchar=lvn4 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vnh / markerchar=lvn5 markercharattrs=(size=7 color=gray) discreteoffset=-0.35;
  scatter y=test x=vn / markerattrs=(symbol=trianglefilled size=12) discreteoffset=0.2
          filledoutlinedmarkers markerfillattrs=(color=white) dataskin=gloss;
  scatter y=test x=vn / markerchar=value discreteoffset=0.4 markercharattrs=(size=8 weight=bold);
  xaxis display=none offsetmin=0 offsetmax=0;
  yaxis display=none;
run;

Now, let us get to the display of the gradient green-yellow-red ranges in the display.  There is no plot statement in SG or GTL that can draw a gradient color across three colors.  Some plot statements support a Color Response option, but essentially the entire entity is rendered with the color derived from the color gradient.

Once again, we resort to using the versatile HighLow plot to draw the gradient.  The HighLow plot does not support a color gradient option, but it does support a GROUP option that colors each segment with the group color from the style or from a Discrete Attributes Map.  Here, we will use the DAttrMap option of the SGPLOT procedure to draw the ramp.

We create Low and High columns for 100 HighLow segments for each test name.  Each segment is 1 unit, in a do loop from 0 to 99 by 1.  Each segment has an id - the loop variable.

We also create a DAttrMap data set, such that each value 1-99 has a corresponding color that grades from green to yellow to red.  See the code in the full program attached at the bottom.  The result is the gradient ranges as shown in the graph above.
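The full program has the actual code, but the idea can be sketched as follows.  This is a minimal illustration under my own assumptions: the data set names, the variable names SEG, GLOW and GHIGH, the test names, and the exact color arithmetic are hypothetical and are not taken from the attached program.

```sas
/* 100 unit-wide segments per test; test names are hypothetical */
data gradient;
  length test $12;
  do test = 'Test A', 'Test B';
    do seg = 0 to 99;
      glow  = seg;        /* left edge of the segment  */
      ghigh = seg + 1;    /* right edge of the segment */
      output;
    end;
  end;
run;

/* Attribute map: one row per segment id, with a color that moves */
/* from green to yellow to red, written as CXrrggbb               */
data gradmap;
  length id $8 value $3 fillcolor $8 linecolor $8;
  id = 'grad';
  do i = 0 to 99;
    value = strip(put(i, 3.));               /* must match the formatted group value */
    if i < 50 then do;                       /* green to yellow: raise the red part   */
      r = round(255*i/50);
      g = 255;
    end;
    else do;                                 /* yellow to red: lower the green part   */
      r = 255;
      g = round(255*(99 - i)/49);
    end;
    fillcolor = 'CX' || put(r, hex2.) || put(g, hex2.) || '00';
    linecolor = fillcolor;
    output;
  end;
  keep id value fillcolor linecolor;
run;

proc sgplot data=gradient dattrmap=gradmap noautolegend;
  highlow y=test low=glow high=ghigh / group=seg attrid=grad
          type=bar barwidth=0.5 nooutline;
  xaxis display=none offsetmin=0 offsetmax=0;
  yaxis display=(nolabel);
run;
```

Each segment gets a flat color, but with 100 narrow segments side by side the bar reads as a continuous ramp.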

Finally, we use some simple annotations to add the information at the top of the graph.  Five observations in the SGANNO data set describe the way to draw the four text strings and the arrow object.
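For readers new to SG annotation, here is a minimal sketch of what such a five-observation data set could look like.  The label text, coordinates, and the use of SASHELP.CLASS as a stand-in plot are hypothetical choices of mine; only the general shape (four TEXT observations and one ARROW observation) comes from the description above.

```sas
/* Annotation data set: four text strings and one arrow.        */
/* Labels and positions are hypothetical, for illustration only */
data anno;
  length function $8 label $40 anchor $10 drawspace $16
         textweight $8 shape $8 direction $4;
  drawspace = 'graphpercent';        /* coordinates are percent of the graph area */

  /* Four text strings */
  function = 'text'; anchor = 'center'; width = 30;
  textsize = 8; textweight = 'bold';
  y1 = 95; x1 = 50; label = 'Percentile of Population'; output;
  y1 = 90; x1 = 30; label = 'Better';                   output;
  y1 = 90; x1 = 70; label = 'Worse';                    output;
  y1 = 90; x1 = 10; label = 'Your Value';               output;

  /* One arrow; clear the text-only columns first */
  call missing(label, anchor, width, textsize, textweight);
  function = 'arrow';
  x1 = 38; y1 = 90; x2 = 62; y2 = 90;
  shape = 'filled'; direction = 'out';
  output;
run;

/* Any plot can carry the annotation; SASHELP.CLASS is just a stand-in */
proc sgplot data=sashelp.class sganno=anno;
  scatter x=height y=weight;
run;
```

In a real graph, the positions would need to be tuned so that the annotations land in the space above the plot rather than on top of it.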

Once again, this exercise has exposed the need for some more features that will make this task easier, such as support of ColorResponse for bar charts and HighLow plots.  We will look into adding such options in a future release.

The technique for creating such nonstandard and complex graphs using SG or GTL is to analyze the graph and break it down into its component parts.  Then use the appropriate plot statements "creatively" to build the graph one layer at a time.  Some details that cannot be done using plot statements can be handled by annotation.

Full SAS9.4 Code without the Gradients:  Lipid_Dashboard

Full SAS9.4 Code with Gradients: Lipid_Dashboard_Gradient

Full SAS9.3 Code with Gradients:  Lipid_Dashboard_Gradient_93


Report from PharmaSUG 2014

Just getting back from PharmaSUG 2014 in San Diego.  The conference was great, both inside and outside.  The organizers ordered up some great weather for the Padres game and also for dinner on the flight deck of the Midway Carrier.

Our focus here being on graphics, we were all extremely gratified by the presentations in the Data Presentation section.  Amos Shu got us started with Adverse Event timeline graphs and panels in his paper Techniques of Preparing Datasets for Visualizing Clinical Adverse Events.

Wu, Dai and Gau presented a Graphical Representation of Patient Profile for Efficacy Analyses in Oncology with Efficacy Patient Profile graphs using GPLOT and ANNOTATE.


Mayur Uttarwar and Murali Kanakenahalli proposed Developing Graphical Standards: A Collaborative, Cross-Functional Approach to ensure the correct list of Symbols and colors for the plots in the graph.

Stacey Philips presented a Swimmer Plot: Tell a Graphical Story of Your Time to Response Data Using PROC SGPLOT, displaying disease stages for each subject with additional information on the events.

Kriss Harris presented Napoleon Plot for PharmaSUG and I Am Legend for PharmaSUG, presenting displays for assessing treatment safety, and ways to create just a legend when the legend has too many entries to be included in one graph.

Jeffery Meyers presented Kaplan-Meier Survival Plotting Macro %NEWSURV which used the GTL layouts in creative ways to display loads of information in one plot or panel.


Warren Kuhfeld presented ways to customize the popular Survival Plot created by the LIFETEST procedure in SAS/STAT 13.1.  The approach combines %ProvideSurvivalMacros, the customization macros, and %CompileSurvivalTemplates to create the customized templates, and then runs the LIFETEST procedure to produce the customized graph output.


Finally, I presented my paper from SGF 2014, Up Your Game with Graph Template Language Layouts, which uses GTL layouts to create complex custom graphs.  This paper will get you started using the GTL layouts to go beyond the graphs you can create using the SGPLOT procedure.

As usual, PharmaSUG lived up to its reputation of taking care of its attendees by providing fabulous food for breakfast, lunch and dinners.  In addition to all the knowledge, I feel like I also gained 5 pounds.

For me, the highlight is always meeting and interacting with SAS users, who bring so much enthusiasm to the conference.  One quote that I took back to my team from a presentation was "Making graphs with SAS is FUN".  It is nice to get validation of our efforts to provide you the tools you need to easily create beautiful and effective graphs with SAS.
