Scatter with box

Previously, I discussed ways to create a Box Plot with Stat Table and Markers in the linked article.  One of the graphs showed a Box Plot of Cholesterol by Death cause along with the display of the actual observations.  The main goal for that article was display of statistics with a Box Plot.

Often we want to view the data by a discrete variable along with its distribution.  Starting with SAS 9.40M3, we can overlay a VBOX on the Scatter plot, as shown on the right.  Here the box plot is offset to the right of the data.  This is a small variation on the graphs shown in the link above, but may provide a cleaner view of the observations and the distribution.

Note, when we use a scatter plot with an x-axis with a discrete variable (in this case Type) and the "Jitter" option, the graph automatically places the categories equally spaced on the axis, with an offset of half the midpoint spacing at the ends like in a bar chart.  We have overlaid a VBOX that is offset to the right, and reduced the box width to 0.2.  The code is shown below.

SAS 9.40M3 SGPLOT Code:

title 'Mileage by Type for Asian Cars';
proc sgplot data=cars noautolegend noborder;
  scatter x=type y=mpg_city / jitter jitterwidth=0.5;
  vbox mpg_city / category=type discreteoffset=0.4 boxwidth=0.2 nooutliers nofill;
  xaxis display=(nolabel noline);
  yaxis display=(noline noticks) grid gridattrs=(color=white);
run;

This is made possible with the DISCRETEOFFSET option available on all plot statements that support a discrete variable, including bar charts, box plots, scatter, series and more.  In the example above, I have left the scatter centered on the midpoint, but to make room for the box, I have reduced the JITTERWIDTH so the markers are not spread over the entire available spacing.

The graph on the right provides an alternative appearance, with a reverse color scheme for background and wall, and removal of the axis lines and borders.  Note the following changes:

  • Use of Styleattrs to set the wall color.
  • Scatter and box are both offset, so tick value is in the middle of each cluster.
  • Use of x-axis color bands to indicate the cluster.
  • The x-axis tick values are now really an axis table so the color band can include the tick value.
  • See link for full code.

SAS 9.40M3 SGPLOT Code:  scatter_box
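
The full program is in the linked file.  As a rough sketch of the first three changes only (not the author's actual code), something like the following could be tried.  It assumes the SASHELP.CARS data restricted to Origin='Asia', uses arbitrary colors and offsets, and leaves out the axis table that replaces the tick values.

```sas
title 'Mileage by Type for Asian Cars';
proc sgplot data=sashelp.cars(where=(origin='Asia')) noautolegend noborder;
  /* Assumed color: a light gray wall so white bands and grid lines stand out */
  styleattrs wallcolor=cxf0f0f0;
  /* Offset scatter to the left and box to the right so the tick value sits mid-cluster */
  scatter x=type y=mpg_city / jitter jitterwidth=0.3 discreteoffset=-0.2;
  vbox mpg_city / category=type discreteoffset=0.2 boxwidth=0.2 nooutliers nofill;
  /* Alternate color bands mark each cluster on the discrete axis */
  xaxis display=(nolabel noline noticks) colorbands=odd
        colorbandsattrs=(color=white transparency=0.4);
  yaxis display=(noline noticks) grid;
run;
```

The offsets of -0.2 and 0.2 are a guess at a balanced layout; they would likely need tuning along with JITTERWIDTH for other data.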


Advanced ODS Graphics: Axis tables in PROC SGPLOT and the GTL

Axis tables enable you to combine tabular and graphical information into a single display. I love axis tables. My involvement with axis tables dates back over 30 years to their ancient predecessor, the table that contains an ASCII bar chart. In the mid 1980s, I created a table in PROC CORRESP consisting of 5 columns of numbers and a series of asterisks to graphically show the percentage column. In Version 7, as ODS was being developed and we supported destinations other than LISTING, this was among the last tables that the ODS team provided options to create. A few years ago, I happily replaced my ASCII bar chart table with a graph using axis tables. Since then, I have also added axis tables to replace tables with ASCII bar charts in PROC REG. Axis tables are also used in PROC LIFETEST and have numerous uses in clinical graphs. Sanjay has blogged about them often. Today, I am going to discuss some differences between axis tables in PROC SGPLOT and in the GTL. With only a little more work than writing PROC SGPLOT code, you can access some of the increased flexibility of the axis table in the GTL.

The following steps show how to make a simple graph that consists of an axis table and a scatter plot. The axis table consists of four groups of observations. Each group has a header and is separated from adjacent groups by blank lines.

data x(keep=n line value);
   do g = 1 to 4;
      Value = 'Head ';
      Line = catx(' ', 'Greater Than', 10  * (g - 1), 'A0A0'x);
      output;
      value = 'Body';
      do i = 1 to 10;
         k + 1;
         n = k;
         line = 'A0A0A0'x || put(k, words30.);
         output;
      end;
      value = 'Blank';
      line = repeat('A0'x, g - 1);
      n = .;
      if g lt 4 then output;
   end;
run;
 
data attrmap;
   id='mymap';    textcolor='Black';
   value='Head '; textsize=6; textweight='bold  '; output;
   value='Body '; textsize=5; textweight='normal'; output;
   value='Blank'; textsize=1;                      output;
run;
 
ods graphics on / height=500 width=250;
proc sgplot data=x noborder noautolegend dattrmap=attrmap tmplout='temp.temp';
   yaxistable line / position=left textgroup=value textgroupid=mymap;
   scatter y=line x=n / x2axis markerattrs=(symbol=circlefilled);
   yaxis reverse display=none;
   xaxis display=(noticks nolabel novalues);
   x2axis grid display=(noticks nolabel) valueattrs=(size=6px);
   label line='00'x;
   scatter y=line x=n / markerattrs=(size=0);
run;


The TMPLOUT= option writes the underlying template that PROC SGPLOT creates to a file. After adding indentation, the template is as follows:

proc template;
   define statgraph sgplot;
      begingraph / collation=binary;
         DiscreteAttrVar attrvar=MYMAP_VALUE var=VALUE
         attrmap="__ATTRMAP__MYMAP";
         DiscreteAttrMap name="__ATTRMAP__MYMAP" /;
            Value "Head"  / textattrs=(color=CX000000 weight=bold);
            Value "Body"  / textattrs=(color=CX000000 weight=normal);
            Value "Blank" / textattrs=(color=CX000000 weight=normal);
         EndDiscreteAttrMap;
         layout lattice / columnweights=preferred rowweights=preferred
             columndatarange=union rowdatarange=union columns=2;
            Layout Overlay /  yaxisopts=(reverse=true display=none
                              type=discrete) walldisplay=none
                              yaxisopts=(display=none griddisplay=off
                              displaySecondary=none)
                              y2axisopts=(display=none griddisplay=off
                              displaySecondary=none);
               AxisTable Value=Line Y=Line / labelPosition=min
                         textGroup=MYMAP_VALUE Display=(Label);
            endlayout;
            layout overlay / cycleattrs=true walldisplay=(fill)
                             xaxisopts=(display=(line) type=linear)
                             yaxisopts=(reverse=true display=none
                             type=discrete)
                             x2axisopts=(display=(tickvalues line)
                             TickValueAttrs=(Size=6px) type=linear
                             griddisplay=on);
               ScatterPlot X=n Y=Line / subpixel=off primary=true
                           xaxis=x2 Markerattrs=(Symbol=CIRCLEFILLED)
                           LegendLabel="" NAME="SCATTER";
               ScatterPlot X=n Y=Line / subpixel=off Markerattrs=(Size=0)
                           LegendLabel="" NAME="SCATTER1";
            endlayout;
         endlayout;
      endgraph;
   end;
run;

You can use it to create the same graph by submitting the template and calling PROC SGRENDER:

proc sgrender data=x template=sgplot;
   label line='00'x;
run;

The PROC SGPLOT YAXISTABLE statement is as follows:

   yaxistable line / position=left textgroup=value textgroupid=mymap;

The statement in the PROC SGPLOT graph template is:

   AxisTable Value=Line Y=Line / labelPosition=min
             textGroup=MYMAP_VALUE Display=(Label);

The YAXISTABLE statement requires one variable. This is the variable that is displayed in the graph. In contrast, the AXISTABLE statement in the GTL has two variables. The Y= option specifies the Y axis coordinates. By default, PROC SGPLOT sets this option in the template to specify the Y= variable from the primary plot, which in this case is the first SCATTER statement. In the GTL, you specify the Y= option before the slash. In PROC SGPLOT, you can specify this option after a slash.

The PROC SGPLOT example works because every line (every Y= value in the GTL) has a unique value. In particular, the first blank line consists of one nonbreaking space ('A0'x), the second blank line consists of two nonbreaking spaces ('A0A0'x), and the third blank line consists of three nonbreaking spaces ('A0A0A0'x). You must use unique patterns of nonbreaking spaces for this example to work. This could become problematic in more complicated reports in which some lines might be the same.

PROC SGPLOT generates a LAYOUT LATTICE block that contains two LAYOUT OVERLAY blocks. The option COLUMNWEIGHTS=PREFERRED specifies that ODS Graphics is free to set the width of the axis table and the scatter plot based on the width of the axis table. In the GTL (but not in PROC SGPLOT) you can explicitly control the width by specifying a vector of column weights. For example, COLUMNWEIGHTS=(0.75 0.25) creates an axis table that occupies 75% of the horizontal space. For advanced uses of the axis table, you might want to explicitly control the axis table width, which means you need to use the graph template. Fortunately, PROC SGPLOT writes graph templates for you that you can save and modify.

This step creates a modified input data set:

data y;
   set x;
   RowID + 1;
   if value eq 'Blank' then line = ' ';
run;

This data set has a RowID variable that contains the row number. Blank lines consist of ordinary blanks and are no longer unique.

You can use a DATA step to edit the graph template. The following step changes the name to 'MyAxisTemplate', specifies weights to give both the axis table and the scatter plot 50% of the horizontal space, and modifies the three Y=LINE options to specify Y=RowID.

data _null_;
   infile 'temp.temp';
   input;
   _infile_ = tranwrd(_infile_, 'sgplot', 'MyAxisTemplate');
   _infile_ = tranwrd(_infile_, 'columnweights=preferred',
                                'columnweights=(0.5 0.5)');
   _infile_ = tranwrd(_infile_, 'Y=Line', 'Y=rowid');
   call execute(_infile_);
run;
 
proc sgrender data=y template=MyAxisTemplate;
   label line='00'x;
run;

Except for the change in width, these steps create the same graph as is shown above. By modifying the graph template, you can control the width of each panel, which gives you greater flexibility in making complicated reports.

For more information about using CALL EXECUTE to modify graph templates, see the free web book Advanced ODS Graphics Examples.

The preceding examples work as they do because they have a discrete Y axis. The following example creates a first plot that has a TYPE=DISCRETE Y axis and a second plot that has a TYPE=LINEAR Y axis:

data x(keep=n line);
   do g = 1 to 4;
      do i = 1 to 10;
         k + 1;
         n = k;
         line = put(k, words30.);
         output;
      end;
      line = repeat('A0'x, g - 1); 
      n = .;
      if g lt 4 then output;
   end;
run;
 
data y;
   set x;
   rowid + 1;
   line = compress(line, 'A0'x);
run;
 
proc sgplot data=x noborder tmplout='a';
   yaxistable line / position=left;
   scatter y=line x=n;
   yaxis reverse display=none;
   xaxis display=none;
   label line='00'x;
run;
 
proc sgplot data=y noborder tmplout='b';
   yaxistable line / position=left;
   scatter y=rowid x=n;
   yaxis reverse display=none;
   xaxis display=none;
   label line='00'x;
run;

The RowID variable in the Y data set contains the row number. The SCATTER Y=ROWID variable sets the Y= option in the AXISTABLE statement in the graph template. The TYPE=DISCRETE AXISTABLE statement is the following:

AxisTable Value=line Y=line / labelPosition=min Display=(Label);

The TYPE=LINEAR AXISTABLE statement is:

AxisTable Value=line Y=rowid / labelPosition=min Display=(Label);

Using a numeric row number as the Y coordinate variable in your scatter plot can make it easier to construct axis tables in PROC SGPLOT. This works well when you are not displaying Y axis tick values.


Hotel Text

Yesterday, I published an article on Axis values display, where I mentioned the desire expressed by many users to get x-axis tick values in Hotel text orientation.  The name comes from the way many hotel signs are displayed as shown on the right.  Such an arrangement of text can also be very useful for many Asian languages.

Neither SGPLOT nor GTL has an option that directly displays text this way.  So, I brought up the topic during lunch, and Lingxiao made a great suggestion: we could use the axis values splitting feature to get such an arrangement of the x-axis tick values.

If you recall, we can use the FitPolicy=SplitAlways, and this means the value will be split on every occurrence of the split character, which is a space by default.  So, if we can add spaces between each character of the category values and use this FitPolicy, we should get x-axis tick values displayed as hotel text.

Here is the code for modifying the names and a view of the data.  First, I find the longest string for name (code not shown below), then I use the prxchange function to replace each character by the same character plus a space.

data class;
length newname $&maxlen;
set sashelp.class;
newname = prxchange('s/(.)/$1 /', -1, name);
run;
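
The step that sets the maxlen macro variable is not shown in the article.  One possible way to do it, as a sketch that would run before the DATA step above, is a short PROC SQL query.  The factor of 2 is an assumption that follows from the regular expression, which turns every character into that character plus a blank.

```sas
proc sql noprint;
   /* Longest name, doubled: each character becomes "character + blank" */
   select 2 * max(length(name)) into :maxlen trimmed
      from sashelp.class;
quit;

/* Write the computed length to the log as a quick check */
%put NOTE: maxlen=&maxlen;
```

The TRIMMED keyword strips the padding that PROC SQL would otherwise leave in the macro variable, so that $&maxlen resolves cleanly in the LENGTH statement.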

Now, we can use this new variable as the category and set FitPolicy=SplitAlways to get Hotel text as shown below right.  This method is very scalable and can be applied to any discrete axis with character tick values.  Some details in the code are trimmed to fit.  See linked program below for full details.

title 'Height by Name';
proc sgplot data=class noautolegend noborder;
vbar newname / response=height;
xaxis display=(nolabel noline noticks)
fitpolicy=splitalways;
yaxis display=(noline);
run;

SGPLOT code:  hoteltext


Axis values display

Displaying nicely rendered axis values reduces clutter and makes the graph more readable.  With SAS 9.4, we added the ability to split x-axis tick values on white space to create a nice and readable x-axis as shown in the graph on the right.

It is always a challenge to fit long tick values on the x-axis.  With SAS 9.4, a new "FitPolicy" was added to GTL to split tick values on white space, and to fit these in the available space automatically.  SGPLOT went a step further and made it the default fit policy (SplitRotate).  Now, knowing the space available on the x-axis for each tick value, the procedure first tries to split the long values on white space, and if the longest word still does not fit, it rotates the values.

Prior to SAS 9.4, the default policy was to rotate tick values at 45 degrees if the entire value did not fit on one line on the x-axis.  This caused the tick values to be displayed as shown on the right.  While the graph on the right is not adversely affected, when values are long, this can take away a lot of space from the available height of the graph.

There are many cases of bar charts that have many category values on the x-axis, and users have asked for a way to display the x-axis values rotated vertically.

 

With SAS 9.40M3, we added the ability to rotate the tick values vertically, as shown in the graph on the right.  For SGPLOT, use VALUESROTATE=vertical.  For GTL, use TICKVALUEROTATION in the DiscreteOpts bundle.  Note the values are oriented bottom to top to match the axis label orientation on Y or Y2 axis.

title 'Height by Name';
proc sgplot data=sashelp.class
        noautolegend noborder;
  vbar name / response=height
        dataskin=pressed nostatlabel
        baselineattrs=(thickness=0)
        fillattrs=graphdata3
        filltype=gradient;
  xaxis display=(nolabel noline noticks)
         fitpolicy=rotate
        valuesrotate=vertical;
  yaxis display=(noline);
run;
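
For comparison, a GTL version of the same rotation might look like the sketch below.  The template name VerticalTicks is made up for illustration, the decorative bar options are omitted, and the discrete axis options bundle is written here as DISCRETEOPTS=, so treat it as a starting point to check against the GTL documentation for your release.

```sas
proc template;
   define statgraph VerticalTicks;   /* hypothetical template name */
      begingraph;
         entrytitle 'Height by Name';
         layout overlay /
            xaxisopts=(display=(tickvalues)
                       /* rotate policy plus vertical rotation, as in the SGPLOT step */
                       discreteopts=(tickvaluefitpolicy=rotate
                                     tickvaluerotation=vertical));
            barchart category=name response=height;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=VerticalTicks;
run;
```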

A few users have asked for an option to display the x-axis tick values such that the direction is vertical (top down), but each glyph is shown unrotated.  This is sometimes also referred to as "Hotel" text, as shown on the right.  This type of layout is not available at this time.

 

This layout may also be useful with many Asian languages.  If this is important for your applications, please chime in or call in the request to SAS Technical Support.

SAS Code:  tickvalueangle2


ODS Graphics Designer

Some observant readers may have noticed a new icon on the right sidebar of this blog announcing the release of the new SAS Press book on the ODS Graphics Designer, written in collaboration with Jeanette Bottitta.  Jeanette is a Technical Writer at SAS and has worked on various SAS Graphics products over the years, including the ODS Graphics Designer.

At SGF 2013, I spoke about the Designer with Chris Hemedinger on one of his "Tech Talk" sessions.  Click on the link to see the video, and a short live demo of the software starting at about 4:20 minute mark in the video.

The ODS Graphics Designer (often referred to informally as Designer or SG Designer) is an interactive software for building statistical graphs using GTL under the covers.  No knowledge of graph syntax is required. Designer can be started from the SAS Tools menu as shown on the right.

Selecting "ODS Graphics Designer" from the SAS "Tools" menu will launch the Designer application that presents you with a GUI that can be used to create most commonly requested graphs.  Designer generates the GTL code for you as you create the graph interactively.  This code can be viewed using "View->Code" menu inside Designer.  You can copy and paste the code into the SAS Program Editor and run it to get the same results.

A new "Patent Pending" feature in the SAS 9.40 release of Designer is the ability to generate graphs in bulk based on variables that the user selects from a data set.  See this feature under "Tools->Auto Charts" in Designer.  Graphs created in Designer can be saved as ".sgd" files, which can be run in batch from the Program Editor using the SGDESIGN Procedure with the same data or a different compatible data set.
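
As a minimal sketch of such a batch run, the step below renders a saved Designer file.  The file path is hypothetical, and SASHELP.CLASS merely stands in for whatever compatible data set the graph expects.

```sas
/* Hypothetical path: point SGD= at a file saved from Designer */
proc sgdesign sgd="C:\graphs\mygraph.sgd"
              data=sashelp.class;   /* optional: a compatible replacement data set */
run;
```

If DATA= is left off, the procedure uses the data set that was in effect when the graph was saved in Designer.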

Designer is useful for the following audiences:

  • SAS Analysts who want to make graphs without having to learn any graph syntax.
  • SAS Graph programmers who want to rapidly prototype graphs.
  • SAS Graph programmers who want to learn GTL.
  • SAS users who want to get bulk graphs based on selected variables.

Designer is very popular at conferences and user group meetings, and users who try it love it.  However, many SAS users are unaware of its existence.  I encourage you to take it for a spin.


Financial graphs

Browsing on the web, I ran into a simple but visually interesting graph of financial data.  Really, it could be any data, but this one showed up under "Financial Graphs".  I thought this would give me an opportunity to speak about an interesting new feature added to SERIES plot with SAS 9.40M3 - Arrowheads.

The graph I saw on the web at this site is shown on the right.  The illustration uses some 3D effects as can be seen from the bars and lines.  However, these effects are not consistent. The key item that caught my eye was the arrowhead at the end of each line, indicating a direction.

Now, clearly, ODS Graphics and SGPLOT are designed for presentation of data without "Chart Junk" as defined by Tufte, and with minimal ink.  However, it is possible to embellish the graph a bit to get effects as shown in the graph on the right, without compromising the integrity of the illustration.

The first step is a simple graph with overlay of three plots as shown on the right.  Note the data I use is simulated, and may not match up exactly.  Note the use of the new options ARROWHEADPOS and ARROWHEADSHAPE.  Arrowheads can be displayed at either or both ends of the series.

proc sgplot data=financial noautolegend;
  vbarbasic cat / response=res1
        dataskin=pressed;

  series x=cat y=res2 / arrowheadpos=end
        dataskin=pressed arrowheadshape=filled
        lineattrs=(thickness=10 color=blue);
  series x=cat y=res3 / arrowheadpos=end
       dataskin=pressed arrowheadshape=filled
       lineattrs=(thickness=10 color=red);
run;

Now, let us clean this up a bit, remove the axes and the wall border and set offset to zero.  We add the markers to the series.  This is supported as part of the SERIES statement.  Markers can be turned on, and also we ask for filled+outlined markers, and set the inside color to white.

proc sgplot data=financial
        noborder noautolegend;
  vbarbasic cat / response=res1
        dataskin=pressed
        baselineattrs=(thickness=0);
  series x=cat y=res2 / dataskin=pressed
        arrowheadpos=end arrowheadshape=filled
        lineattrs=(thickness=10 color=blue)
        markers filledoutlinedmarkers
        markerattrs=(symbol=circlefilled size=14)
        markerfillattrs=(color=white);
  series x=cat y=res3 / dataskin=pressed
        arrowheadpos=end arrowheadshape=filled
        lineattrs=(thickness=10 color=red)
        markers filledoutlinedmarkers
        markerattrs=(symbol=circlefilled size=14)
        markerfillattrs=(color=white);
  xaxis display=none;
  yaxis display=none offsetmin=0;
run;

The above code gets us mostly there, but note there is a marker at the end of the series at the arrowhead in the graph above.  We do not want this last marker.  So, we create another column in the data for plotting markers separately.  In this column, the values are the same as the response values for the two line plots, but the last value is set to missing.
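
The article does not show how these columns are built.  A short DATA step along the following lines would do it, assuming the simulated FINANCIAL data set is already in plotting order so that its last observation is the end of each line.  The names res2m and res3m match the SCATTER statements used further down.

```sas
data financial;
   set financial end=last;
   /* Marker columns start as copies of the two line responses */
   res2m = res2;
   res3m = res3;
   /* Blank out the final point so no marker is drawn under the arrowhead */
   if last then call missing(res2m, res3m);
run;
```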

Now, instead of using the marker options on the SERIES plot, we use separate SCATTER plots to display the markers using these new columns in the data.  I adjusted the colors of the lines a bit and made the bars silver using the "gloss" skin.

proc sgplot data=financial noborder noautolegend;
  vbarbasic cat / response=res1 dataskin=gloss fillattrs=(color=silver)
              baselineattrs=(thickness=0) barwidth=0.7;
  series x=cat y=res2 / arrowheadpos=end arrowheadshape=barbed
              lineattrs=(thickness=10 color=cx3f8fdf) dataskin=sheen;
  scatter x=cat y=res2m /
              markerattrs=(symbol=circlefilled size=14)
              filledoutlinedmarkers markerfillattrs=(color=white);
  series x=cat y=res3 / arrowheadpos=end arrowheadshape=barbed
             lineattrs=(thickness=10 color=cxdf2f3f) dataskin=sheen;
  scatter x=cat y=res3m /
             markerattrs=(symbol=circlefilled size=14)
             filledoutlinedmarkers markerfillattrs=(color=white);
  xaxis display=none;
  yaxis display=none offsetmin=0;
run;

I avoided the urge to go all the way by adding a background image to show a shaded horizon.  This can be done using SG Annotation.

By default, SGPLOT will create displays that follow the principles of "effective graphics" for delivery of information.  However, with a few tweaks you can spruce up the display when appropriate.

SAS 9.40M3 SGPLOT Code:   financial


Getting started with SGPLOT - Part 3 - VBOX

This is the 3rd installment of the Getting Started series, and the audience is the user who is new to the SG Procedures.  Experienced users may also find some useful nuggets here.

The Tukey box plot is popular among statisticians for viewing the distribution of an analysis variable with or without classifiers.  The figure on the right is from the SGPLOT Box Plot documentation showing all the features of the box.

The code shown below creates the simplest box plot graph which displays the distribution of the analysis variable Cholesterol.

title 'Distribution of Cholesterol';
proc sgplot data=sashelp.heart;
  vbox cholesterol;
run;

The graph on the right shows the results of the procedure step above and displays a box for the variable Cholesterol.  The display includes a box spanning the Q1-Q3 inter-quartile range, with a line drawn at the median value.  A marker is used to display the mean value.  Whiskers are drawn to the most extreme observation that lies inside each "Fence" as defined in the doc mentioned above, and "outlier" observations beyond the fences are displayed as individual markers.  See the online documentation for the GTL Box Plot for all the details of the various statistics that are displayed.

Box Plot by Category:  The code below creates a box plot graph by a category variable - DeathCause.  Note, we have used the XAXIS statement to remove the display of the label name on the axis.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart;
  vbox cholesterol / category=deathcause;
  xaxis display=(nolabel);
run;

The graph on the right displays the distribution of the cholesterol values by death cause.  Note, by default the graph will try to split long axis tick values at the "white space" in the value.

Connect:  A connect line is drawn connecting the mean statistic across the categories using the CONNECT=mean option.  The connect line can connect any statistic like mean, median, Q1, Q3 etc.

For this graph, we have also simplified the layout by dropping the frame border of the wall, the axis lines, and added y-axis grids.  This presents the data in an alternative visual manner that reduces clutter and is pleasing to the eye.  A DATASKIN is set for visual effect.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause
            connect=mean fillattrs=graphdata3
            dataskin=gloss;
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;

Grouped Box Plot:  One additional classifier can be added - GROUP.  The graph on the right displays the distribution of Cholesterol by death cause and sex.  This is a common graph type useful in the Clinical Research domain where we want to view the results by category and treatment.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause
          group=sex clusterwidth=0.5
         boxwidth=0.8 meanattrs=(size=5)
         outlierattrs=(size=5);
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;

Cluster width can be set to make the cluster of boxes for each category tighter.  Here we have set CLUSTERWIDTH=0.5, so the boxes for each category are more tightly packed.  BOXWIDTH can also be used to make the individual boxes narrower or wider.  BOXWIDTH=1 will make the boxes within each cluster touch.  Attributes for the mean marker and outlier markers can be set using the appropriate ATTRS option.

Notches:  Notches can be displayed by using the NOTCHES option.  The graph on the right shows the result of the program shown below.  Notches are displayed and the box width is reduced to 20% of the available spacing.  The whisker cap is removed by setting CAPSHAPE=NONE.

title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  vbox cholesterol / category=deathcause
            boxwidth=0.2 meanattrs=(size=6)
            notches capshape=none ;
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;

Whisker Percentile:  The graph on the right shows how to control the whisker percentile.  This is a popular option requested by many users.  WHISKERPCT=value (0-25) can be used to set the length of the whisker as a percentile.  WHISKERPCT=1 creates a graph whose whiskers extend to the 1st and 99th percentiles.
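
The program for this graph is in the linked file.  As a sketch of what such a step could look like, reusing the data and axis settings from the earlier examples and assuming the WHISKERPCT= spelling of the option in the VBOX statement:

```sas
title 'Distribution of Cholesterol by Death Cause';
proc sgplot data=sashelp.heart noborder;
  /* Whiskers reach the 1st and 99th percentiles instead of the fences */
  vbox cholesterol / category=deathcause
            boxwidth=0.2 whiskerpct=1;
  xaxis display=(noline nolabel noticks);
  yaxis display=(noline noticks nolabel) grid;
run;
```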

By default, the box plot makes the category axis discrete.  This happens even if the category variable is numeric or time.  There are many cases where we want to see the distribution of some variable by a numeric x variable, such as weeks or over time.  In such cases, we want the boxes to be positioned on the x-axis with the correct scale.  This is supported and can be done by setting TYPE=LINEAR on the x-axis.  We will discuss this in more detail in a subsequent article.
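
Until that article, here is a small sketch of the idea.  It uses SASHELP.CLASS with the numeric variable Age as the category, which is only a stand-in for a real week or time variable.

```sas
title 'Height by Age on a Linear Axis';
proc sgplot data=sashelp.class;
  /* Age is numeric; by default it would still be treated as discrete */
  vbox height / category=age;
  /* Position the boxes at their numeric values on a scaled axis */
  xaxis type=linear;
run;
```

With evenly spaced ages the result looks much like the discrete version; the difference becomes visible when the category values are unevenly spaced.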

Full SAS Code: getting_started_3_vbox_3


Getting started with SGPLOT - Index

Index of articles on "Getting Started with SGPLOT Procedure".

  1.   Getting Started with SGPLOT - Part 1 - Scatter Plot.
  2.   Getting Started with SGPLOT - Part 2 - VBAR.
  3.   Getting Started with SGPLOT - Part 3 - VBOX.

Mixing plots with different classification

One of the key benefits of creating graphs using GTL or SG Procedures is their support of plot layering to create complex graphs and layouts.  Most simple graphs can be created by a single plot statement like a Bar Chart.  Complex graphs can be created by layering appropriate plot statements to add the complexity needed like a Swimmer Plot.

When creating graphs with multiple VBAR statements, we sometimes run into a limitation on how VBAR statements can be layered.  In general, VBAR and VLINE statements can be layered only when all the layered statements have the same category variable.  If a group classification is in effect, all statements must have the same group variable.  So, it is not possible to layer VBAR and VLINE statements that have different category or group classification.

Consider the example on the right.  This graph has a bar chart of Total Sales by Subsidiary for Canada.  Each subsidiary has multiple observations for the type of shoes sold, as seen by the table under the bar chart.

One would expect we could simply layer an XAXISTABLE with the VBAR statement to create such a graph.   The code is shown below.  Some options are thinned to fit.  See the linked code below for all the details.

title "Total Sales for &region by Subsidiary";
proc sgplot data=sashelp.shoes;
  vbar subsidiary /response=sales;
  xaxistable sales / x=subsidiary class=product;
  xaxis display=(nolabel noline);
  yaxis display=(noline) grid;
run;

Unfortunately, this will not produce the desired results.  The reason is that while the VBAR statement has only one classifier (subsidiary), the XAXISTABLE has two classifiers, x=subsidiary and class=product. Each bar shows the summarized value of sales as one bar per subsidiary.  When you submit the code above, you will get the following warning in the log, and the graph on the right is produced.

WARNING: The CLASS option is ignored when the axis table is used with bar charts, line charts, or dot plots. The GROUP option from these charts is used as the CLASS variable for the axis table.

The CLASS option on the XAXISTABLE is ignored by the procedure, so the table has only one row of data, which is the summarized value for each bar.  Also, "x=subsidiary" is not required for the axis table as it is default.  I use it here for clarity.  I also used a different color for the graph just for variety.

So, how do we get around this to create the graph shown at the top?

The VBARBASIC statement was released with SAS 9.40M3 to address such use cases.  The VBAR statement does its own data processing to support additional features.  This requires that all layers used with VBAR have the same set of classifiers.  But the underlying GTL BarChart statement does not have such restrictions.  So, we decided to surface a way directly to the GTL BarChart using the VBARBASIC statement.  VBARBASIC can still summarize the data by subsidiary and does not have any restrictions on layering with other statements with different classifications.  The code is shown below.  Some options are thinned to fit.

title "Total Sales for &region by Subsidiary";
proc sgplot data=shoes;
  vbarbasic subsidiary /response=sales;
  xaxistable sales / x=subsidiary class=product;
  xaxis display=(nolabel noline);
  yaxis display=(noline) grid;
run;

Now, the VBarBasic statement displays the bars summarized by subsidiary, while the XAxisTable can display the detailed information by subsidiary and product.

Another example of layering plot statements with different classifiers is shown on the right.  Here, we have displayed the summarized Actual Sales by Product shown by the blue bars.  On that, we have overlaid a graph of the Actual Sales by Product and Quarter.  This is possible using the VBARBASIC statement.  All the values by quarter add up to the total shown by the blue bar.  This allows us to compare across products, and by quarter within each product.

title "Actual Sales";
proc sgplot data=sashelp.prdsale noborder nocycleattrs;
  vbarbasic product / response=actual;
  vbarbasic product /response=actual group=quarter
          groupdisplay=cluster dataskin=matte;
  xaxis display=(nolabel noline);
  yaxis display=(noline) grid;
run;

Full SAS 9.4M3 Code:  vbarbasic


Getting started with SGPLOT - Part 2 - VBAR

This is the 2nd installment of the "Getting Started" series, and the audience is the user who is new to the SG Procedures.  It is quite possible that experienced users may also find some useful nuggets here.

One of the most popular and useful graph types is the Bar Chart.  The SGPLOT procedure supports many types of bar charts, each suitable for some specific use case.  Today, we will discuss the most common type, the venerable VBAR statement.  In this article I will show you many small examples of bar charts with increasing information.

Let us start with the most basic case, as shown on the right.  This graph shows the frequency or counts by category with default settings.  Click on the graph for a higher resolution image.  The SGPLOT code needed to create it is very simple, as shown below.

title 'Counts by Type';
proc sgplot data=sashelp.cars;
  vbar type;
run;

The graph above is rendered to the LISTING destination with default style and default setting for the axes.

The graph on the right shows the mean of city mileage by type.  The title already mentions "Mileage by Type", so there is no need to repeat that information as the label of the x-axis.  The label is suppressed by the x-axis option.

title 'Mileage by Type';
proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean
           barwidth=0.6 fillattrs=graphdata2;
  xaxis display=(nolabel);
run;

Note, we have specified RESPONSE=mpg_city, with STAT=MEAN.  This has to be set as the default STAT is SUM, and there is no point in viewing the sum of the mileage of all cars of one type.  Also, we have set BARWIDTH=0.6 and set the bar attributes to GRAPHDATA2 for a change of pace.
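
As a quick cross-check of the bar heights, the same means can be computed outside the graph.  The step below is only a sketch and is not part of the original program; it runs PROC MEANS on the same SASHELP.CARS data, and the MAXDEC=1 rounding is an arbitrary choice.

proc means data=sashelp.cars mean maxdec=1;
  class type;      /* same category variable as the VBAR statement */
  var mpg_city;    /* same response variable */
run;

The mean printed for each Type should agree with the height of the corresponding bar.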

Next, we create a bar chart of mean mileage by type, with display of the 95% confidence limits.  A legend is automatically created by the procedure to display the two items in the graph.  Also note, I have used GRAPHDATA4 for the bar attributes, and removed the display of the baseline to clean up the display.

title 'Mileage by Type';
proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean
            barwidth=0.6
            fillattrs=graphdata4 limits=both
            baselineattrs=(thickness=0);
  xaxis display=(nolabel);
run;

The graph on the right shows the mean mileage by type, using options to create a different look and feel.  We have also displayed the response value for each bar at the top.  A decorative skin is used to make the bars aesthetically pleasing using DATASKIN=matte.

In this graph I have suppressed the border around the data area.  The axis lines and ticks are removed and y-axis grids are added.  This results in a clean graph as shown on the right.  Click on the graph for a higher resolution image.

title 'Mileage by Type';
proc sgplot data=sashelp.cars noborder;
  format mpg_city 4.1;
  vbar type / response=mpg_city stat=mean
           datalabel dataskin=matte
           baselineattrs=(thickness=0)
           fillattrs=(color=&softgreen);
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline noticks) grid;
run;

Now, let us add a group classifier using the GROUP=variable option.  The SGPLOT procedure summarizes the response data by category and group.  Values for each group are stacked for each category, creating a stacked bar chart as shown on the right.

title 'Sales by Type and Quarter for 1994';
proc sgplot data=sashelp.prdsale(where=(year=1994)) noborder;
  format actual dollar8.0;
  vbar product / response=actual stat=sum
           group=quarter seglabel datalabel
           baselineattrs=(thickness=0)
           outlineattrs=(color=cx3f3f3f);
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline noticks) grid;
run;

A stacked bar chart makes sense with STAT=SUM (default).  Now the bar height is the sum of all the observations for the category.  By default, SGPLOT stacks the segments for each group in a category.  Note, with SAS 9.4, the segments can be labeled with the value of each segment, and the bar itself can also be labeled with the total value for each bar.  Note, a legend showing the color used for each unique value of the group variable is shown.

Another useful graph is shown on the right.  Here, we have used GROUPDISPLAY=CLUSTER which places the groups side-by-side within each category.  A group legend is displayed by default.

title 'Sales by Type and Year';
proc sgplot data=sashelp.prdsale noborder;
  vbar product / response=actual
           group=year groupdisplay=cluster
           dataskin=pressed
           baselineattrs=(thickness=0);
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;

Bar values can be shown for each group in a category, as shown on the right.  Note, the values are automatically rotated to a vertical orientation when the values will not fit in the space available.

Note the use of the STYLEATTRS statement to set the fill colors for the two group values to gold and olive.  This statement lets you control the attributes used for the group values: fill colors, contrast colors, marker symbols and line patterns.  Also, note the use of FILLTYPE=Gradient to color the bars in an alpha gradient, from fully saturated at the top, to transparent at the bottom.

title 'Sales by Type and Year';
proc sgplot data=sashelp.prdsale noborder;
  styleattrs datacolors=(gold olive);
  vbar product / response=actual
           group=year groupdisplay=cluster
           dataskin=pressed baselineattrs=(thickness=0)
           filltype=gradient datalabel;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;
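
The other STYLEATTRS options work the same way for plots that draw markers and lines.  The sketch below is an illustration only and is not from the original program: it uses SASHELP.CLASS, and the colors, symbols and patterns are arbitrary choices.  It also assumes ATTRPRIORITY=NONE, so that symbols and line patterns change with each group value rather than only after the colors run out.

ods graphics / attrpriority=none;   /* assumption: rotate symbols and patterns per group */
proc sgplot data=sashelp.class;
  /* one entry per group value, applied in order */
  styleattrs datacontrastcolors=(navy maroon)
             datasymbols=(circlefilled trianglefilled)
             datalinepatterns=(solid shortdash);
  scatter x=height y=weight / group=sex;
  reg x=height y=weight / group=sex nomarkers;
run;

The ATTRPRIORITY setting stays in effect for later graphs in the session until it is changed again, for example with ATTRPRIORITY=COLOR.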

You may have noticed that the VBAR statement supports only one GROUP role, which can then be displayed as stacked or clustered.  Unlike the SAS/GRAPH GCHART procedure, SGPLOT does not support a bar chart that has both a CLUSTER and a STACK group.  Creating such a graph requires some complex layout of the category axis, and a decision was made to avoid such complex axis layouts as this combination is relatively rare.

But, what to do if you do need a stacked + clustered bar chart?  The solution is to use the SGPANEL procedure as shown below.  The resulting graph is shown on the right.  Here we have a bar chart of actual sales by type, year and quarter.  The year values are side-by-side and the quarter values are stacked.

The SGPANEL procedure below uses the panel variable of product.  So, each "cluster" is really a cell in the panel.  Each cell contains a stacked bar chart with category of year and group=quarter.  Normally, the cell header is at the top of each cell, with a header border.  Here, we have moved the header to the bottom of the graph, and suppressed the cell borders, thus making the graph appear like a stacked+clustered bar chart.  Note use of COLAXIS instead of XAXIS and ROWAXIS instead of YAXIS.

title 'Sales by Type, Year and Quarter';
proc sgpanel data=sashelp.prdsale;
  styleattrs datacolors=(gold olive &softgreen silver);
  panelby product / onepanel rows=1 noborder layout=columnlattice
                 noheaderborder novarname colheaderpos=bottom;
  vbar year / response=actual stat=sum group=quarter barwidth=1
           dataskin=pressed baselineattrs=(thickness=0) filltype=gradient;
  colaxis display=(nolabel noline noticks) valueattrs=(size=7);
  rowaxis display=(noline nolabel noticks) grid;
run;

For all the examples above, the data contains one or more classifier variables with one response variable.  This is what is sometimes referred to as a "Tall" structure.  But often, the data structure is "Wide", like in an Excel table, with multiple response columns by category.

In such a case, it is possible to create a clustered bar chart without transforming the data, by layering the data for each column as shown on the right.  Here, we have layered two VBAR statements, one for mpg_city and one for mpg_highway, both for the same category variable.  Normally, the second layer would cover the first, but we have made the 2nd layer bars narrower, so we can see both.

title 'Mileage by Type';
proc sgplot data=sashelp.cars noborder;
  styleattrs datacolors=(olive gold);
  vbar type / response=mpg_city stat=mean
           dataskin=pressed baselineattrs=(thickness=0);
  vbar type / response=mpg_highway stat=mean
           dataskin=pressed baselineattrs=(thickness=0)
           barwidth=0.5;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;
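
If a single grouped bar chart is preferred, the wide columns can instead be reshaped into a tall structure first.  The following is a sketch under my own assumptions rather than part of the original program: the data set names work.mpg_wide and work.mpg_tall, and the variable names measure and mpg, are hypothetical.

/* Step 1: one row per Type holding the two mean mileage columns (wide) */
proc means data=sashelp.cars nway noprint;
  class type;
  var mpg_city mpg_highway;
  output out=work.mpg_wide(drop=_type_ _freq_) mean=;
run;

/* Step 2: one row per Type and measure (tall) */
proc transpose data=work.mpg_wide
               out=work.mpg_tall(rename=(col1=mpg)) name=measure;
  by type;
  var mpg_city mpg_highway;
run;

/* Step 3: a single VBAR with a group role replaces the two layers */
proc sgplot data=work.mpg_tall noborder;
  vbar type / response=mpg group=measure groupdisplay=cluster;
  xaxis display=(nolabel);
run;

Because each Type and measure combination has exactly one row after the transpose, the default STAT=SUM simply reproduces the mean values.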

Finally, the bars need not be overlaid on category centers, but can be "offset" to be side-by-side, or even a bit overlapped as shown on the right.  Here the bar widths are 0.6, and each VBAR is offset to the left or right by 0.1, creating overlapping bars.

title 'Mileage by Type';
proc sgplot data=sashelp.cars noborder;
  styleattrs datacolors=(brown olive);
  vbar type / response=mpg_highway stat=mean
           dataskin=pressed barwidth=0.6 
           baselineattrs=(thickness=0)
           discreteoffset=-0.1;
  vbar type / response=mpg_city stat=mean
          dataskin=pressed barwidth=0.6 
          baselineattrs=(thickness=0)
          discreteoffset= 0.1;
  xaxis display=(nolabel noline noticks);
  yaxis display=(noline) grid;
run;

There is one restriction when layering multiple VBAR statements.  The category variables for all VBAR statements must be the same.  If a group is specified, it must be specified for all the VBAR statements in the same way.  If this is not the case, the program will stop with an error message in the log.  There are other ways to handle such cases that will be discussed later.
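
To make the restriction concrete, the sketch below layers an ungrouped VBAR under a grouped one.  It is an illustration added here, not a program from the original article, and going by the rule above it should be rejected rather than drawn.

proc sgplot data=sashelp.prdsale;
  /* first layer has no GROUP, second layer uses GROUP=quarter: */
  /* the classifiers differ, so the two VBAR statements do not agree */
  vbar product / response=actual;
  vbar product / response=actual group=quarter groupdisplay=cluster;
run;

The same two layers written with VBARBASIC (SAS 9.40M3 and later) are accepted, as in the "Actual Sales" example in the VBARBASIC article.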

These examples give you an idea of the versatility of the SGPLOT VBAR statement.  You can create bar charts from the simplest to complex and with different aesthetic appearance.  I would encourage you to see other examples in this blog on creating bar charts with SGPLOT procedure.

Full code:  getting_started_2_vbar
