PharmaSUG-China 2014

The third PharmaSUG-China conference was held in Beijing last week, and I had the pleasure of attending this excellent conference along with a record number of attendees.

On Thursday, I presented two half-day seminars on ODS Graphics: one titled "Advanced Topics in GTL" and another titled "Complex Clinical Graphs using SAS".  The attendees were eager to learn, and the sessions included much discussion, which is always a lot of fun.

The opening session included a presentation on using JMP Clinical for analysis of clinical data. The presentation included a graph of Study Demographics.  Later in the afternoon, I thought it would be appropriate to create the same graph in my presentation on ODS Graphics Designer. The graph is shown on the right.

Friday and Saturday were filled with many presentations on interesting topics in the Programming Techniques and Coder's Corner sections, especially from a graphics perspective.  Conference proceedings are now available.

The afternoon also included an excellent presentation on the Essentials of PDV by Arthur Li and one on the Napoleon Plot by Kriss Harris.  Unfortunately, the papers are not available on the proceedings page at this time.

Rajesh Moorakonda of the Singapore Clinical Research Institute presented a paper on Monitoring Child Growth that included graphs plotting the anthropometric parameters on a growth chart using the GPLOT procedure, as shown on the right.

The Saturday session included the "Coder's Corner" section, which featured many interesting papers, including a fair share on graphics techniques.

In my presentation titled "Annotate Your SGPLOT Graphs" I presented the basic techniques for annotating an SGPLOT graph using the SGAnnotation data set.   I demonstrated how to add a table of subjects at risk by class to a survival plot.  The paper contains the details on how to make this graph.

Debpriya Sarker of SAS Institute, Pune, presented his paper on "Plotting Against Cancer:  Creating Oncology Plots using SAS".  This paper included the techniques for creating many graphs used in the analysis of data for Oncology, such as the HeatMap depicting correlations for Genes and Drugs.

Huashan Huo of PPD Beijing presented the paper on "Using SAS SG Procedures to Create and Enhance Figures in Pharmaceutical Industry".  This paper included multiple graphs created using ODS Graphics Designer, GTL and SG Procedures, including the graph of Median of Lipid Profile over time, where the authors added alternating vertical bands to clearly indicate the results for specific days of the study.



Kriss Harris presented "I am Legend", showing ways to create a stand-alone legend for cases where the legend is too big to fit in a graph.  Another presentation, "Programming Figures beyond SGPLOT and GTL", showed ways to create graphs beyond what can be directly created using SGPLOT or GTL plot statements.  Unfortunately, the papers for these are not available on the web page.

Beijing afforded a great venue for the conference.  A bustling city of historical and modern elements, it provides numerous attractions, ranging from the 2000 year old Great Wall to the majestic Forbidden City to the ultra-modern National Center for the Performing Arts.




Binary Response Graph

Often we need to plot the response values for binary cases of a classifier.  The graph below simulates one seen on a web site showing the shock index for subjects with or without a pulmonary embolism.  In this case, the data is simulated for illustration purposes only.

There are two levels for the classifier for presence of pulmonary embolism, "Absent" and "Present". The response values are plotted as a box plot.  I call this graph the "Binary Response Graph" as I could not find the common name for such a graph.  I would be happy if someone can provide the industry standard name for such a graph.

SAS 9.3 code for box plot:

proc sgplot data=Pulmonary;
  vbox shock / category=pulmonary boxwidth=0.2 fillattrs=(color=lightblue);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

Note in the graph, the two class values "Absent" and "Present" are placed on the x axis with an offset of 1/2 the midpoint spacing on each side on the axis.  This is the standard placement of category (aka midpoint) values along a discrete axis for plots like Bar Charts, Box Plots and so on.

Now, let us plot the mean, the 5th and the 95th percentile for the same data using the scatter plot.  I used the MEANS procedure to compute the mean, P5 and P95 values to create the data set for the graph shown on the right.  Note, something different happened here with the placement of the category values on the x axis.
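The summary step might look something like the sketch below.  The data set names PulmonaryRaw and PulmonaryStats are hypothetical, and the sketch assumes one row per subject with the class variable PULMONARY and the response SHOCK; the attached full program may do this differently.

```sas
/* Hypothetical names: PulmonaryRaw (one row per subject), PulmonaryStats */
proc means data=PulmonaryRaw noprint nway;
  class pulmonary;                 /* "Absent" / "Present" */
  var shock;                       /* shock index response */
  output out=PulmonaryStats(drop=_type_ _freq_)
         mean=mean p5=p5 p95=p95;  /* one row per class level */
run;
```

The output has one observation per class level with the columns MEAN, P5 and P95 that the scatter plot uses.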

Aside:  In this graph I have used two scatter plots just to simulate the filled and outlined mean marker. With SAS 9.4, this can be done with an option.  Click on the graph for a high resolution image.
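For SAS 9.4, a single scatter statement along the following lines should give the same filled and outlined marker.  This is only a sketch using the FILLEDOUTLINEDMARKERS option with the same variable names as the 9.3 code; the colors and marker size are carried over from that code.

```sas
proc sgplot data=Pulmonary;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95
          filledoutlinedmarkers                /* fill and outline in one marker */
          markerattrs=(symbol=circlefilled size=6)
          markerfillattrs=(color=lightblue)
          markeroutlineattrs=(color=black);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;
```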

SAS 9.3 code for scatter plot:

proc sgplot data=Pulmonary;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95
          markerattrs=(symbol=circlefilled color=black);
  scatter x=pulmonary y=mean /
          markerattrs=(symbol=circlefilled color=lightblue size=6);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

In the graph above, the category values are displayed at the ends of the axis, with an offset of half the size of the marker at each end of the axis.  This is the standard behavior of the scatter plot on any type of axis.  Setting x axis Type=Discrete does not make any difference.  While we noticed this behavior, we could not change it because the scatter plot is the most extensively used plot type and such a change would create too many problems for many graphs.

However, in such cases, it is often desirable to get the discrete axis behavior similar to the first graph shown above.  How can we get that?  Well, as usual, there are multiple (simple) ways to get the result we want.

First, recall that we can use (and are using) layers of plots to create the graph.  I can place a high low plot of the same data prior to the scatter plot.  The high low plot prefers a Bar Chart like category axis, and placing it first makes it the "Primary" plot, thus forcing the x axis to its liking and forcing other plots to follow its lead.

The high low plot also does not force a baseline of zero on the y axis, like the bar chart does.  So, it is the ideal choice in this case.  The low and high values of the high low plot are the same (mean), so a dot is drawn at this location that is overdrawn by the scatter marker. Note, the resulting graph is now the way we want as shown above.

SAS 9.3 code for scatter plot with high low:

proc sgplot data=Pulmonary;
  highlow x=pulmonary low=mean high=mean;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95
          markerattrs=(symbol=circlefilled color=black);
  scatter x=pulmonary y=mean /
          markerattrs=(symbol=circlefilled color=lightblue size=6);
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

Another way to achieve a similar result is to use a "dummy" group variable on the scatter plot with GroupDisplay=Cluster.  This forces the axis to what we want as shown on the right.

SAS 9.3 code for scatter plot with cluster group:

proc sgplot data=Pulmonary;
  scatter x=pulmonary y=mean / yerrorlower=p5 yerrorupper=p95 group=pulmonary
          groupdisplay=cluster markerattrs=graphdatadefault;
  yaxis display=(noticks nolabel noline) min=0 max=2 grid;
run;

Full SAS 9.3 code:  Pulmonary_93


New Graphics Features in SAS 9.4M2 - Part 2

For far too long we have been using the venerable Scatter Plot to do the work of placing text strings in the graph.  For far too long we have used the Scatter Plot or the Block Plot to place axis aligned text in the graphs.   It is time to move on.

When we started down the ODS Graphics path over 10 years ago, little did we know how often we would need to do the above.  Almost every clinical graph needs text placed judiciously in the graph.  With SAS 9.4, we released the Axis Table to simplify the task of placing axis aligned text.  Now with SAS 9.4M2, we release the TEXTPLOT for general purpose text placement in a graph.

The Text Plot renders text in the graph in various different ways.  Freed from the Scatter plot, we can specialize this plot to render text in ways that did not make sense with the scatter plot.  Here is the basic syntax:

textplot x=var y=var text=var / group=var colorresponse=var sizeresponse=var;

This new statement makes it possible to create graphs with text alone, or add text in different ways to your graph.  Here are some examples.

Simple text plot:  In this case, we use the basic options on the text plot to display the name of each person in the class data set positioned by Height and Age classified by the variable 'Sex'.
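A minimal sketch of this case with the SGPLOT procedure, where the statement is named TEXT (TEXTPLOT is the GTL name).  Placing Age on the x axis and Height on the y axis is an assumption here; the attached program may use the roles the other way around.

```sas
proc sgplot data=sashelp.class;
  /* Each name is drawn at its (age, height) location, grouped by sex */
  text x=age y=height text=name / group=sex;
run;
```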

Text Plot with Size and Color Response: In this example, the font size of the name of each person is proportional to the values in the variable used for the Size Response role.  The color of the text string is determined by the Color Response role.  In this case, both size and color are determined by the same variable "Weight".  Click on the graph for a higher resolution image.  You will also notice in the larger version that the text has a soft "backlight".  This helps in discerning text that has a color close to the background color, like the yellow text.
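A sketch of the size and color response case, again with the SGPLOT TEXT statement.  The x and y roles are assumed, as before; only the use of Weight for both response roles comes from the description above.

```sas
proc sgplot data=sashelp.class;
  /* Weight drives both the font size and the text color */
  text x=age y=height text=name / colorresponse=weight sizeresponse=weight;
run;
```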

But the Text Plot goes beyond such features to support rotated text, aligned to the nine compass directions as shown in the graph on the right.  In this case, we have displayed the standard BMI curves as bands, and want to label them along the top.  Using horizontal text can be a problem for narrow bands.  So, in this case I have specified an angle of rotation independently for each string in the column.

To render rotated text, you can specify an angle of rotation in degrees for each string separately.  This works quite well in most cases, but in this case it can be a problem as the slope of the curve can change based on the aspect ratio of the graph.  So, specifying the angle in data coordinates instead of screen coordinates may work better.  We will be sure to add an option to do that soon.
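As a rough sketch, per-string rotation could be driven by a column of angles.  The data set BANDLABELS, its coordinates and its angles are made up for illustration, and the sketch assumes the ROTATE= option of the SGPLOT TEXT statement accepts a numeric variable.

```sas
/* Hypothetical label data: position, angle in degrees, and text */
data bandlabels;
  length label $12;
  input x y angle label $;
  datalines;
150 40 20 Underweight
165 62 30 Normal
175 85 40 Overweight
;
run;

proc sgplot data=bandlabels noautolegend;
  /* ANGLE supplies a separate rotation for each string */
  text x=x y=y text=label / rotate=angle position=center;
run;
```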

Clearly, you can overlay markers on the BMI curves to display the values for each subject in a study.  One could use the scatter plot to display the value for each subject, but here we have used the Text Plot itself.

Finally, certain things are harder to do, where you need to know the exact dimensions of the text being rendered. For example, I was attempting to see how far I can get creating a "Word Cloud" graph using the Text plot.  I can size and color each string by a response value based on some statistic (say number of occurrences), and place a string where I want.  But, only the Java rendering code knows the exact string box sizes, which vary for each string.  I cannot know where a string ends for proportional fonts to exactly position the next string.

As an exploration of what could be possible, we created a feedback mechanism to allow the user to know the exact size of a text string (for any given font, weight, size or style).  The renderer can write this information to a file on disk, which can be read back by the user.  Now, using a two pass process, you can create a perfect word cloud yourself as shown on the right.

In the example on the right, I first rendered all the text strings with the correct size, font and style, but all at (x, y) = (0, 0).  We added a mechanism (still under development) to write the actual bounding box of each string in pixel and data space into a csv file.  I read back this information using proc import, and merged the text box information with the original data.  Now, I ran a data step to position each string in sequence, wrapping to the next line when I have reached the end of the data space.
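The positioning pass could be sketched as the data step below.  Everything about the input is an assumption: the data set BOXES, its columns W and H (rendered width and height in data units), the data space width of 100 and the gap of 1 are hypothetical, since the feedback file format is still under development.

```sas
/* Hypothetical input: BOXES has one row per word, in display order,
   with rendered width W and height H in data units. */
data layout;
  set boxes;
  retain xcur 0 ycur 0 rowh 0;
  maxw = 100;   /* assumed width of the data space  */
  gap  = 1;     /* assumed spacing between strings  */
  if xcur > 0 and xcur + w > maxw then do;
    /* The word does not fit on this row: wrap to the next one */
    xcur = 0;
    ycur = ycur - rowh - gap;
    rowh = 0;
  end;
  x = xcur;                 /* left edge of this word        */
  y = ycur;                 /* top edge of the current row   */
  xcur = xcur + w + gap;    /* advance the cursor            */
  rowh = max(rowh, h);      /* tallest word seen in this row */
  drop xcur ycur rowh maxw gap;
run;
```

The resulting X and Y columns can then be used as the text plot coordinates in the second rendering pass.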

The benefit here is you can implement your own specific algorithm to lay out the strings once you know their exact dimensions.  Instead of a linear word cloud, you could do a circular layout, starting from the middle.  Or, turn the text sideways to fit them closer like some of the examples on the web.  Here is the same data (with different size values) as a grouped word cloud.

So, we are looking for some feedback from you.  Do you see use cases in your work where you could use this information to layout strings exactly where you need? Would knowing the exact pixel dimensions of something rendered in a graph help you control some aspects of the graph?   Please chime in with your opinions to help us determine if such a "feedback loop" could be useful and how you could leverage it.

SAS 9.4M2 Text Plot Code:  TextPlot


New Graphics Features in SAS 9.4M2 - Part 1

SAS 9.4 maintenance release M2 was released early in August.  This release contains some exciting new features in GTL and SG Procedures.  In this article, I will describe some of the new options added to the existing plot statements.  Note, I will use the SG examples here, but these are also available in GTL.

Bubble Plot with Response Color:  The bubble plot now supports color response role, so you can color each bubble by a color that represents the level as shown in the graph on the right.  A gradient legend is also displayed by default.  Bubble labels can be displayed at one of the nine locations around the bubble.  Color response role is also supported for Scatter and Polygon plot.  In addition, a new GradLegend statement is added to display the color gradient.

The bubble plot supports both discrete and interval axis for both X and Y axis.  In this case, both the size and the bubble color are based on the frequency variable N, previously computed using the MEANS procedure.  Data skin is used to enhance the bubble rendering.
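A sketch of such a bubble plot, using SASHELP.CARS as a stand-in for the data in the graph (the actual data set is in the attached program).  The frequency N is computed with the MEANS procedure and used for both the size and the color response.

```sas
/* Count of cars for each Origin x Type cell (stand-in data) */
proc means data=sashelp.cars noprint nway;
  class origin type;
  var mpg_city;
  output out=counts n=n;
run;

proc sgplot data=counts;
  /* Both axes are discrete; N drives bubble size and color */
  bubble x=origin y=type size=n / colorresponse=n dataskin=gloss;
run;
```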

Box Whisker Percentile:  This is a feature often requested by users to control the percentile of the whiskers in the box plots.  A new option WhiskerPct=number is supported, where number is between 0 and 25.  The graph on the left uses WhiskerPct=1, thus displaying the 1st and 99th percentile whiskers.  Jittered markers can be overlaid on a box plot, and a new JitterWidth option is added to control the spread of the jittered markers.
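A minimal sketch of the whisker percentile option.  The choice of SASHELP.HEART and its Cholesterol and Weight_Status columns is only for illustration and is not necessarily the data used in the graph.

```sas
proc sgplot data=sashelp.heart;
  /* Whiskers are drawn at the 1st and 99th percentiles */
  vbox cholesterol / category=weight_status whiskerpct=1;
run;
```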

VBarParm with Stacked Groups:  Stacked groups were previously disallowed in SGPLOT due to the complexity of getting uniform axes across BY variables.  Now, this has been added.

Segment labels are also added both to VBar/HBar and VBarParm/HBarParm.  This allows us to display the labels for each segment of the bar (stacked or clustered) in the center of the bar.  DataLabels can still be displayed at the top.  SegmentLabelFitPolicy allows you to control whether the labels are clipped if they do not fit in the space.  Here, the policy=none, so labels are all displayed.
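A sketch of stacked VBarParm bars with segment labels, using counts from SASHELP.CARS as stand-in data.  In the SGPLOT procedure the option names are assumed to be SEGLABEL and SEGLABELFITPOLICY=; check the documentation for your release if these are not accepted.

```sas
/* Pre-summarized counts for each Origin x Type cell (stand-in data) */
proc means data=sashelp.cars noprint nway;
  class origin type;
  var mpg_city;
  output out=counts n=n;
run;

proc sgplot data=counts;
  /* One stacked bar per Origin, one labeled segment per Type */
  vbarparm category=origin response=n / group=type groupdisplay=stack
           seglabel seglabelfitpolicy=none;
run;
```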

Grouped Histograms and Fill Type:   Another user requested feature is the grouped histogram and density plots.  Earlier, you could create Comparative Histograms using overlay of two variables, like Systolic and Diastolic in the SASHELP.HEART data set.  Now, you can do the same if these are group levels in the data set.

In this example, I have reformatted the heart data into a "grouped" structure, using Type=Systolic / Diastolic, with the BP column keeping the value.  Now, I can set the role GROUP=Type to create the histogram shown on the right.
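The restructuring and the grouped histogram could be sketched as follows.  The data set name HEART and the transparency value are choices made for this sketch; the column names Type and BP follow the description above.

```sas
/* Stack the two blood pressure columns into one grouped column */
data heart;
  set sashelp.heart(keep=systolic diastolic);
  length Type $9;
  Type='Systolic';  BP=systolic;  output;
  Type='Diastolic'; BP=diastolic; output;
  keep Type BP;
run;

proc sgplot data=heart;
  /* One histogram per group level, overlaid */
  histogram BP / group=Type transparency=0.5;
run;
```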

Note, I have taken the liberty of using the new FillType=Gradient option to get a different aesthetic look for the Histogram.  Normally, one does not use such an option for histograms, but I used it mainly to avoid making another example with bar chart.  The gradient is an Alpha gradient, not a color gradient, so more of the backdrop shows through near the bottom.  In this case, it allows the shorter bins of the other group to show through, as can be seen near X=100.  This type of gradient fill is available for histograms and bar charts.

Line color, Line Pattern, Marker color and Marker Symbol groups are now added to the Series Plot.  This is a useful feature as explained in earlier articles on Spaghetti Plots.   Many more little features and enhancements have been added, too numerous to enumerate here.  You can review these in the Online Doc for SAS 9.4 and maintenance releases.

In the next article, I will discuss the whole new statement called TextPlot that has been added to both GTL and SG.

 Full SAS 9.4M2 program for new features:  SAS_94M2_Features



Histograms on Log Axis

Often there are questions from users on creating a histogram using a log x axis.  One such question came up this weekend, where a user wanted a histogram of her data using a log axis.  Before we get into her specific case, let us first clarify what we may want to see when we say "Histogram on Log Axis".  This could mean one of these two cases:

1.  Histogram of the linear values, displayed on a log x axis.  This histogram has equal width bins in linear data space.  When displayed on a log axis, the bins are drawn with varying pixel width.

Using the cars data set, the first case on the right shows a histogram of the original data in linear space, on a LOG x axis.  Note, each bin covers the same range of data values, but the widths of the bins in pixels decrease as we go to the right, due to the log axis.

proc sgplot data=sashelp.cars;  /* assumes SASHELP.CARS, which has MPG_CITY */
  histogram mpg_city;
  xaxis type=log;
run;

2.  Histogram of the transformed values.  A transformed variable is used instead of the original variable.  Now, each bin has equal pixel width, representing the transformed data.

On the right is a graph of the log transformed data on a default axis.  First, we create a new data column logMpg=Log10(mpg_city).  Then we use logMpg as the analysis variable for the histogram.  This creates the graph shown here, where each bin now has a constant pixel size.  Note the x axis tick values and axis label.

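data cars;
  set sashelp.cars;            /* assumed source of MPG_CITY */
  logMpg = log10(mpg_city);    /* log-transformed analysis variable */
run;

proc sgplot data=cars;
  histogram logMpg / fillattrs=graphdata1;
run;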

In the graph on the right we have replaced the x-axis values (log scale) with their respective untransformed linear values at equal spacing using the ValuesDisplay option.  A tick value is displayed at each value provided in the Values option, but the actual text displayed is from the ValuesDisplay option.  The axis label is now "MPG".

proc sgplot data=cars;
  histogram logMpg / fillattrs=graphdata1;
  xaxis fitpolicy=none valueattrs=(size=7) values=(0.90 1 1.3 1.47 1.6 1.7 1.78 1.85 1.90 1.954 2)
               valuesdisplay=(" " "10" "20" "30" "40" "50" "60" "70" "80" "90" "100") label='MPG';
run;

From discussions with experts here at SAS, the second and third graphs above have more practical use than the first one.  Often users want to see if their data, or some transform, has a normal distribution.  So, it is useful to view the histogram of the transformed data.  Then, subsequent processing can be done on the transformed data.  It turns out that the user who asked the original question about using the log axis also really wanted the last case shown above.

Now, let us talk about using TYPE=LOG on the x axis.  Often, in this case when the data has a range of over 2 or 3 orders of magnitude, you may see a Note in the log saying:

NOTE: Log axis cannot support zero or negative values in the data range. The axis type will be changed to LINEAR.

This happens despite the fact that all the data is positive, and is a bit confusing.  What is going on is that the histogram computes the BinStart and BinWidth values internally, and the default numbers can cause the lower edge of the first bin to have a zero or negative x value.  This is the reason for the note if you then ask for TYPE=LOG.  If you must set TYPE=LOG and get a graph like the first one above, make sure your BinStart (BS) and BinWidth (BW) combination satisfies the following criterion:

         zero < BS-BW/2 < min value in data
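As a sketch of this criterion, assuming the SASHELP.CARS data where the smallest MPG_CITY value is about 10: with BinStart=12 and BinWidth=5 the lower edge of the first bin is 12 - 5/2 = 9.5, which is positive and below the minimum.  The specific numbers are illustrative only.

```sas
proc sgplot data=sashelp.cars;
  /* First bin is centered at 12, so its lower edge is 9.5 (> 0) */
  histogram mpg_city / binstart=12 binwidth=5;
  xaxis type=log;
run;
```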

Full SGPLOT program:  


More on Spaghetti Plots

In her article Creating Spaghetti Plots Just Got Easy, Lelia McConnell has provided us a glimpse into some new useful features in the SAS 9.4M2 release.  The term Spaghetti plots generally refers to cases where time series plots have to be identified by multiple group classifications.  The support for the GroupLC and GroupLP options, among others, makes it easy to create such graphs.

The key point to note here is that the GROUP variable is used to decide which observations in the plot should be connected.  So, the group variable should provide the finest grain classification for the series in the plot.  Normally, each individual series is rendered using one of the GraphData elements from the style, providing unique attributes to each series.

However, if multiple series in the graph represent one specific value from another classifier, such as treatment or study, we can provide higher classification roles using the GroupLC (Group Line Color) or GroupLP (Group Line Pattern), etc., as shown in the graph on the right.  In this example, the simulated data represents the adoption rate over time for some item classified by location (color) and year (pattern) using the following code for the SERIES plot statement:

series x=x y=y / group=id lineattrs=(thickness=2 pattern=solid)
grouplc=Location grouplp=year smoothconnect;

This graph, along with other graphs using SAS 9.4M2 features, is shown in the samples on the Graph Focus page on the SAS Support web site.  Now that SAS 9.4M2 is released (Aug 5), we will be adding more samples demonstrating the features at this location.

Many of you who do not yet have the SAS 9.4M2 release are asking, what does this do for me?  Well, there is good news.  While this feature has now been included in the SGPLOT procedure, it has always been available in GTL.  Here is the GTL code you can use with SAS 9.4.  Note the use of the subpixel and itemsize options.

proc template;
  define statgraph MultiClassSeries;
    begingraph / subpixel=on;
      entrytitle 'Adoption Rate over Time by Location and Year';
      layout overlay / yaxisopts=(offsetmin=0.1);
        seriesplot x=x y=y / group=id name='a' lineattrs=(thickness=2)
                             linecolorgroup=Location linepatterngroup=year;
        discretelegend 'a' / title='Location:' type=linecolor location=inside
             valign=bottom halign=right;
        discretelegend 'a' / title='Year:' type=linepattern location=inside
             valign=bottom halign=left itemsize=(linelength=30px);
      endlayout;
    endgraph;
  end;
run;

You can also run this with SAS 9.3, except for the subpixel and itemsize option.  Remove those, and you are good to go.

SAS 9.4 GTL code:  Spaghetti


Creating Spaghetti Plots Just Got Easy

This article is by guest contributor Lelia McConnell, SAS Tech Support.


Sample 38076: “Response by patient and treatment group” illustrates how to generate a spaghetti plot using the SGPLOT procedure.  Sample 40255: “Plot of study results by treatment group” illustrates how to generate a spaghetti plot with PROC SGPLOT prior to SAS® 9.4 TS1M2.  In both samples, a custom style template is necessary in order to get the desired results.

Beginning in SAS 9.4 TS1M2, spaghetti plots are easier than ever to create, thanks to the SGPLOT and SGPANEL procedures and new syntax that was added to the Graph Template Language (GTL).

First, let’s look at how to generate this graph using GTL.  The following options were added to the SERIESPLOT statement in SAS 9.4 TS1M2:

  • LINECOLORGROUP= column|expression
  • LINEPATTERNGROUP= column|expression
  • MARKERSYMBOLGROUP= column|expression
  • MARKERCOLORGROUP= column|expression

In addition, the TYPE= option was added to the DISCRETELEGEND statement, giving you the ability to include any or all of the grouping information in your legend.  Here are the values for the TYPE= option:


Let’s now generate a spaghetti plot using GTL.  Data sample is shown on the right.
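For readers who want to run the code, a data set with the same structure could be simulated along these lines.  The values are made up; only the data set name ONE and the variable names SUBJECT, TRT_GROUP, TIME and RESULTS are taken from the code in this article.

```sas
/* Simulated stand-in for the study data: six subjects in two groups */
data one;
  length trt_group $8;
  call streaminit(12345);
  do subject = 1 to 6;
    if subject <= 3 then trt_group = 'Drug';
    else trt_group = 'Placebo';
    do time = 0 to 10 by 2;
      /* The Drug group rises faster; random noise is added */
      results = 10 + time * (1 + (trt_group = 'Drug')) + rand('normal');
      output;
    end;
  end;
run;
```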


proc template;
  define statgraph grouping;
    begingraph;
      entrytitle 'Study Results by Treatment Group';
      layout overlay;
        seriesplot x=time y=results / group=subject
               linecolorgroup=trt_group name='grouping';
        discretelegend 'grouping' / type=linecolor;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=one template=grouping;
run;

 In the code above, notice that a separate line is to be drawn for each value of the variable SUBJECT and that the line color is determined by the values of the variable TRT_GROUP.    This program can be found in Sample 52962: “Create a spaghetti plot with the Graph Template Language (GTL).”

Now let’s create this same graph using PROC SGPLOT.

The following options were added to the SERIES statement in PROC SGPLOT and PROC SGPANEL:

  • GROUPLC - equivalent to LINECOLORGROUP in GTL

In addition, the following values are now supported in the TYPE option in the KEYLEGEND statement:

  • FILL
  • LINE

The following sample code illustrates how to produce a spaghetti plot with PROC SGPLOT:

proc sgplot data=one;
  title 'Study Results by Treatment Group';
  series x=time y=results / group=subject
               grouplc=trt_group name='grouping';
  keylegend 'grouping' / type=linecolor;
run;

This program can be found in Sample 52964: “Create a spaghetti plot with the SGPLOT procedure.”

Full SAS 9.4M2 code: Spaghetti  


Epidemic Curve Graph

A few weeks back I wrote an article on Grouped Timeline for creating a stacked timeline for the onset of different viruses.  The idea in that article was to display a stacked needle on a time axis using a HighLow plot.  Such graphs are also referred to as EPI or Epidemic Curve Graphs.

In that article, I restricted the weeks in the year for onset to 52, and plotted each value at the equivalent location on a time axis.  That all works fine, but really, a year will have 53 weeks for onsets as shown in the graph on the right.  Gaps are shown where the data is missing.

The problem is that a start or end week of the year may have smaller number of days.   This causes the bars (with fixed width) for these weeks to be overlapped by neighboring weeks.  Click on the graph to see this in the higher resolution image.  You will see near "Jan 2014", the bar for week 53 of 2013 is overlapped by the bar for week 1 of 2014.  This will happen if the X axis is a real time axis, and week 53 has only 1 or 2 days in it.

Another way to address this is to draw a BAR graph by the YearWeek variable.  This variable is a combination of the year and week values so as to avoid values from the two different years being consolidated into one bar, as shown on the right.

Such a graph is easier to make, as the bar chart already does stacked groups using GROUP=Virus.  The X axis is suppressed, and the week values are shown below each bar using another overlaid bar chart.  If you click on the graph for a higher resolution image, you will notice that in this case (as expected) the axis is discrete, and only the weeks that are present in the data are displayed, without gaps for the missing weeks.  A bar or a gap for week 16 is not displayed in the graph.

Let us see if we can get the best of both worlds.  First, let us create a data set that has all weeks in the data with missing response values for the frequency.  Then, we merge this with the actual data.  This ensures all weeks are present in the data and are represented in the graph either with data or a gap.
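One way this could be done is sketched below.  The data set name VIRUS and the key variable YEARWEEK (coded here as year*100 + week) are assumptions for illustration; the attached programs may build the key differently.

```sas
/* Skeleton with every year-week combination and no response values */
data allweeks;
  do year = 2013 to 2014;
    do week = 1 to 53;
      yearweek = year*100 + week;   /* assumed coding of the key */
      output;
    end;
  end;
run;

proc sort data=virus;               /* hypothetical actual data */
  by yearweek;
run;

/* Weeks with no actual data keep missing response values */
data virus_all;
  merge allweeks virus;
  by yearweek;
run;
```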

The SAS 9.3 version of this graph is shown on the right.  Click on the graph for a higher resolution image and you will see that all weeks are now represented, with gaps where there is no data.  Week 53 and week 1 are not overlapped, and can be seen distinctly.  However, it is clear that the axis is not a scaled time axis, but is discrete, so the 53 weeks will take up more space than a real year on a time axis.  Also, the 53rd week may have fewer days, but has the same width as all other bars.

The final graph is created using SAS 9.4 GTL, and I have added some labeling to indicate the year for the data.  Click on the graph for a higher resolution view.  I believe this should be doable with SG, but I ran into an issue with bar labels that needs investigation.

I used a reference line with scatterplot markercharacter to display the boundary between the 2013 and 2014 data.

As usual with SG or GTL, there are other ways to display such demarcation as shown in the graph on the right.

SAS 9.3 program: EPI_93

SAS 9.4 program: EPI_94

Data: Test_dataset


Legend Order in SGPLOT Procedure

This article is by guest contributor Lelia McConnell, SAS Tech Support.

Several users have called recently to ask the question, “Can I reorder the legend entries on the bar chart that I created with PROC SGPLOT?”

Although there is no option that does this directly in PROC SGPLOT, the answer to this question is “YES, you can define the order of your legend entries.”

In this post, I present an example that illustrates the syntax that you would use to define the order of your legend entries.  For this example, we begin by subsetting the data set SASHELP.CARS to include only observations in which TYPE is not equal to HYBRID.  To do this, we use the WHERE option in the SET statement, thus creating the data set CARS.

By bringing the CARS data set into PROC SGPLOT, I can create a vertical bar chart of the values of ORIGIN, where the height of the bars is based on the mean values of the variable MPG_CITY and the group variable is TYPE.

data cars;
  set sashelp.cars(where=(type ne 'Hybrid'));
run;

proc sgplot data=cars;
  vbar origin / response=mpg_city group=type
                groupdisplay=cluster stat=mean;
  xaxis display=(nolabel);
  title 'Mileage by Origin and Type';
run;

Instead of the default order of SUV, Sedan, Sports, Truck, and Wagon, I want the order of the legend entries to be Wagon, Sports, SUV, Truck, and Sedan.  To make this change, I create a numeric variable that contains the values 1-5, based on the order in which I want my vehicle types to be displayed.

I need to create a format in order to display the values of TYPE in the legend instead of 1-5.  The most efficient way to do this is to create a control data set that I can use with PROC FORMAT.  Since I need only one observation for each value of MYTYPE in this data set, I will sort the data by MYTYPE so that I can use the FIRST logic in the DATA step that follows.  In the DATA step, I need to create the columns FMTNAME, START, and LABEL.  These are used to define the format name, original value, and format values, respectively.  When you create a format, the automatic variable TYPE defines the variable as numeric or character, so we need to include the statement DROP TYPE to remove the variable TYPE from the control data set.

The CNTLIN option in the PROC FORMAT statement specifies the SAS data set from which PROC FORMAT builds the format.

Now I can resubmit my original PROC SGPLOT code along with the FORMAT statement to create the legend in the correct order and the KEYLEGEND statement with the TITLE option in order to keep the original title in my legend.

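data newcars;
  set cars;
  if type='Wagon' then mytype=1;
  else if type='Sports' then mytype=2;
  else if type='SUV' then mytype=3;
  else if type='Truck' then mytype=4;
  else if type='Sedan' then mytype=5;
run;

proc sort data=newcars out=sortcars;
  by mytype;
run;

/* Control data set for PROC FORMAT: one row per MYTYPE value */
data myfmt;
  set sortcars(keep=mytype type);
  by mytype;
  if first.mytype then do;
    fmtname='typefmt';   /* format name */
    start=mytype;        /* original numeric value */
    label=type;          /* text shown in the legend */
    output;
  end;
  drop type;             /* TYPE has a special meaning in a CNTLIN data set */
run;

proc format cntlin=myfmt;
run;

proc sgplot data=newcars;
   vbar origin / response=mpg_city group=mytype
                 groupdisplay=cluster stat=mean;
   xaxis display=(nolabel);
   keylegend / title='Type';
   format mytype typefmt.;
   title 'Mileage by Origin and Type';
run;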

Full SAS 9.3 SGPLOT code:  Legend_93


Overlay Bar Charts

A couple of days back, Rick Wicklin forwarded me a link to an article on the BadHessian Blog on creating a Bar Chart using six different freeware packages in R, Python and Julia.   The target bar chart was one produced by the Jetpack stat module with WordPress.  The graph is shown below.


The unique feature of this graph that had caught the eye of the author was the overlaying of two bar charts, one within the other.  The author's goal was to investigate the capabilities of other graphics packages to create a similar graph, such as R base graphics package, GGPLOT2, Python - Matplotlib, Python - Seaborn, Julia - Gadfly and Julia -

As users of SAS SG Procedures and GTL are aware, such graphs are very easy with the SGPLOT procedure, and examples of such graphs have been shown in this blog and in other places.  Here is the same graph created using the SGPLOT procedure.


SAS 9.3 SGPLOT program:

proc sgplot data=visits nowall noborder;
  styleattrs datacolors=(%rgbhex(140, 185, 202) %rgbhex(19, 85, 137));
  vbar month / response=views nostatlabel nooutline;
  vbar month / response=visitors nostatlabel barwidth=0.5 nooutline;
  keylegend / location=outside position=topright noborder valueattrs=(size=5);
  xaxis fitpolicy=thin display=(nolabel noticks) valueattrs=(size=6 color=gray);
  yaxis grid display=(noline noticks nolabel) valueattrs=(size=6  color=gray);
run;

The %RGBHEX macro is supplied by Perry Watts, and converts a RGB value to CX color value.  It is included in the attached full code.  Many options used here are needed to make the graph visually similar to the original, and are not necessary if one was to accept the default settings for the procedure.  That would reduce the code by a large fraction.
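For readers without the attached code, a macro with the same call signature could be sketched as follows.  This is a hypothetical stand-in written for illustration, not Perry Watts' macro; it simply formats each component with the HEX2. format and prefixes the result with CX.

```sas
/* Hypothetical stand-in: convert (r, g, b) in 0-255 to a CXrrggbb color */
%macro rgbhex(r, g, b);
CX%sysfunc(putn(&r, hex2.))%sysfunc(putn(&g, hex2.))%sysfunc(putn(&b, hex2.))
%mend rgbhex;

/* Writes CX8CB9CA to the log */
%put %rgbhex(140, 185, 202);
```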

The author of the post has set the X axis spacing of 5 months.  The reason for this is not clear, maybe it is to allow different months to be displayed.   For a discrete axis, SGPlot will try to show all the values on the axis, unless they don't fit cleanly.  Then, as in this case, the values are thinned symmetrically.  If the axis was numeric with a time format, you will get thinned axis tick values.

The author mentions a preference for the outer Y grid line (for Y=10000), and has made an extra effort to include this in the graphs.  For SGPLOT, the preferred default is to include a tick value outside the data range only if the extreme data point goes beyond 30% of the tick interval with inner ticks.  In this case, since the data does not seem to go very much past 8000, the tick value at 10000 is not shown by default.  This prevents wasteful white space outside the data.  Of course this can be changed to produce an outer tick value if a user really wants it using the Threshold option.

SGPLOT has a way to customize the tick values one wants to see on the discrete axis using TickValueList and TickDisplayList.  However it is clear we could use a simpler option to do this.  This can be useful when the discrete data has sequential numeric, time or some other predictable values.

Another noteworthy item in the SGPLOT graph is the outline on the color swatches in the legend.  This is done to allow swatches of very light color to be visible.  However, a case could be made to provide an option to suppress the outline to match the bar.

Users looking for a bit more aesthetic rendering can use skins and gradients without distorting the data as shown below.


For graphs with a smaller amount of data, it may be desirable (based on individual preference) to offset the two bars by a small amount to show overlapped bars.  This too is easily done with SGPLOT procedure by using the DiscreteOffset option as shown in the graph below.
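A minimal sketch of the offset bars, using the same VISITS data and variable names as the program above.  The offset and bar width values are illustrative choices, not the ones used for the graph.

```sas
proc sgplot data=visits;
  /* Shift the two bar charts in opposite directions within each midpoint */
  vbar month / response=views    discreteoffset=-0.1 barwidth=0.6;
  vbar month / response=visitors discreteoffset=0.1  barwidth=0.6;
run;
```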


Full SAS 9.3 Program:  Bars
