A frequent question we get from users is how to create a box plot with custom whisker lengths. Some want to plot the 10th and 90th percentiles, while others want the 5th and 95th percentiles. The VBOX statement in the SGPLOT procedure does not provide for custom whiskers. Also, unlike GTL, there is no parametric box plot statement, where you can provide your own statistics.
Here is a standard VBOX of mileage by Type grouped by Origin using the SGPLOT procedure.
proc sgplot data=sashelp.cars(where=(type ne 'Hybrid'));
  vbox mpg_city / category=type group=origin grouporder=ascending;
  yaxis grid;
  xaxis display=(nolabel);
run;
How can we create a custom box plot with 10th and 90th percentile whiskers? With SAS 9.3, we have a way to create a parametric box plot using the new HIGHLOW plot statement.
First we have to run the MEANS procedure to obtain the necessary statistics for mileage by Type and Origin as follows:
proc means data=sashelp.cars(where=(type ne 'Hybrid')) noprint;
  class type origin;
  var mpg_city;
  output out=CarsMeanMileage mean=Mean median=Median q1=Q1 q3=Q3 p10=P10 p90=P90;
run;

/* Keep only the Type-by-Origin crossings (_type_=3) and drop the bookkeeping columns */
data CarsMeanMileage;
  set CarsMeanMileage(where=(_type_ eq 3));
  drop _type_ _freq_;
run;
The HIGHLOW plot statement comes in two flavors: TYPE=LINE (default) and TYPE=BAR. The first creates a floating line from low to high, and the second creates a floating bar from low to high. We will use a combination of these to create the graph:
SAS 9.3 SGPLOT Program:
proc sgplot data=CarsMeanMileage nocycleattrs;
  highlow x=type high=p90 low=p10 / group=origin groupdisplay=cluster
          clusterwidth=0.7;
  highlow x=type high=q3 low=median / group=origin type=bar
          groupdisplay=cluster grouporder=ascending clusterwidth=0.7
          barwidth=0.7 name='a';
  highlow x=type high=median low=q1 / group=origin type=bar
          groupdisplay=cluster grouporder=ascending clusterwidth=0.7
          barwidth=0.7;
  scatter x=type y=mean / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7 markerattrs=(size=9);
  keylegend 'a';
  yaxis grid;
  xaxis display=(nolabel);
run;
Here are the details of this program:
- The first HIGHLOW plot of type=line (default) draws the whisker from P10 to P90.
- The second HIGHLOW plot of type=bar draws the upper half of the box, from the median to Q3.
- The third HIGHLOW plot of type=bar draws the lower half of the box, from Q1 to the median.
- The scatter plot draws the mean marker.
- This graph looks very similar to the standard VBOX except for the whiskers and outliers.
Since this graph is made up entirely of "Basic" plots, we can overlay any other basic plot to display additional features. In this example, we have added the display of the mean value above each mean marker.
In this example, we also lightened the fill color by making it 50% transparent. A single whisker line running from P10 to P90 would then show through the box, so we have to use two HIGHLOW line plots, one from P90 to Q3 and one from Q1 to P10. Then, we added a label to show the value of the mean in each box. The code is shown in the program file attached.
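For readers who do not have the attached file handy, the variation described above can be sketched roughly as follows. This is not the attached program; it is a minimal sketch that assumes the CarsMeanMileage data set built earlier, and the use of TRANSPARENCY=0.5 on the bars, DATALABEL= on the scatter plot, and the 4.1 format for the label are my own assumptions about how to get a similar result.

```sas
/* Sketch only: assumes CarsMeanMileage from the PROC MEANS step above.  */
/* TRANSPARENCY=, DATALABEL= and the 4.1 format are illustrative choices */
/* and may differ from the attached program.                             */
proc sgplot data=CarsMeanMileage nocycleattrs;
  /* Upper whisker: Q3 up to P90, so no line runs through the box */
  highlow x=type high=p90 low=q3 / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7;
  /* Lower whisker: P10 up to Q1 */
  highlow x=type high=q1 low=p10 / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7;
  /* Upper half of the box, 50% transparent */
  highlow x=type high=q3 low=median / group=origin type=bar
          groupdisplay=cluster grouporder=ascending clusterwidth=0.7
          barwidth=0.7 transparency=0.5 name='a';
  /* Lower half of the box, 50% transparent */
  highlow x=type high=median low=q1 / group=origin type=bar
          groupdisplay=cluster grouporder=ascending clusterwidth=0.7
          barwidth=0.7 transparency=0.5;
  /* Mean marker, labeled with its own value */
  scatter x=type y=mean / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7 markerattrs=(size=9)
          datalabel=mean;
  format mean 4.1;
  keylegend 'a';
  yaxis grid;
  xaxis display=(nolabel);
run;
```

Label placement is left to the procedure here, so the label may not land exactly above each marker.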
Finally, here is another sneak preview of a SAS 9.4 feature: Jittering. We have received many requests on this topic so jittering will be supported with SAS 9.4. In the example below, I have created a custom box plot using the technique above, and then added display of all the values using jittering. To do this, I have to merge the summary data with the original data. I will write up a detailed article with the code once SAS 9.4 is released.
Markers are jittered on the category axis (in this case horizontal) when their Y value is within the tolerance level. Darker regions indicate more markers. The "Mean" value is shown with a square marker.
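Until the detailed article is available, here is a rough sketch of how such a graph might be put together. It requires SAS 9.4 or later and assumes the CarsMeanMileage data set built earlier. The data set name CarsBoxJitter, the choice to stack the summary and raw rows with a SET statement (rather than a true merge), and the marker sizes and transparency values are all my own assumptions, not the code behind the graph above.

```sas
/* Sketch only, SAS 9.4 or later. CarsBoxJitter is a hypothetical name. */
/* Summary rows carry the statistics (mpg_city is missing); raw rows    */
/* carry mpg_city (statistics are missing), so each plot statement      */
/* only draws the rows that have its variables.                         */
data CarsBoxJitter;
  set CarsMeanMileage
      sashelp.cars(where=(type ne 'Hybrid') keep=type origin mpg_city);
run;

proc sgplot data=CarsBoxJitter nocycleattrs;
  /* Custom box: P10-P90 whisker plus the two box halves */
  highlow x=type high=p90 low=p10 / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7;
  highlow x=type high=q3 low=median / group=origin type=bar
          groupdisplay=cluster grouporder=ascending clusterwidth=0.7
          barwidth=0.7 transparency=0.5 name='a';
  highlow x=type high=median low=q1 / group=origin type=bar
          groupdisplay=cluster grouporder=ascending clusterwidth=0.7
          barwidth=0.7 transparency=0.5;
  /* All observations, jittered along the category axis */
  scatter x=type y=mpg_city / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7 jitter
          transparency=0.7 markerattrs=(symbol=circlefilled size=5);
  /* Mean shown with a square marker */
  scatter x=type y=mean / group=origin groupdisplay=cluster
          grouporder=ascending clusterwidth=0.7
          markerattrs=(symbol=squarefilled size=9);
  keylegend 'a';
  yaxis grid;
  xaxis display=(nolabel);
run;
```

Stacking keeps the statistics on one row per Type and Origin, so the box is drawn once per group rather than once per observation, which matters when the bars are transparent.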
Full SAS 9.3 Code: BoxParm