I've noticed that a lot of people want to be able to draw bar charts with confidence intervals. This topic comes up frequently on the SAS/GRAPH and ODS Graphics Discussion Forum and on the SAS-L mailing list. Consequently, this post describes how to add error bars to a bar chart.
But frequencies don't have confidence intervals...
When I hear the words "confidence intervals on a bar chart," I experience momentary confusion. I think of bar charts as a graphical summary of frequencies (counts) for each of several categories. I use bar charts to plot sample counts, such as the numbers of males and females, or the percentages of people in various political parties. These plots do not have error bars.
But business analysts also use bar charts to show the means of quantities, such as the following graph from the SGPLOT procedure, which shows the mean mileage for cars built in Asia, Europe, or the US:
The following statements create the graph from the SASHelp.Cars data, which is distributed with SAS:
proc sgplot data=sashelp.cars;
   vbar Origin / response=MPG_City stat=mean limitstat=clm;
run;
Notice that the VBAR statement creates a bar chart (with optional confidence limits) from raw (unsummarized) data. Creating the plot is as easy as 1-2-3:
- Use the VBAR statement to specify a categorical variable. (You can also use the HBAR statement to create a horizontal bar chart.) The levels of this variable form the categories for the bars. For example, the Origin variable has the values "Asia," "Europe," and "USA."
- Use the RESPONSE= and STAT=MEAN options to define the Y variable. For example, RESPONSE=MPG_City specifies that the Y axis will contain the means of the MPG_City variable for each category.
- Use the LIMITSTAT= option to specify the "error bars" for the bar chart. For example, LIMITSTAT=CLM displays 95% confidence intervals for the mean values.
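The three steps above can be varied a little. The following is a minimal sketch, not the only way to do it: the first call uses the HBAR statement and the ALPHA= option to request 90% limits instead of the default 95%, and the second call uses LIMITSTAT=STDERR with NUMSTD=1 to show one standard error on either side of the mean. The choice of 90% and of one standard error is mine, purely for illustration.

```sas
/* Horizontal bars with 90% confidence limits for the mean */
proc sgplot data=sashelp.cars;
   hbar Origin / response=MPG_City stat=mean
                 limitstat=clm alpha=0.10;
run;

/* Vertical bars with limits at mean +/- 1 standard error */
proc sgplot data=sashelp.cars;
   vbar Origin / response=MPG_City stat=mean
                 limitstat=stderr numstd=1;
run;
```

Whichever limits you display, it is worth stating in a title or footnote what the error bars represent, because confidence limits and standard errors look identical on the graph.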
Bar charts for pre-summarized data
The bar chart is a graphical representation of a simple table that can be produced with PROC MEANS:
proc means data=sashelp.cars mean lclm uclm;
   class Origin;
   var MPG_City;
   output out=CarMPG mean=MeanVal lclm=LowerCLM uclm=UpperCLM;
run;
In some situations, you might not have the original data, but only summarized data such as the values in the table. In this case, you can use the VBARPARM statement, which was introduced in SAS 9.3, to create the same plot:
proc sgplot data=CarMPG;
   vbarparm category=Origin response=MeanVal /
            limitlower=LowerCLM limitupper=UpperCLM;
run;
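Sometimes the summary that you are given contains only the sample size, the mean, and the standard error of each group. As a sketch of how you might handle that case, the following statements compute 95% limits by hand as mean ± t × (standard error), where t is the 0.975 quantile of the t distribution with N-1 degrees of freedom. The data set names CarSumm and CarCL and the variable names N, StdErr, and t are my own; I create the summary with PROC MEANS here only so that the example runs.

```sas
/* Stand-in for a summary that has only N, mean, and standard error.
   NWAY keeps one row per level of Origin (no overall row). */
proc means data=sashelp.cars noprint nway;
   class Origin;
   var MPG_City;
   output out=CarSumm n=N mean=MeanVal stderr=StdErr;
run;

/* Compute two-sided 95% confidence limits for each mean */
data CarCL;
   set CarSumm;
   t = quantile('T', 0.975, N-1);     /* critical value, N-1 df */
   LowerCLM = MeanVal - t*StdErr;
   UpperCLM = MeanVal + t*StdErr;
run;

proc sgplot data=CarCL;
   vbarparm category=Origin response=MeanVal /
            limitlower=LowerCLM limitupper=UpperCLM;
run;
```

The computed limits should agree with the LCLM and UCLM statistics from PROC MEANS, which makes this a convenient check.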
The VBARPARM statement enables you to plot any quantities, not just means and confidence limits. For example, you can compute median values and confidence intervals for the medians, and then plot those quantities with the VBARPARM statement.
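To illustrate the "any quantities" point, here is a minimal sketch that plots the median of each group and uses the lower and upper quartiles as the limits. Note that the quartiles show the interquartile range, which is not a confidence interval for the median; I use them only because PROC MEANS computes them directly. The names CarMed, MedianVal, LowerQ, and UpperQ are my own.

```sas
/* Median and quartiles for each level of Origin (NWAY: no overall row) */
proc means data=sashelp.cars noprint nway;
   class Origin;
   var MPG_City;
   output out=CarMed median=MedianVal q1=LowerQ q3=UpperQ;
run;

/* Bars show medians; limit lines span the interquartile range */
proc sgplot data=CarMed;
   vbarparm category=Origin response=MedianVal /
            limitlower=LowerQ limitupper=UpperQ;
   yaxis label="Median MPG (City), with quartiles";
run;
```

If you compute confidence limits for the median by some other means, you can substitute those variables in the LIMITLOWER= and LIMITUPPER= options without changing anything else.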
Should you even use a bar chart to display means and CIs?
I've shown how you can use the SGPLOT procedure to create bar charts that display the means and confidence intervals of categories. However, this is not necessarily the best way to display this information. In most cases, I prefer a scatter plot with error bars (also called a dot plot), as shown below:
proc sgplot data=sashelp.cars;
   dot Origin / response=MPG_City stat=mean limitstat=clm;
run;
The bars in a bar chart always start at zero, so if the mean values are in the hundreds (or millions!), the differences between the categories and the widths of the confidence intervals can be hard to see. In that situation, you probably don't want to use a bar chart to display the means. You can create a dot plot by using the DOT statement, which supports the same RESPONSE=, STAT=, and LIMITSTAT= options as the VBAR statement. I have used the dot plot to display means and confidence intervals for airline delays.
If the data are summarized, you can use the SCATTER statement with the XERRORLOWER= and XERRORUPPER= options to create a similar plot that has the categories on the vertical axis. That layout is useful when there are many categories. If there are few categories, as in the present case, you can instead place the categories on the horizontal axis and use the YERRORLOWER= and YERRORUPPER= options:
proc sgplot data=CarMPG;
   scatter x=Origin y=MeanVal /
           yerrorlower=LowerCLM yerrorupper=UpperCLM;
run;
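For completeness, the layout with the categories on the vertical axis can be sketched as follows. It uses the same CarMPG data set and simply swaps the roles of the axes. The WHERE statement is a precaution that I added: because the PROC MEANS step uses a CLASS statement, the output data set also contains an overall row for which Origin is missing, and the WHERE statement makes the exclusion of that row explicit.

```sas
/* Categories on the Y axis; error bars extend horizontally */
proc sgplot data=CarMPG;
   where not missing(Origin);          /* skip the overall summary row */
   scatter y=Origin x=MeanVal /
           xerrorlower=LowerCLM xerrorupper=UpperCLM;
run;
```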