What Good Are Error Bars?

A cognitive psychology blog recently published an interesting post about an important and long-standing problem with scientific publications: Most researchers don't understand error bars.

Error bars are graphical elements included in a statistical plot to represent the uncertainty in a sample statistic. They are important to JMP users, as error bars are commonly mentioned when users are asked about what new features they want to see in the next version of the software.

They have been used for a long time, yet there is no real standard for them. Many publications have adopted some conventions over time, but no harmonization has occurred to date. The name "error bar" might imply that it represents one standard error, but in practice, error bars take various forms and represent other quantities, such as the sample standard deviation or a confidence interval of the sample mean. Different "rules of thumb" abound about how to make or interpret them. Whether you are an author selecting error bars for your plots or a reader interpreting error bars in an article, you might experience some confusion about their purpose and form.
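
To make the difference concrete, here is a minimal Python sketch (not JMP output) that computes three common error-bar half-widths for the same sample. The height values are made up for illustration, the helper name error_bar_half_widths is hypothetical, and 2.262 is the tabulated two-sided 95% t critical value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Tabulated two-sided 95% t critical value for 9 degrees of freedom (n = 10).
T_CRIT_95_DF9 = 2.262


def error_bar_half_widths(sample, t_crit):
    """Return the mean and three candidate error-bar half-widths."""
    n = len(sample)
    sd = stdev(sample)   # sample standard deviation (spread of the data)
    se = sd / sqrt(n)    # standard error of the mean
    return {
        "mean": mean(sample),
        "sd": sd,
        "se": se,
        "ci95": t_crit * se,  # half-width of the 95% confidence interval
    }


# Hypothetical heights for one group of ten people (illustration only).
heights = [59, 61, 55, 66, 52, 60, 61, 51, 60, 62]

bars = error_bar_half_widths(heights, T_CRIT_95_DF9)
for name in ("sd", "se", "ci95"):
    print(f"mean +/- {name}: {bars['mean']:.2f} +/- {bars[name]:.2f}")
```

For this sample size, the same mean gets three bars of quite different lengths: the standard deviation bar is the longest, the standard error bar is the shortest, and the confidence interval falls in between. A plot that does not say which one it shows cannot be read reliably.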

The author of the blog, Dave Munger, polled readers before writing his entry to see if they knew how to use these statistical graphics and was not surprised to find that many participants had difficulty. He cited a large published study by Sarah Belia that found the same outcome for authors in psychology, neuroscience, and medical journals. How could intelligent, highly trained, and experienced researchers have difficulty with such an old, common, and seemingly simple device?

I think that the reason for this common professional malady is that we have used the same device for different purposes, we have used different graphics to represent the same thing, and we have used different formulae to compute them. To see this, compare error bars to another graphic device that is also used to represent uncertainty in sample statistics, the Shewhart control chart. This family of charts has an explicit model for each kind of sample statistic (e.g., individual data, sample average, sample range).

Control charts have rigorous rules for computing control limits and making the chart. Every chart is made of the same components so that if you learn how to read a sample means chart (Xbar chart), for example, you can also read a proportion defective chart (P chart). While the chart type depends on the data, the chart is generated and interpreted the same way, allowing these different control charts to keep a similar form and convey similar information. On the other hand, error bars do not have the same luxury; they do not have the same form or convey the same information, often making them ambiguous.

So for what do you use error bars? I can think of three common purposes, enough to illustrate my point.

1. Represent variation in a sample of data that is presented as a descriptive summary of the sample.

2. Represent uncertainty in a sample statistic that is used to infer something about the population.

3. Represent uncertainty in several sample statistics that are used to compare populations.

These purposes are very different, so no single kind of error bar works in every case. The key to eliminating the confusion is to use the correct graphic for each purpose. What does JMP do about it? JMP uses a distinct graphic that is suited to each purpose. To see this, examine the Big Class data set found in the Sample Data folder that accompanies JMP, and use the Oneway platform to explore the heights of different age groups.

1. JMP shows the spread of the data with standard deviation lines.

If you have a question about the variation in each age group, you can use the Unequal Variances command to get the answer. In addition to a numerical report that includes hypothesis tests about the standard deviations, JMP adds standard deviation lines to the plot that represent one standard deviation above and below the sample average. These group statistics are not pooled because you want to know about the variation in each group.

2. JMP shows the location of the sample average with a diamond.

If you have a question about the location of each group, you can use the Means command. Once again, you get a new graphic in addition to a numerical report. The means diamonds present the sample averages as point estimates (center lines) and as confidence intervals (top and bottom points of the diamonds). The default confidence level is 95%. At any time, you can change this level with the Set Alpha Level command, and all of the reports are updated. In addition, the width of the diamond is proportional to the size of the group. This graphic is distinct from the previous standard deviation lines, and it is richer and gives a better answer about the location of each group.

3. JMP compares sample averages with circles.

If you want to compare the locations of groups, you can use the means diamonds again. They include overlap lines near the top and bottom. If the interval between these lines for one group does not overlap the interval between these lines for another group, then the group means are significantly different. These comparisons use the same significance level as the confidence interval, which is alpha = 0.05 by default.

Munger's blog post mentioned a "rule of thumb," suggesting: "the confidence intervals can overlap by as much as 25% of their total length and still show a significant difference between the means for each group." First, note that the purpose of a confidence interval as such is to show the uncertainty in the group mean. The overlap lines, on the other hand, are specifically for such comparisons.

Second, we must also know that the error bars represent a confidence interval. Third, the assumptions for this rule are not entirely clear nor always met in practice. Fourth, the extent of overlap is difficult to assess visually. Another rule of thumb is suggested for the cases when the error bars represent standard error instead, and this rule suffers from similar weaknesses. If we use software to make a plot, why not also use it to make a better graphic and produce the proper hypothesis test and p-value, which is a better way to determine significance?
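
A small Python sketch shows why overlapping confidence intervals do not settle the question. It is an illustration under stated assumptions, not the rule's derivation: the two groups are independent, they share the same standard error, and the samples are large enough that normal critical values apply. The function name compare_two_means and the numbers passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_two_means(mean_a, mean_b, se, level=0.95):
    """Compare two independent means that share one standard error.

    Uses normal critical values, which is a large-sample assumption.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit * se               # half-width of each interval
    diff = abs(mean_b - mean_a)

    # Length of the region where the two intervals overlap (0 if none).
    overlap = max(0.0, 2 * half_width - diff)

    # Two-sample z test: the difference has standard error se * sqrt(2).
    z_stat = diff / (se * sqrt(2))
    p_value = 2 * (1 - NormalDist().cdf(z_stat))

    return {
        "intervals_overlap": overlap > 0,
        "overlap_fraction": overlap / (2 * half_width),
        "p_value": p_value,
    }


# Hypothetical summary statistics: means of 10 and 13, standard error 1.
result = compare_two_means(10.0, 13.0, se=1.0)
print(result)
```

With these made-up inputs, the intervals overlap by roughly 23% of their length, yet the test gives a p-value of about 0.03. Reading the overlap by eye would be a poor substitute for the test itself.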

When you have more than two groups, though, the overlap lines provide no protection against the inflation of the experiment-wise alpha that is inherent in multiple comparisons. JMP provides a second way of comparing group averages that accounts for the number of comparisons to be made. Use one of the commands in the Compare Means group, such as the All Pairs, Tukey HSD command. This command produces a new graphic for each group. Like the means diamonds, the comparison circles inform you about the location (center) and spread (radius) of the sample statistics, but they are further adjusted to account for the number of comparisons. To compare group means, select a group by clicking on its circle, and JMP responds graphically with the answer; it is not enough to ascertain whether circles overlap.
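
The inflation of the experiment-wise alpha can be sketched in a few lines of Python. This is a simplified illustration that treats the pairwise comparisons as independent, which they are not exactly when the pairs share groups, so read the result as a rough guide rather than the exact error rate. The function name familywise_error_rate is hypothetical.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one false positive across all pairs.

    Assumes each pairwise comparison is an independent test at level alpha.
    """
    n_pairs = comb(n_groups, 2)  # number of pairwise comparisons
    return n_pairs, 1 - (1 - alpha) ** n_pairs


for k in (2, 3, 6):
    n_pairs, rate = familywise_error_rate(k)
    print(f"{k} groups -> {n_pairs} comparisons, "
          f"approximate experiment-wise alpha = {rate:.3f}")
```

Under this assumption, six groups mean 15 pairwise comparisons and an experiment-wise alpha above 0.5, even though each comparison is made at 0.05. Methods such as Tukey HSD adjust for this.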

I selected the lowest circle to ask whether this group (age=12) is different from any other groups in this example. The circles for three groups (age=14,15,17) change to gray to indicate that they are significantly different.

You can also use the connecting letters to determine which group averages are the same or different. Groups that do not share a letter are significantly different. Again you see that (age=12, group B) is different from (age=14,15,17, group A).

So, what good are error bars? They are good when they clearly convey useful information in an unambiguous way. They might be over-taxed, though, and used for purposes for which better graphics have been invented. JMP considers the information that you want and uses the best graphic available to convey it.

Added 24Nov2008:
I found another relevant example of error bars in JMP. When you use Fit Model to analyze a linear model with standard least squares regression, you automatically get a leverage plot for each term in the model if the Emphasis is Effect Leverage. (If you use another emphasis, you can still obtain the leverage plot by clicking the red triangle at the top of the window next to Response and selecting Row Diagnostics > Plot Effect Leverage. Now open the Effect Details and the reports for individual effects.) A categorical effect will include a Least Squares Means Table beneath the leverage plot. This report includes the LS mean, standard error, and mean for each level of this effect. You can right-click in the report and select Columns > Lower 95% and Columns > Upper 95% to get the limits for the 95% confidence interval of the LS means estimate. The result will look something like this:


(Note: this result uses the Big Class data set, and the model is weight = age + sex + height. This picture is for the age effect.)

Now click the red triangle at the top of the leverage plot and select LS Means Plot. The plot shows a graphic of the LS means along with error bars for the 95% confidence interval for comparison. The plot looks something like this:

Let me know if you find any other examples of error bars in JMP!

Added 30Mar2009:
Error bars are optional in some JMP platforms. For example, you might want to show error bars in a bar chart. This example shows you how to add them.

1. Select Help > Sample Data. Open the Examples for Teaching section and select Big Class.

2. Select Graph > Chart. Select height > Statistics > Mean. Select age > Categories, X, Levels. Select OK.

3. Click the red triangle next to Chart and select Add Error Bars to Mean.

4. You have choices about the quantity to use and its multiple. Use one standard error of the mean to show the uncertainty in its estimation.

5. Select OK.

That's all there is to it.

tags: JMP - General

16 Comments

  1. Don Gregory
    Posted September 22, 2008 at 5:00 pm | Permalink

    The original blog post says “Standard errors are typically smaller than confidence intervals.” Assuming these are all normal distributions (not stated, but I think assumed), isn’t it true that CIs are always greater than the std.err. of the mean? I was thinking that the std.err. of the mean was the Std.Dev. of the population of the means of many samples (which is estimated from the data); and that the 95% CI would therefore be a multiple (approx 2 sigma) of the std.err?

  2. Don Gregory
    Posted September 22, 2008 at 5:01 pm | Permalink

    Also, the blog post author mentions that with correlated samples, error bars are inappropriate. I can understand the requirement that two ‘samples’ be independent for anova, but if “before” vs “after” measurements are not suitable, then I think there is a whole bunch of folks using the anova platform (with compare means) incorrectly?
    I think there are a lot of folks who are using it for just that purpose; so the question is, if that’s not right, what’s the ‘correct’ way to evaluate a significance of a ‘before’ and ‘after’ set of measurements?

  3. mid-level biological researcher
    Posted September 23, 2008 at 6:21 pm | Permalink

    Thank you so much for this post. I have such a hard time explaining the meaning of error bars to coworkers, who are convinced that there is some sort of absolute standard for them. I am hoping that directing them to this post will help.

  4. Mark Bailey
    Posted September 30, 2008 at 12:56 pm | Permalink

    I agree that the 95% confidence interval will be wider than an interval of one standard error because the critical t value for alpha=0.05 will always be greater than 1. However, if you required less confidence, the multiplier might be less than one. For example, if you only required 60% when comparing the means from two populations with a sample of 3 from each, your multiplier is only 0.94. This case is not likely, however.

  5. Mark Bailey
    Posted September 30, 2008 at 2:38 pm | Permalink

    This is another case where the comparison involved paired observations. In this case, if the observations are highly correlated or the variation between blocks is large, then the confidence interval of the sample averages will be wider than the confidence interval for the paired differences. This case happens frequently. Unfortunately, many investigators do not recognize the pairing and use the less powerful two-sample test of the mean.
    Once again, JMP provides a better solution: Analyze > Matched Pairs, if you recognize the pairing.

  6. dave
    Posted October 17, 2008 at 10:03 am | Permalink

    I'm a longtime JMP user but just recently found this blog. I want to say that posts such as this one are most interesting.
    I would love to see more posts on specific statistical issues/problems and how JMP addresses those.

  7. Mark Bailey
    Posted October 17, 2008 at 3:30 pm | Permalink

    We appreciate your comments. This blog is active and varied. We try to make it interesting to a wide audience. Please let us know if there are any particular issues or problems that interest you.

  8. Mark Bailey
    Posted November 24, 2008 at 10:35 am | Permalink

    Your comment is much appreciated. Help us to help you! You (and any other reader) are welcome to suggest topics that might not occur to us or to indicate that a given topic should be a priority for us to cover here.
    Again, thanks for your comments!

  9. Walter R. Paczkowski
    Posted December 23, 2008 at 11:55 pm | Permalink

    This was a good discussion of the error bars and the tools in JMP for making comparisons. I have two questions that are more for information or education. First, are there any references on the construction of the overlap regions of the diamonds? Second, the letter assignments can be confusing to interpret, so are there any guidelines for interpretation? For instance, are they transitive so that if X does not differ from Y and Y does not differ from Z, then does X not differ from Z?

  10. Mark Bailey
    Posted January 21, 2009 at 1:47 pm | Permalink

    There is information about how the overlap marks in the means diamonds are computed. Refer to Chapter 7 of the JMP Statistics & Graphics Guide, which is all about the Oneway platform. (The reference manuals are available as PDF documents through the Help > Books menu.) See the section Means Diamonds and x-Axis Proportional in this chapter for more details.
    Your transitivity example is not necessarily true, and I have seen exceptions to it. The letter groupings are meant to ease the interpretation of significant differences among the means for different levels. The letter designations work on the same principles as the underlying multiple comparison method upon which they are based. Simply put, if two groups share a letter, then they are not significantly different at the designated level. Otherwise, if two groups do not share any letter, then they are significantly different at the designated level. Some levels may be assigned more than one letter.

  11. Florian
    Posted March 27, 2009 at 11:41 am | Permalink

    Does anyone know how I can add error bars on a bar graph in JMP?

  12. Mark Bailey
    Posted March 30, 2009 at 1:05 pm | Permalink

    Please see the addendum to the original blog entry.

  13. Mark Bailey
    Posted March 30, 2009 at 1:30 pm | Permalink

    Please see the answer added at the bottom of the blog entry.

  14. Anger Management
    Posted December 30, 2009 at 5:54 am | Permalink

    Helpful information. Thank you very much.

  15. Elizabeth Salas
    Posted April 29, 2011 at 9:06 pm | Permalink

    When using analyze-fit model to run a mixed-model 3-way ANOVA, all tables in the output show Std Error. Is there any way of changing this to standard deviation?
    Any help would be greatly appreciated!
    Thanks

  16. Mark Bailey
    Posted May 2, 2011 at 2:57 pm | Permalink

    Your question is a good one! The simple answer is that Fit Model is already presenting the standard deviation for your results.
    The long answer is that there are two kinds of populations involved in the analysis of your mixed effects model. The first kind is a data population, where we describe the spread of data values (e.g., response, predictor) using a standard deviation. The second kind is a statistic population, where we describe the spread of statistics (e.g., parameters, LS MEANS) using a standard error. The standard error is, in fact, the standard deviation of the sampling distribution of the estimated statistic.
    The standard deviation of the data does not directly indicate the uncertainty of the estimated statistic, although it is related. The more the data vary (larger standard deviation), the more the sample statistic varies (larger standard error). The exact relation, however, depends on the model, its parameterization, and the method used to estimate the statistic. So the best indicator of the uncertainty for the statistic is its associated standard error.

One Trackback

  1. By Top 10 JMP Blog posts of 2011 - JMP Blog on January 26, 2012 at 11:00 am

    [...] What Good Are Error Bars? (2008) [...]
