What is a geometric mean?

There are several different kinds of means. They all try to find an average value from among a set of numbers. Although the most popular mean is the arithmetic mean, the geometric mean can be useful for problems in statistics, finance, and biology. A common application of the geometric mean is to find an average growth rate for an asset or for a population.

What is the geometric mean?

The geometric mean of n nonnegative numbers is the nth root of the product of the numbers:
GM = (x₁ * x₂ * ... * x_n)^(1/n) = (Π x_i)^(1/n)
When the numbers are all positive, the geometric mean is equivalent to computing the arithmetic mean of the log-transformed data and then using the exponential function to back-transform the result: GM = exp( (1/n) Σ log(x_i) ).
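The two formulas can be checked numerically. The following is a minimal Python sketch; the function names `gm_product` and `gm_log` and the sample data are made up for illustration and are not part of the original article.

```python
import math

def gm_product(xs):
    """Geometric mean as the n-th root of the product (nonnegative inputs)."""
    return math.prod(xs) ** (1.0 / len(xs))

def gm_log(xs):
    """Geometric mean as exp(mean(log(x))); requires strictly positive inputs."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

data = [1.0, 10.0, 100.0]   # illustrative values, not from the article
print(gm_product(data))     # close to 10, up to floating-point rounding
print(gm_log(data))         # close to 10, up to floating-point rounding
```

For long lists of very large or very small values, the log form is often the safer choice because the raw product can overflow or underflow in floating-point arithmetic.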

Physical interpretation of the geometric mean

The geometric mean has a useful interpretation in terms of the volume of an n-dimensional rectangular solid. If GM is the geometric mean of n positive numbers, then GM is the length of the side of an n-dimensional cube that has the same volume as the rectangular solid with side lengths x₁, x₂, ..., x_n. For example, a rectangular solid with sides 1.5, 2, and 3 has a volume V = 9. The geometric mean of those three numbers is ((1.5)(2)(3))^(1/3) ≈ 2.08. The volume of a cube with sides of length 2.08 is 9, up to rounding.

This interpretation in terms of volume is analogous to interpreting the arithmetic mean in terms of length. Namely, if you have n line segments of lengths x₁, x₂, ..., x_n, then the total length of the segments is the same as the length of n copies of a segment of length AM, where AM is the arithmetic mean.
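Both interpretations can be verified with the numbers from the example. This is a small sketch in Python; the variable names are chosen only for illustration.

```python
import math

sides = [1.5, 2.0, 3.0]              # side lengths from the example
n = len(sides)

volume = math.prod(sides)            # volume of the rectangular solid
gm = volume ** (1.0 / n)             # side of the cube with the same volume
cube_volume = gm ** n                # recovers the volume, up to rounding

am = sum(sides) / n                  # arithmetic mean of the side lengths
total_length = sum(sides)            # total length of the n segments

print(volume, round(gm, 2), cube_volume)
print(total_length, n * am)          # n copies of AM give the same total length
```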

What is the geometric mean good for?

The geometric mean can be used to estimate the "center" of any set of positive numbers, but it is most frequently used to estimate an average value in problems that deal with growth rates or ratios. For example, you can use the geometric mean to compute an average growth rate for an asset or for a population. The following example uses the language of finance, although you can replace "initial investment" by "initial population" if you are interested in population growth.

In precalculus, growth rates are introduced in terms of the growth of an initial investment amount (the principal) compounded yearly at a fixed interest rate, r. After n years, the principal (P) is worth
A_n = P(1 + r)ⁿ
The quantity 1 + r sometimes confuses students. It appears because when you add the principal (P) and the interest (Pr), you get P + Pr = P(1 + r).
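To see where the factor 1 + r comes from, you can add the interest one year at a time and compare the result with the closed-form formula. The values of P, r, and n below are hypothetical and chosen only for illustration.

```python
P = 1000.0   # hypothetical initial investment
r = 0.05     # hypothetical fixed annual interest rate
n = 10       # number of years

amount = P
for _ in range(n):
    interest = amount * r        # interest earned during this year
    amount = amount + interest   # same as amount * (1 + r)

closed_form = P * (1 + r) ** n
print(amount, closed_form)       # the two values agree up to rounding
```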

In precalculus, the interest rate is assumed to be a constant, which is fine for fixed-rate investments like bank CDs. However, many investments have a growth rate that is not fixed but varies from year to year. If the growth rate of the investment is r₁ during the first year, r₂ during the second year, ..., and r_n during the nth year, then after n years the investment is worth
A_n = P(1 + r₁)(1 + r₂)...(1 + r_n)
= P (Π x_i), where x_i = 1 + r_i.

What is the average growth rate for the investment? One interpretation of the average growth rate is the fixed rate that would give the same return after n years. That hypothetical fixed-rate growth is found by using the geometric mean of the values x₁, x₂, ..., x_n. That is, if GM is the geometric mean of the x_i, then the value
A_n = P*GMⁿ,
which assumes a fixed growth rate of GM - 1 per year, is exactly the same as the value from the varying-rate computation.
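The equivalence of the varying-rate and fixed-rate computations can be sketched in a few lines of Python. The yearly rates below are hypothetical, not real market data.

```python
import math

P = 1000.0
rates = [0.10, -0.05, 0.20]           # hypothetical yearly growth rates r_i
x = [1 + r for r in rates]            # year-over-year multipliers x_i
n = len(x)

varying_value = P * math.prod(x)      # A_n = P (1 + r_1)...(1 + r_n)
gm = math.prod(x) ** (1.0 / n)        # geometric mean of the multipliers
fixed_value = P * gm ** n             # A_n = P * GM^n

am = sum(x) / n                       # arithmetic mean, for comparison
print(varying_value, fixed_value)     # equal up to rounding
print(gm - 1, am - 1)                 # equivalent fixed rate vs. naive average rate
```

Compounding at the arithmetic mean of the multipliers would generally overstate the final value, because the arithmetic mean is never smaller than the geometric mean.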

An example of the geometric mean: The growth rate of gold

Let's apply the ideas in the preceding section. Suppose that you bought $1000 of gold on Jan 1, 2010. The following table gives the yearly rate of return for gold during the years 2010–2018, along with the value of the $1000 investment at the end of each year.

According to the table, the value of the investment after 9 years is $1160.91, which represents a total return of about 16%. What is the fixed rate that would give the same return after 9 years when compounded annually? That is found by computing the geometric mean of the numbers in the third column:
GM = (1.2774 * 1.1165 * ... * 0.9885)^(1/9) ≈ 1.01672
In other words, the investment in gold yielded the same return as a fixed-rate bank CD at 1.672% that is compounded yearly for 9 years. The end-of-year values for both investments are shown in the following graph.
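Because the product of the yearly multipliers equals the ratio of the final value to the initial value, the geometric mean can also be recovered from those two numbers alone. The sketch below uses only the initial and final values quoted in the text; it does not need the individual yearly multipliers.

```python
P = 1000.00        # initial investment, as stated in the text
A = 1160.91        # value after 9 years, as stated in the text
n = 9

# Because A = P * (x_1 * ... * x_n), the geometric mean of the multipliers
# is the n-th root of the ratio A / P.
gm = (A / P) ** (1.0 / n)
print(round(gm, 5))

# Compounding at the rounded rate of 1.672% should land near the final value.
fixed_value = P * 1.01672 ** n
print(round(fixed_value, 2))
```

Any small discrepancy between the two final values comes from rounding the rate to 1.672%.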

Notice that the rate of return for gold is negative for some years, but it is the quantity 1+r that is important in this problem. The quantity 1+r is the year-over-year multiplier. It is always positive unless the investment loses all its value (for example, the bankruptcy of a company). If the multiplier is ever 0, the value of the investment will be 0 forever. That is why the geometric mean is 0 if any number in the set is 0.
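A short sketch shows the effect of a zero multiplier. The multipliers are hypothetical. Note that the product form returns 0, whereas the log form cannot be evaluated because log(0) is undefined.

```python
import math

multipliers = [1.10, 0.0, 1.20]     # hypothetical: total loss in the second year
n = len(multipliers)

gm = math.prod(multipliers) ** (1.0 / n)
print(gm)                           # 0.0

# The log form is undefined here because log(0) does not exist.
try:
    math.exp(sum(math.log(x) for x in multipliers) / n)
    log_form_ok = True
except ValueError:
    log_form_ok = False             # math.log(0.0) raises ValueError
print(log_form_ok)                  # False
```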

The geometric mean in statistics

The geometric mean arises naturally in situations in which quantities are multiplied together. This happens so often that there is a probability distribution, called the lognormal distribution, that models this situation. If Z has a normal distribution, then you can obtain a lognormal distribution by applying the exponential transformation: X = exp(Z) is lognormal. The Wikipedia article for the lognormal distribution states, "The lognormal distribution is important in the description of natural phenomena... because many natural growth processes are driven by the accumulation of many small percentage changes." For lognormal data, the geometric mean is often more useful than the arithmetic mean. In my next blog post, I will show how to compute the geometric mean and other associated statistics in SAS.
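As a rough illustration of the relationship X = exp(Z), the following Python sketch (not the SAS code from the follow-up post) simulates a lognormal sample. The seed, the parameters, and the sample size are arbitrary assumptions. The geometric mean of X equals the exponential of the arithmetic mean of Z, so it estimates exp(μ), which is the median of the lognormal distribution; the arithmetic mean of X is pulled upward by the long right tail.

```python
import math
import random

random.seed(1)                       # arbitrary seed for reproducibility
mu, sigma, n = 0.0, 1.0, 10_000      # hypothetical parameters and sample size
z = [random.gauss(mu, sigma) for _ in range(n)]   # normal sample
x = [math.exp(v) for v in z]                      # lognormal sample

gm = math.exp(sum(math.log(v) for v in x) / n)    # geometric mean of x
am = sum(x) / n                                   # arithmetic mean of x

print(gm, math.exp(sum(z) / n))      # identical up to rounding
print(am)                            # larger than gm
```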

4 Comments

David New on September 30, 2019 9:09 am

Very interesting article on this subject. I think not understanding a geometric mean is one reason for confusion by the public when you see economic and other statistics in the press.

- Rick Wicklin on September 30, 2019 9:18 am
  
  And not just the public. There is a lot of confusion among data analysts, too. I wrote this article (and the next) because I realized that I didn't understand geometric means, variance, etc, well enough to answer some of the questions posed by SAS users.
  
Gerald on October 2, 2019 11:31 am

Hi Rick,
I've also seen the geometric mean used for computing averages in populations that might have some extreme outliers that could skew the mean, such as incomes or the average length of hospital stay (LOS). Some terminal patients have LOS that could run into weeks and months.

- Rick Wicklin on October 2, 2019 11:43 am
  
  Yes, the geometric mean is sometimes used when the data span several orders of magnitude. It works because the log transformation is a normalizing transformation. By computing the mean of the log-transformed data, you negate some of the influence of the extreme observations.
  
  Some people claim that the geometric mean is robust to outliers, but that statement is not strictly true. There are better choices for robust estimates of a mean. However, the geometric mean is "resistant" to outliers, provided they are not too extreme, for the reason mentioned above.
