Definitions of moments in probability and statistics

The moments of a continuous probability distribution are often used to describe the shape of the probability density function (PDF). The first four moments (if they exist) are well known because they correspond to familiar descriptive statistics:

The first raw moment is the mean of a distribution. For a random variable, this is the expected value. It can be positive, negative, or zero.
The second central moment is the variance. The variance is never negative; in practice, the variance is strictly positive.
The third standardized moment is the skewness. It can be positive, negative, or zero. A distribution that has zero skewness is symmetric.
The fourth standardized moment is the kurtosis. Whereas the raw kurtosis, which is used in probability theory, is always positive, statisticians most often use the excess kurtosis. The excess kurtosis is 3 less than the raw kurtosis. Statisticians subtract 3 because the normal distribution has raw kurtosis of three, and researchers often want to compare whether the tail of a distribution is thicker than the normal distribution's tail (a positive excess kurtosis) or thinner than the normal distribution's tail (a negative excess kurtosis).

The purpose of this article is to point out a subtle point. There are three common definitions of moments: raw moments, central moments, and standardized moments. The adjectives matter. When you read a textbook, article, or software documentation, you need to know which definition the author is using.

A cautionary tale

If you re-read the previous list, you will see that it is traditional to use the RAW moment for the mean, the CENTRAL moment for the variance, the STANDARDIZED moment for the skewness, and the STANDARDIZED moment (sometimes subtracting 3) for the kurtosis. However, researchers can report other quantities. Recently, I read a paper in which the author reported formulas for the third and fourth moments of a distribution. I assumed that the author was referring to the standardized moments, but when I simulated data from the distribution, my Monte Carlo estimates for the skewness and excess kurtosis did not agree with the author's formulas. After many hours of checking and re-checking the formulas and my simulation, I realized that the author's formulas were for the central moments (not standardized). After I converted the formulas to report standardized moments (and the excess kurtosis), I was able to reconcile the published results and my Monte Carlo estimates.

Definitions of raw moments

For a continuous probability distribution for density function f(x), the nth raw moment (also called the moment about zero) is defined as
$\mu_n^\prime = \int_{-\infty}^{\infty} x^n f(x)\, dx$
The mean is defined as the first raw moment. Higher-order raw moments are used less often. The superscript on the symbol $\mu_n^\prime$ is one way to denote the raw moment, but not everyone uses that notation. For the first raw moment, both the superscript and the subscript are often dropped, and we use $\mu$ to denote the mean of the distribution.

For all moments, you should recognize that the moments might not exist. For example, the Student t distribution with ν degrees of freedom does not have finite moments of order ν or greater. Thus, you should mentally add the phrase "when they exist" to the definitions in this article.

Definitions of central moments

The nth central moment for a continuous probability distribution with density f(x) is defined as
$\mu_n = \int_{-\infty}^{\infty} (x - \mu)^n f(x)\, dx$
where $\mu$ is the mean of the distribution. The most famous central moment is the second central moment, which is the variance. The second central moment, $\mu_2$ is usually denoted by $\sigma^2$ to emphasize that the variance is a positive quantity.

Notice that the central moments of even orders (2, 4, 6,...) are always positive since they are defined as the integral of positive quantities. It is easy to show that the first central moment is always 0.

The third and fourth central moments are used as part of the definition of the skewness and kurtosis, respectively. These moments are covered in the next section.

Definitions of standardized moments

The nth standardized moment is defined by dividing the nth central moment by the nth power of the standard deviation:
${\tilde \mu}_n = \mu_n / \sigma^n$
For the standardized moments, we have the following results:

The first standardized moment is always 0.
The second standardized moment is always 1.
The third standardized moment is the skewness of the distribution.
The fourth standardized moment is the raw kurtosis of the distribution. Because the raw kurtosis of the normal distribution is 3, it is common to define the excess kurtosis as ${\tilde \mu}_n - 3$ .

A distribution that has a negative excess kurtosis has thinner tails than the normal distribution. An example is the uniform distribution. A distribution that has a positive excess kurtosis has fatter tails than the normal distribution. An example is the t distribution. For more about kurtosis, see "Does this kurtosis make my tail look fat?"

Moments for discrete distributions

Similar definitions exist for discrete distributions. Technically, the moments are defined by using the notion of the expected value of a random variable. Loosely speaking, you can replace the integrals by summations. For example, if X is a discrete random variable with a countable set of possible values {x1, x2, x3,...} that have probability {p1, p2, p3, ...} of occurring (respectively), then the raw nth moment for X is the sum
$E[X^n] = \sum_i x_i^n p_i$
and the nth central moment is
$E[(X-\mu)^n] = \sum_i (x_i-\mu)^n p_i$

Summary

I almost titled this article, "Will the real moments please stand up!" The purpose of the article is to remind you that "the moment" of a probability distribution has several possible interpretations. You need to use the adjectives "raw," "central," or "standardized" to ensure that your audience knows which moment you are using. Conversely, when you are reading a paper that discusses moments, you need to determine which definition the author is using.

The issue is complicated because the common descriptive statistics refer to different definitions. The mean is defined as the first raw moment. The variance is the second central moment. The skewness and kurtosis are the third and fourth standardized moments, respectively. When using the kurtosis, be aware that most computer software reports the excess kurtosis, which is 3 less than the raw kurtosis.

Blogs