What is kurtosis? What does negative or positive kurtosis mean, and why should you care? How do you compute kurtosis in SAS software?

It is not clear from the definition of kurtosis what (if anything) kurtosis tells us about the shape of a distribution, or why kurtosis is relevant to the practicing data analyst.
Mathematically,
the kurtosis of a distribution is defined in terms of the standardized
fourth central moment. The estimate of kurtosis from a sample also involves a summation that accumulates the fourth powers of a standardized variable. In SAS software, the formula for the sample kurtosis is given in the "Simple Statistics" section of the documentation.

This article presents an intuitive explanation of kurtosis that is often true in practice.

### What does kurtosis mean?

Let's compute the sample kurtosis for two variables in the Sashelp.Heart data set and see whether the kurtosis values tell us anything about the shape of the data distributions. The variables are AgeAtStart (the age of a patient when he or she entered the study) and Systolic (his or her systolic blood pressure).
The following statements call the MEANS procedure to compute estimates for the mean, median, standard deviation, skewness, and kurtosis:

proc means data=sashelp.heart mean median std skew kurtosis;
var AgeAtStart Systolic;
run;

The last column shows that the kurtosis of the AgeAtStart data is negative, whereas the kurtosis of the Systolic data is positive. What does that mean? Well, first recall that whereas the mean, median, and standard deviation are expressed in the same units as the data, kurtosis (like skewness) is a dimensionless quantity. The zero value corresponds to a standard reference situation. For kurtosis, the reference distribution is the normal distribution, which is defined to have a kurtosis of zero. (Technically, I am describing the *excess kurtosis*, since this is the value returned by statistical software packages. Researchers sometimes consider the *full kurtosis*, which is 3 more than the excess kurtosis.)

The interpretation of kurtosis in terms of the shape of the distribution can be tricky, but for many unimodal distributions the kurtosis compares the peak and tails of the distribution to the normal distribution:

- A data distribution with negative kurtosis is often broader, flatter, and has thinner tails than the normal distribution.
- A data distribution with positive kurtosis is often narrower at its peak and has fatter tails than the normal distribution.

Negative kurtosis for a sample indicates that the sample contains many observations that are a moderate distance from the center, but few outliers that are far from the center. The peak—if there is one—looks more like a butte or a rounded hilltop than a jutting spire. The canonical distribution that has negative kurtosis is the continuous uniform distribution, which has a kurtosis of –1.2.

In contrast, positive kurtosis indicates that many values will be near the center, but that you should also expect a certain proportion of values that are far from the center. A heavy-tailed distribution has large kurtosis. The canonical distribution that has a large positive kurtosis is the *t* distribution with a small number of degrees of freedom. Some heavy-tailed distributions have infinite kurtosis.

### Graphically testing the kurtosis interpretation

You can run a little experiment to see how the shape of the AgeAtStart and Systolic distributions compares with the normal distribution.
What would you expect to see if you overlay a normal curve on a histogram of those variables? For the AgeAtStart variable, which has negative kurtosis, you should observe a distribution that is "broad in the shoulders" and that has thin tails. For the Systolic variable, you should observe a distribution that has many observations clustered together, drops off faster than a normal distribution, and has at least one heavy tail. The following statements use PROC SGPLOT to overlay a histogram with a normal curve centered at the median:

ods graphics / width=330px height=250px;
proc sgplot data=sashelp.heart noautolegend;
histogram AgeAtStart / binwidth=5 binstart=30 showbins;
density AgeAtStart / type=normal(mu=43); /* center at median */
run;
proc sgplot data=sashelp.heart noautolegend;
histogram Systolic / binwidth=10 binstart=80 showbins;
density Systolic / type=normal(mu=132); /* center at median */
run;

The histogram for the AgeAtStart variable shows that the distribution is broad and flat. It does not have a pronounced central peak, and there is substantial mass along most of the range of the data. The distribution appears to be bounded: Apparently the heart study did not accept any patients who were younger than 28 or older than 62. This implies that there are no tails for the population distribution. Overall, the AgeAtStart variable provides a prototypical example of a distribution that has negative kurtosis.

The histogram of the Systolic variable has a narrow peak and a long right tail. There are many patients in the study with systolic blood pressure in the range 120–140. (The American Heart Association calls that range prehypertensive.) Compared to the normal distribution, there are relatively few observations in the ranges (80, 120) and (140, 170). However, the distribution has more extreme observations than would be expected for a normally distributed variable, so the distribution has a fat tail.

### Computing kurtosis in SAS software

There are three SAS procedures that compute the kurtosis of data distributions. PROC MEANS was used earlier in this article. Use PROC UNIVARIATE when you want to fit a univariate model to the data. PROC IML includes
the KURTOSIS function in SAS/IML 13.1, as shown in the following example:

proc iml;
varNames = {"AgeAtStart" "Systolic"};
use Sashelp.Heart; read all var varNames into X; close Sashelp.Heart;
kurt = kurtosis(X);
print kurt[colname=varNames];

### When the intuitive interpretation of kurtosis can fail

The relationship between kurtosis and the shape of the distribution are not mathematical rules, they are merely intuitive ideas that often hold true for unimodal distributions. If your distribution is multimodal or is a mixture of distributions, these intuitive rules might not apply.

There have been many articles about kurtosis and its interpretation. My favorite is an article by Balanda and MacGillivray (1988, *TAS*). They give examples of three different densities that each have the same kurtosis value but that look very different. They conclude that "because of the averaging process involved in its definition, a given value of [kurtosis] can correspond to several
different distributional shapes" (p. 114).

They recommend that kurtosis be defined as "the location- and scale-free
movement of probability mass from the shoulders of a
distribution into its center and tails. In particular, this definition
implies that peakedness and tail weight are best viewed
as *components* [emphasis mine] of kurtosis.... This definition is necessarily vague because the
movement can be formalized in many ways" (p. 116). In other words, the peaks and tails of a distribution contribute to the value of the kurtosis, but so do other features.

The tail of the distribution is the most important contributor.
Although Balanda and MacGillivray do not mention it, the kurtosis is a non-robust statistic that can be severely influenced by the value of a single outlier. For example, if you choose 999 observations from a normal distribution, the sample kurtosis will be close to 0. However, if you add a single observation that has the value 100, the sample kurtosis jumps to more than 800!

If you like pathological distributions and mathematically contrived examples of all the ways that kurtosis can fail to indicate the shape of a distribution, see
Balanda and MacGillivray's article. However, the intuitive notions in this article hold true for many unimodal data distributions that arise in practice.