Recall that the binomial distribution is the distribution of the number of successes in a set of independent Bernoulli trials, each having the same probability of success. Most introductory statistics textbooks discuss the approximation of the binomial distribution by the normal distribution. The graph to the right shows that the normal density (the red curve, N(μ=9500, σ=21.79)) can be a very good approximation to the binomial density (blue bars, Binom(p=0.95, nTrials=10000)). However, because the binomial density is discrete, the binomial density is defined only for positive integers, whereas the normal density is defined for all real numbers.

In this graph, the binomial density and the normal density are close. But what does the approximation look like if you overlay a bar chart of a random sample from the binomial distribution? It turns out that the bar chart can have large deviations from the normal curve, even for a large sample. Read on for an example.

The normal approximation to the binomial distribution

I have written two previous articles that discuss the normal approximation to the binomial distribution:

The normal approximation is used to estimate probabilities because it is often easier to use the area under the normal curve than to sum many discrete values. However, as shown in the second article, the discrete binomial distribution can have statistical properties that are different from the normal distribution.

Random binomial samples

It is important to remember that a random sample (from any distribution!) can look much different from the underlying probability density function. The following graph shows a random sample from the binomial distribution Binom(0.95, 10000). The distribution of the sample looks quite different from the density curve.

In statistical terms, the observed and expected values have large deviations. Also, note that there can be a considerable deviation between adjacent bars. For example, in the graph, some bars have about 2% of the total frequency whereas an adjacent bar might have half that value. I was surprised to observe the large deviations in a large sample.

This graph emphasizes the fact that a random sample from the binomial distribution can look different from the smooth bell-shaped curve of the probability density. I think I was surprised by the magnitude of the deviations from the expected values because I have more experience visualizing continuous distributions. For a continuous distribution, we use a histogram to display the empirical distribution. When the bin width of the histogram is greater than 1, it smooths out the differences between adjacent bars to create a much smoother estimate of the density. For example, the following graph displays the distribution of the same random sample but uses a bin width of 10 to aggregate the frequency. The resulting histogram is much smoother and resembles the normal approximation.

Summary

The normal approximation is a convenient way to compute probabilities for the binomial distribution. However, it is important to remember that the binomial distribution is a discrete distribution. A binomial random variable can assume values only for positive integers. One consequence of this observation is that a bar chart of a random binomial sample can show considerable deviations from the theoretical density. This is normal (pun intended!). It is often overlooked because if you treat the random variable as if it were continuous and use a histogram to estimate the density, the histogram smooths out a lot of the bar-to-bar deviations.