Santa Claus, statistics, and understanding uncertainty

As the International Year of Statistics comes to a close, I've been reflecting on the role statistics plays in our modern society. Of course, statistics provides estimates, forecasts, and the like, but to me the great contribution of statistics is that it enables us to deal with uncertainty in a quantitative manner.

To illustrate this point, I'll pose a seasonal question: What percentage of seven-year-olds in the US believe in Santa Claus?

Although a visit to your local mall might make you think "all of them," to estimate this percentage you need to actually survey some children. How many children do you need to survey? That depends on how confident or certain you want to be in the result.
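To make the "how many children?" question concrete, here is a minimal Python sketch of the textbook sample-size formula for estimating a proportion. It assumes a simple random sample and the normal approximation. The function name `sample_size`, the four-point margin, and the worst-case guess p = 0.5 are illustrative choices, not something taken from a real survey design.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n for which the normal-approximation margin of error
    is at most `margin`, assuming a simple random sample.
    p = 0.5 is the worst case (largest variance) when nothing is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical target: a 95% margin of error of four percentage points.
print(sample_size(0.04))
```

A tighter margin or a higher confidence level drives the required sample size up quickly, because the margin appears squared in the denominator.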

The field of statistics provides tools for understanding uncertainty. There are many sources of uncertainty, and Bob Rodriguez covers many of them in his excellent paper "It's All about Variation: Improving Your Business Process with Statistical Thinking." For this blog post, I'll restrict my attention to sampling variation.

Suppose that you ask 10 children from a local school whether they believe in Santa. If six say "yes," then 60% is an estimate for the percentage of the population that believes. However, this estimate is practically useless because it is based on such a small sample. If you had chosen a different set of 10 children, then by chance you might have encountered three, five, or seven children who believe. In fact, statistics tells us that, based on this small survey, a 95% confidence interval for the percentage of children in the US who believe in Santa extends from 26% to 88%. That is a lot of uncertainty!
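The post does not say which method produced the 26% to 88% interval. One method that gives those endpoints for 6 out of 10 is the exact (Clopper-Pearson) binomial interval, so the sketch below assumes that method. It uses only the Python standard library and finds each endpoint by bisection; the helper names `binom_cdf` and `clopper_pearson` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, confidence=0.95):
    """Exact (Clopper-Pearson) interval for a binomial proportion.
    Assumed method: the post does not name the one it used."""
    alpha = 1 - confidence

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower endpoint: P(X >= x) = alpha/2, i.e. P(X <= x-1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper endpoint: P(X <= x) = alpha/2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(6, 10)
print(f"{lower:.0%} to {upper:.0%}")
```

Even the exact method cannot rescue a sample of 10: the interval spans more than 60 percentage points.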

In contrast, suppose that you interview 600 children across the country. If 426 children believe in Santa, then 71% is another estimate. There is still uncertainty in this estimate, but statistics tells us that the true percentage of seven-year-old children who believe in Santa Claus is likely to be in the range 67% to 75%. That range is within four percentage points of the estimate of 71%. The uncertainty is smaller because the second sample was larger than the first sample. (The larger sample can also use statistical techniques that account for variation due to geography and other socioeconomic factors.)
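For a sample this large, the familiar normal-approximation (Wald) interval is a reasonable first look. The sketch below assumes a simple random sample, so it ignores the adjustments for geography and socioeconomic factors that a real national survey would make. The function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a proportion,
    assuming a simple random sample."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # the "margin of error"
    return p_hat - margin, p_hat + margin

low, high = wald_interval(426, 600)
print(f"{low:.1%} to {high:.1%}")
```

With 426 believers out of 600, the margin works out to a little under four percentage points, which agrees with the 67% to 75% range quoted above.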

Ever since the 1948 fiasco of "Dewey defeats Truman," pollsters have taken care to use statistical techniques to express confidence in their estimates. Modern news reports often include the all-important statement, "the margin of error in the poll is ...." If you don't hear that magical phrase, then the estimate is like a Christmas tree without decorations: something is missing!

So statistics helps us to be confident in two ways. First, both 60% and 71% are valid statistical estimates, but 71% is better in the sense that we have more confidence that it is close to the true population value. Second, humans are notoriously poor at estimating uncertainty. (This explains the large number of dads who attempt to assemble little Johnny's new bike on Christmas Eve!) Making decisions that are based on data instead of gut feelings helps us to correct for our natural tendency to be overconfident.

Yes, estimates are important, but the real value of statistics is in providing confidence for those estimates. As this International Year of Statistics comes to a close, celebrate the contributions that statistics offers our world! Chief among those, statistics provides tools for understanding uncertainty.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
