Wisdom has built her house;
She has hewn out her seven pillars.
– Proverbs 9:1

At the 2014 Joint Statistical Meetings in Boston, Stephen Stigler gave the ASA President's Invited Address. In forty short minutes, Stigler laid out his response to the age-old question "What is statistics?" His answer was not a pithy aphorism, but rather a presentation of seven principles that form the foundation of statistical thought. Here are Stigler's seven pillars, with a few of my own thoughts thrown in:

1. Aggregation: It sounds like an oxymoron that you can gain knowledge by discarding information, yet that is what happens when you replace a long list of numbers by a sum or mean. Every day the news media summarize billions of stock market transactions by reporting a single weighted average of stock prices: the Dow Jones Industrial Average. Statisticians aggregate, and policy makers and business leaders use these aggregated values to make complex decisions.
2. The law of diminishing information: If 10 pieces of data are good, are 20 pieces twice as good? No, the value of the information grows only like the square root of the number of observations, which is why Stigler nicknamed this pillar the "root n rule." The square root appears in formulas such as the standard error of the mean, which measures how close the mean of a sample is likely to be to the mean of the population.
3. Likelihood: Some people say that statistics is "the science of uncertainty." One of the pillars of statistics is being able to confidently state how good a statistical estimate is. Hypothesis tests and p-values are examples of how statisticians use probability to carry out statistical inference.
4. Intercomparisons: When analyzing data, statisticians usually make comparisons that are based on differences among the data. This is different than in some fields, where comparisons are made against some ideal "gold standard." Well-known analyses such as ANOVA and t-tests utilize this pillar.
5. Regression and multivariate analysis: Children who are born to two extraordinarily tall parents tend to be shorter than their parents. Similarly, if both parents are shorter than average, the children tend to be taller than the parents. This is known as regression to the mean. Regression is the best-known example of multivariate analysis, which also includes dimension-reduction techniques and latent factor models.
6. Design: R. A. Fisher, in an address to the Indian Statistical Congress (1938) said "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." A pillar of statistics is the design of experiments, and—by extension—all data collection and planning that leads to good data. Included in this pillar is the idea that random assignment of subjects to design cells improves the analysis. This pillar is the basis for agricultural experiments and clinical trials, just to name two examples.
7. Models and Residuals: This pillar enables you to examine shortcomings of a model by examining the difference between the observed data and the model. If the residuals have a systematic pattern, you can revise your model to explain the data better. You can continue this process until the residuals show no pattern. This pillar is used by statistical practitioners every time that they look at a diagnostic residual plot for a regression model.
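
As a footnote to the second pillar, here is a minimal Python sketch of the "root n rule." It is my own illustration, not something Stigler presented. It assumes independent draws from a normal population with a known standard deviation, and the function names (`standard_error`, `simulated_standard_error`) are made up for this example. The first function evaluates the textbook formula sigma / sqrt(n); the second checks it by simulation.

```python
import math
import random

def standard_error(sigma, n):
    """Theoretical standard error of the mean of n independent observations."""
    return sigma / math.sqrt(n)

def simulated_standard_error(sigma, n, reps=5000, seed=1):
    """Estimate the standard error by simulation (assumes a normal population).

    Draw `reps` samples of size n from N(0, sigma), compute each sample mean,
    and return the standard deviation of those means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    grand_mean = sum(means) / reps
    variance = sum((m - grand_mean) ** 2 for m in means) / (reps - 1)
    return math.sqrt(variance)

# Compare the formula with the simulation for increasing sample sizes.
for n in (10, 20, 40):
    print(n, round(standard_error(1.0, n), 3), round(simulated_standard_error(1.0, n), 3))
```

Under these assumptions, going from 10 to 20 observations shrinks the standard error by a factor of sqrt(2), roughly 1.41, rather than 2. You need four times as much data to cut the standard error in half.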

I agree with Stigler's choice of the seven pillars. If someone asks, "What is statistics?" I sometimes reply, "It is what statisticians do!" But what do statisticians do? They apply these seven pillars of thought to convert measurements to information. They aggregate. They glean information from small samples. They use probability to report confidence in their estimates. They create ways to quantify data differences. They analyze multivariate data. They design experiments. They build and refine models.

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1. David Chapman

Statistical Independence should be one of the pillars. It could even be the floor on which all the pillars rest.

2. Thanks for reporting, Rick. I'm sick at home and couldn't make it. I like these, though I think "statisticians" tend to abuse aggregation, and I spend a lot of time undoing that so that I can aggregate correctly. And, FWIW, I think regression to the mean is (vastly) interesting for historical reasons, and thus to the historian Stigler, but maybe for the same reasons isn't such a great thing to pull up as an example of multivariate analyses.

3. Gheorghe Săvoiu

Classical statistical thinking emphasizes the important role of variables, of its methods and models, while modern statistical reflection permanently enriches them with the help of the paradigm of variation, fuzzy thinking and iterative cycles thinking too. Any type of human thinking is subordinated to a certain way of scientific thinking, which motivates the fact that a statistician concealed most of his achievements in scientific research, academic teaching and applied activity under the cloak of modern statistical thinking, to keep alive the spirit of the scientific narrative.
The pillar currently missing from your statistical thinking is the trans-, inter-, cross- and multidisciplinarity in the variables, methods and models. Modern statistical thinking becomes a fluent and relevant scientific context, which underlines the originality of statistical thinking contributions and the statisticians' passion for the sheer, splendid beauty of statistical thinking, which becomes the main cause of their training and forming, initially as practitioners and researchers, then as theorists and teachers.
To underline the necessity of the eighth pillar, I believe modern statistics has already become a trans-, inter-, cross- and multidisciplinary way of thinking, under the various influences of the variation paradigm, the nuanced thought impacted by fuzzy or neutrosophic logic, under the auspices of the thinking cycles, as an ensemble of its reflection, to the conceptualization of modern science, in the multiverse of contemporary sciences and scientific research...
From the distinct role of historical, regionalised and structural variables in statistical thinking, to the “epsilon” variable, typical of trans-, inter- and multi-disciplinarity, or to statistical methods and methodologies in trans-, inter-, cross- and multidisciplinarity, the originality of statistical thinking is and will remain reflected in its methods and methodologies, as well as the complexity, iteration and cyclicality of modelling statistical thinking, in order to describe, in the end, the new areas of original application of statistical thinking in trans-, inter-, cross- and multidisciplinarity.

4. I would add the important "thing statisticians do" as "they help predict". I see that as equivalent to "they help make decisions". And, I'd add, they should make the best decision given the data available, and if that data is sparse, and if the model is uncertain, estimates of prediction error should generally -- but not always -- be large.

5. Stigler's contribution was enlightening. With his added sense of humor, it made it a memorable event.

To put the 7 pillars in perspective, let me share my own views on all this.

In his fundamental 1922 paper in the Philosophical Transactions of the Royal Society (Ser. A, 222, pp. 309-368), Fisher states that "the object of statistical method is the reduction of data". He then identifies "three problems which arise in the reduction of data". These are:
1. specification - choosing the right mathematical model for a population
2. estimation - methods to calculate, from a sample, estimates of the parameters of the hypothetical population
3. distribution - properties of statistics derived from samples.
Later, Colin Mallows, a distinguished researcher at AT&T Bell Laboratories, added a "zeroth problem" - considering the relevance of the observed data, and other data that might be observed, to the substantive problem (Mallows, 1998).

Stigler seems to have expanded on these 4 problems and redesigned them as 7 interesting pillars with scientific relevance. In a wider sense, more is needed. An attempt to consider the contribution of statistics to business and industry, with a life cycle view, is available at http://ssrn.com/abstract=2315556