Did George Box say, "All models are wrong, but some are useful"?

Nearly every statistician has heard the aphorism, "All models are wrong, but some are useful." The quote is attributed to George Box, an early and influential thinker about statistics.

Did George Box actually say this quote? Yes, he did. The first part of the quote ("All models are wrong") appeared as early as 1976. The full quote appears on p. 424 of Box and Draper (1987, Empirical Model-Building and Response Surfaces). In the section "The Use of Approximating Functions," Box and Draper use a polynomial to approximate a true response function. They write, "The fact that the polynomial is an approximation does not necessarily detract from its usefulness because all models are approximations. Essentially, all models are wrong, but some are useful." (the emphasis is mine).

UPDATE 03APR2025: Matt Tenan informed me that the quote appears even earlier in a little-known and hard-to-obtain conference proceedings. It appears on p. 202 in the proceedings of the Robustness in Statistics Conference, held April 11-12, 1978, by the Army Research Office in RTP, NC. The article is Box (1979), "Robustness in the Strategy of Scientific Model Building." See the references for the full citation. I have updated a few sentences in my original post to reflect this earlier publication.

When did George Box first say it?

The internet is full of references to this quote. Some cite Box and Draper (1987) whereas others refer back to a JASA article that Box published in 1976.

Many people cite the source of the quote as Box's 1976 JASA article, "Science and Statistics" (Box, 1976, JASA, Vol 71, pp. 791-799), which I can access. In it, he discusses the scientific method as an iterative process in which "practice confronts theory, and theory, practice." Progress requires devising "parsimonious but effective models," which are then compared with data to expose their inaccuracies (p. 791).

Box writes (p. 792), "Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. On the contrary, ... he should seek an economical description of natural phenomena." In the next paragraph, Box continues: "Since all models are wrong the scientist must be alert to what is importantly wrong."

Finally, having told us twice that all models are wrong, Box states, "in applying mathematics to subjects such as physics or statistics, we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless."

He provides a statistical example: "the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world."

Thus, the 1976 paper contains the first part of the quote and the sentiment of the complete quote. If you liberally use ellipsis marks and skip a few paragraphs, you can obtain the sentence, "All models are wrong, ... but ... may be useful nonetheless." However, I think it would be deceitful to claim that the entire quote is in the 1976 paper.

Matt Tenan pointed out that Box published the quote in a 1979 conference proceedings, based on a presentation from April 1978. See Appendix 1. It appears as a section header. When you read the paper in the proceedings, you can see that Box began by presenting the same ideas and arguments from his 1976 JASA article. You can also see the beginnings of the content that was eventually published in Box and Draper (1987).

So did George Box say it?

Yes, George Box did write the words, "All models are wrong, but some are useful."

  • The first part of the quote, "All models are wrong," appears twice in Box (1976), and the second clause is implied, although not stated explicitly.
  • The earliest complete quote is Box (1979), which is a paper presented at a 1978 conference. The conference proceedings are not easily accessible, which is why this citation is not better known.
  • The complete quote appears in the first edition (1987) of Box and Draper on p. 424. The quote also appears in full in the second print edition (2007) of Box and Draper. Many people cite this textbook as the source of the quote.

In conclusion, yes, George Box wrote the famous quote that is attributed to him. It seems to have been a part of his teaching and writing for at least a decade before it was finally printed in a textbook in its final, best-known form.

What did he mean by it?

Box's written words from 1976 are not meant to disparage models and modeling. Rather, he advocates parsimonious models and admonishes those who would overfit the data. Science benefits when a model is simple enough to describe, interpret, and apply. Scientists must carefully state the assumptions in a model and point out the deficiencies of the model. Box wanted to emphasize the notion that a model of reality is different from reality. If a straight line or a quadratic response is a good fit to data, that doesn't imply that the underlying data generating mechanism is equally simple, but it does imply that we can use the simple model to help us model, interpret, predict, and optimize our real-life processes.
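To make the straight-line remark concrete, here is a minimal sketch in Python. It is my own illustration, not anything from Box: the "true" response (an exponential curve) and the helper name fit_line are hypothetical choices. The code fits a least-squares line to values of the curve on a narrow range and then compares the line to the curve inside and far outside that range.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def true_response(x):
    # Hypothetical "reality" for this sketch: a curve, not a straight line.
    return math.exp(0.3 * x)

# Observe the response only on the interval [0, 1].
xs = [i / 10 for i in range(11)]
ys = [true_response(x) for x in xs]

a, b = fit_line(xs, ys)

# Largest gap between the line and the curve where we have data.
max_err_inside = max(abs(a + b * x - true_response(x)) for x in xs)

# Gap when the line is extrapolated far beyond the data.
err_outside = abs(a + b * 5.0 - true_response(5.0))

print("max error on [0, 1]:", max_err_inside)
print("error at x = 5:     ", err_outside)
```

The line is "wrong" everywhere in the sense that the response is never truly linear, yet on the observed range its error is small enough to be useful. Extrapolated to x = 5, the same line is badly off, which is the sense in which a scientist "must be alert to what is importantly wrong."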

An alternative version?

On several internet sites, I saw a claim that the quote on p. 424 in Box and Draper (1987) is, "Remember that all models are wrong: the practical question is how wrong do they have to be to not be useful." That sentence is not on p. 424; it might appear elsewhere in the book, but I have not found it. Although the sentiment of this sentence is the same as the actual quote, the tone is more negative. This version of the quote talks about models that are not useful because they are "too wrong." Leave a comment and a complete citation if you can prove that this sentence was written by Box.

References

  • Box, G. E. P. (1976). "Science and statistics." Journal of the American Statistical Association, 71(356), 791-799.
  • Box, G. E. P. (1979). "Robustness in the strategy of scientific model building." In R. L. Launer & G. N. Wilkinson (Eds.), Robustness in Statistics (pp. 201-236). Academic Press.
  • Box, G. E. P., & Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. John Wiley & Sons.
  • Box, G. E. P., & Draper, N. R. (2007). Response Surfaces, Mixtures, and Ridge Analyses. John Wiley & Sons. (Second edition of the earlier book, Empirical Model-Building and Response Surfaces, Box and Draper, 1987.) The quote is on p. 414.

Appendix 1

Attached is a portion of p. 202 and the top of p. 203 of Box (1979), which is the earliest known publication of the complete quote:

Appendix 2

Attached is an image from p. 424 of Box and Draper (1987, Empirical Model-Building and Response Surfaces).

Appendix 3

Attached is an excerpt from Box's 1976 article, so that you can read his thoughts for yourself and compare the complete text to the quotes I used in my article. The complete text shows that Box was in favor of models, but wanted to emphasize that mathematical models differ from reality. The data are real; our models are the approximation.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.