Talk:George E. P. Box

From Wikiquote

The importance of the purpose for which the model was intended

All this time, I thought this quote originated with my graduate advisor, David H. Gustafson. Both were at the University of Wisconsin, and no doubt David heard it first from George.

The way David told it to me 30 years ago, it had extra significance. It went something like this:

"All models are wrong, and the value of any model is only to the extent to which it supports the purpose for which it was built."

The two key lessons for me were: (1) don't look to any model for pure "truth and beauty," and (2) you are likely to go wrong if you try to use a model for a purpose other than the one for which it was designed. For example, if you try to use "ICD9" or "ICD10" as a model for classifying adverse drug reactions, rather than just for medical billing, you may be disappointed.


As documented in: Box, George E. P., J. American Statistical Assoc., Vol. 74, No. 365, March 1979, "Some Problems of Statistics of Everyday Life" [1]

An early statement of this idea was: "Models, of course, are never true, but fortunately it is only necessary that they be useful."

The quote "ALL MODELS ARE WRONG BUT SOME ARE USEFUL" is the title of a section of the paper Box, G.E.P. (1979) "Robustness in the strategy of scientific model building" in Robustness in Statistics (R.L. Launer and G.N. Wilkinson, Eds.), Academic Press. [2] But since this volume is the proceedings of a meeting held on April 11-12, 1978 [3], it is likely to be the original source of the quote.

Roots of the concept

Having heard Box quoted this way for so long, I had come to assume that he said it a very long time ago, and that everyone I heard repeat it had heard it from him directly.

My primary source for the concept is Georg Rasch, who, like Box, was a student of Ronald Fisher's. Rasch spent the 1934-35 academic year in London at the Galton Laboratory. Writing in 1959 about the measurement model he developed, which is now in wide use, Rasch says, "It may be objected that this model cannot be true." He then explains:

"That the model is not true is certainly correct, no models are--not even the Newtonian laws. When you construct a model you leave out all the details which you, with the knowledge at your disposal, consider inessential--like in the pendulum model described in ch. I. Models should not be true, but it is important that they are applicable [this is italicized in the original], and whether they are applicable for any given purpose must of course be investigated. This also means that a model is never accepted finally, [but is] only on trial. In the case which we discuss, we may tentatively accept the model described, investigate how far our data agree with it, and perhaps find discrepancies which may lead us to certain revisions of the model" (Rasch, 1960, pp. 37-38).

Rasch makes similar comments in other places (for instance, Rasch, 1965, pp. 2, 3).

Perhaps both Box and Rasch learned the basic idea from Ronald Fisher?

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danmarks Paedagogiske Institut. (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980.)

Rasch, G. (1965). An individual-centered approach to item analysis with two categories of answers. [Mimeograph copy of manuscript. Research in progress will determine whether the paper appears in S. H. Sternberg & others (Eds.), Mathematics and social sciences: Proceedings of the UNESCO and École Pratique des Hautes Études seminars of Menthon-Saint-Bernard, France (1-27 July 1960) and of Gosing, Austria (3-27 July 1961). Paris and The Hague: Mouton.]