Saturday, May 16, 2020

All models are wrong, but some are useful

--- George Box, "Robustness in the Strategy of Scientific Model Building", Technical Summary Report #1954, University of Wisconsin-Madison, Mathematics Research Center, May 1979 (h/t David Weinberger for the reference)

Some more great passages by Box, also via Weinberger, from George E. P. Box, "Science and Statistics", Journal of the American Statistical Association, Vol. 71, No. 356 (Dec. 1976), pp. 791-799, http://www.jstor.org/stable/2286841

...he must not be like Pygmalion and fall in love with his model.

2.3 Parsimony

Since all models are wrong the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

2.4 Worrying Selectively

Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.

2.5 Role of Mathematics in Science

Pure mathematics is concerned with propositions like "given that A is true, does B necessarily follow?" Since the statement is a conditional one, it has nothing whatsoever to do with the truth of A nor of the consequences B in relation to real life. The pure mathematician, acting in that capacity, need not, and perhaps should not, have any contact with practical matters at all.

In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world.

It follows that, although rigorous derivation of logical consequences is of great importance to statistics, such derivations are necessarily encapsulated in the knowledge that premise, and hence consequence, do not describe natural truth. It follows that we cannot know that any statistical technique we develop is useful unless we use it. Major advances in science and in the science of statistics in particular, usually occur, therefore, as the result of the theory-practice iteration...
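A small illustration of Box's remark about linear assumptions (my own sketch, not from either paper): suppose the "real world" is a gently curved function, and we fit a straight line to it anyway. The function `true_response`, the range of `xs`, and the far-away point `x = 30` are all made-up assumptions chosen for the example. Within the range we actually observed, the knowingly false linear model tracks the truth closely; far outside it, the same model is badly off, which is roughly the difference between mice and tigers.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def true_response(x):
    # Hypothetical "real world": gently curved, never exactly a straight line.
    return math.exp(0.1 * x)


# Observations on a modest range, 0.0 to 2.0 in steps of 0.1.
xs = [i / 10 for i in range(21)]
ys = [true_response(x) for x in xs]

# Fit the (knowingly wrong) linear model.
a, b = fit_line(xs, ys)

# Largest error of the line inside the observed range.
max_err = max(abs((a + b * x) - y) for x, y in zip(xs, ys))

# Error of the same line when extrapolated far outside that range.
far_err = abs((a + b * 30.0) - true_response(30.0))

print(f"intercept={a:.4f} slope={b:.4f}")
print(f"max error on [0, 2]: {max_err:.5f}")
print(f"error at x=30:       {far_err:.2f}")
```

The line is wrong everywhere in the strict sense, since the residuals are never exactly zero, but whether that matters depends entirely on where and how the model is used.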