Are All Models Wrong?
One of my favourite quotes comes from the accidental statistician George E. P. Box
Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.
It was interesting to find a blog article seeking to disprove this assertion.
Here is the crux of his argument:
Suppose Model A states, "X will occur with a probability that is greater than 0 or less than 1." And let Model B state that "X will occur", which of course is equivalent to "X will occur with probability 1" (I'm using "probability 1" in its plain-English, and not measure-theoretic, sense).
Now, if X does not occur, Model B has been proved false. The popular way to say it is that Model B has been falsified. If X does occur, Model B has been proved true. It has been truified, if you like.
How about Model A? No matter if X occurs or not, Model A has not been falsified or truified. It is impossible for Model A to be falsified or truified.
Now, I am not a statistician. However, I reject this argument as wrong (although I accept it is probably true).
The reason why is explained in the comments:
I had the good fortune to study with Dr. Box, and I’m afraid you’ve misconstrued [h]is aphorism. You have somehow managed to conflate “All models are wrong” with “All models are false” and then went on your merry way skewering your strawman.
I can assure you from first hand interaction with Dr. Box, that “All models are wrong” means simply, “All models have error”. In the silly example you state, the “unfalsifiable” Model A isn’t really even a model.
However, while the argument is wrong, I feel it is still useful.
When Matt Briggs presents Model A
“X will occur with a probability that is greater than 0 or less than 1.”
He is actually saying "we know that this model is wrong."
If the probability were 0.5 he would be saying “Half the time this model is wrong, half the time it is right.”
He is still saying that the model is wrong. He is just quantifying how often he expects it to be proven wrong.
Which is exactly what George Box was talking about.
how wrong do they have to be to not be useful.
Is being wrong half the time good enough to be useful? How about sixty percent?
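To make "wrong half the time" concrete, here is a minimal sketch (the probabilities, trial count, and function names are all my own invention, not anything from Briggs or Box). It simulates a probabilistic Model A over many trials and counts how often its better-than-even call misses:

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

# Model B: "X will occur" -- a deterministic claim, falsified by a single miss.
# Model A: "X will occur with probability p" -- never falsified by one outcome,
# but over many trials we can count how often it points the wrong way.

def trial_outcomes(p_true, n):
    """Simulate n independent occurrences of X, each with true probability p_true."""
    return [random.random() < p_true for _ in range(n)]

def wrong_fraction(outcomes, p_model):
    """Fraction of trials where the model's better-than-even call missed.
    If p_model > 0.5 the model 'expects' X; otherwise it expects not-X."""
    expects_x = p_model > 0.5
    return sum(1 for x in outcomes if x != expects_x) / len(outcomes)

outcomes = trial_outcomes(p_true=0.5, n=10_000)
print(wrong_fraction(outcomes, p_model=0.5))  # about 0.5: wrong roughly half the time
```

No single trial falsifies Model A, yet the long-run miss rate quantifies exactly how wrong it is, which is the only question that decides whether it is useful.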
At this point the precision of the language no longer suits the levels of uncertainty involved when non-statisticians like myself are talking.
Instead we are better off using less precise terms such as “unlikely”, “more often than not” or “rarely.”
He explained the distinction back in 1998:
In any system of this kind it is important to recognise that the machine is completely formal, while the world is almost invariably mostly informal. The machine has been carefully constructed so that its fundamentally informal physical nature has been tamed and brought under control.
In software the machines we build are always models and nothing more. All models are wrong. Software will always have bugs.
Why does any of this matter?
Because lately it has become fashionable to reject formal disciplines and practices that have proven useful simply because they can be proven to be wrong. The response is to become completely informal.
A good example of this is the NoEstimates movement.
Instead of depending on an accurate estimate for predictability we can take away the unknowns of cost and delivery date by making them… well, known.
I’m not criticising Neil Killick here. He makes a good argument. The problem I have is with the people who have tried to use his “no estimates” aphorism as a principle or even, sometimes, as fact.
Are estimates always wrong? Yes.
Are estimates sometimes useful? Yes.
The problem is that we need to know just how wrong our estimates are if we are to know when they are useful.
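One hedged sketch of what "knowing how wrong" might look like in practice. The numbers below are entirely invented, and the mean relative error is just one of several measures one could pick; the point is only that a track record of past estimates lets you quantify the error rather than guess at it:

```python
# A hypothetical record of past task estimates vs actual effort, in days.
# (estimated, actual) pairs -- invented numbers for illustration only.
past = [(3, 5), (8, 7), (2, 4), (5, 9), (1, 1)]

# Relative error per task: how far off the estimate was, as a share of the actual.
errors = [abs(actual - est) / actual for est, actual in past]
mean_error = sum(errors) / len(errors)

print(f"mean relative error: {mean_error:.0%}")  # prints: mean relative error: 30%
```

An estimate known to be off by roughly 30% can still be useful for rough planning; the same estimate presented as exact cannot. That is the practical question Box was asking.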