Link is here. I cannot tell you how many times I’ve talked statistics with a researcher who quoted some high correlation coefficient to me as evidence of causality.

Any physical scientist, chemist, engineer, … knows that you have to treat correlation coefficients very carefully, and that you cannot substitute them for a real causal relationship backed by a theory that provides a testable model. The causal relationship is fundamentally an aspect of the theory, and it is the theory that guides your predictions. Without a firm theory that yields testable hypotheses and predictions you can measure against, a correlation by itself lacks real theoretical meaning. It could be entirely accidental, and you’d have no real basis for telling a real effect from an accidental one.
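To make the “entirely accidental” point concrete, here is a small Python sketch. It is my own illustration, not something from the linked article, and the choice of random walks, series length, trial count and the 0.5 threshold are all assumptions picked for the demo. It builds pairs of series that are independent by construction and counts how often the sample correlation still looks “high.” For white noise that essentially never happens at this length. For random walks (cumulative sums of noise, the kind of drifting series common in real measurements) it happens routinely, even though there is no relationship at all between the two series.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal steps."""
    total, walk = 0.0, []
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        walk.append(total)
    return walk


def fraction_strongly_correlated(n=300, trials=400, threshold=0.5, seed=1):
    """Fraction of pairs of *independent* series with |r| > threshold.

    Returns (fraction for random-walk pairs, fraction for white-noise pairs).
    Parameter values are illustrative assumptions, not from the article.
    """
    rng = random.Random(seed)
    walk_hits = noise_hits = 0
    for _ in range(trials):
        # Two independent random walks: no causal link whatsoever.
        if abs(pearson(random_walk(n, rng), random_walk(n, rng))) > threshold:
            walk_hits += 1
        # Two independent white-noise series of the same length.
        noise_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        noise_y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pearson(noise_x, noise_y)) > threshold:
            noise_hits += 1
    return walk_hits / trials, noise_hits / trials


if __name__ == "__main__":
    walk_frac, noise_frac = fraction_strongly_correlated()
    print(f"random walks: |r| > 0.5 in {walk_frac:.1%} of independent pairs")
    print(f"white noise:  |r| > 0.5 in {noise_frac:.1%} of independent pairs")
```

Nothing in the correlation coefficient itself tells you which of those “high” values are meaningful. That discrimination has to come from somewhere else, which is the point about theory above.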

Oh yes, there is the next statistical layer, where you infer whether a correlation arose “by chance” or whether there is signal there. But without the predictive power of an operational hypothesis underneath it, this “by chance” estimate is … an estimate at best, built upon self-reinforcing assumptions. You can drive that jalopy down as many levels as you wish, but it is still a jalopy. You need an underlying predictive theory and a working hypothesis to test, so that you can compare its predictions to your measurements and determine whether a correlation is real and has meaning behind it.
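Here is a rough sketch of how that “by chance” layer can lean on its own assumptions. Again, this is my own illustration in Python, not the article’s method. I use a standard permutation test for Pearson’s r as the stand-in significance test, and the series lengths, trial counts and alpha are arbitrary choices. Shuffling one series assumes the observations are exchangeable (independent). That holds for white noise, so the test rejects at roughly its nominal 5% rate. It does not hold for autocorrelated random walks, so the same test declares “significant” correlations between unrelated series far more often than 5% of the time.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_perm, rng):
    """Two-sided permutation p-value for Pearson r.

    Shuffling ys assumes the observations are exchangeable (independent).
    """
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def white_noise(n, rng):
    """n independent standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal steps (autocorrelated)."""
    total, walk = 0.0, []
    for step in white_noise(n, rng):
        total += step
        walk.append(total)
    return walk


def rejection_rate(make_series, n=60, trials=100, n_perm=99, alpha=0.05, seed=2):
    """Fraction of independent pairs the test flags as 'not by chance'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xs = make_series(n, rng)
        ys = make_series(n, rng)
        if permutation_p_value(xs, ys, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"white noise:  flagged {rejection_rate(white_noise):.0%} of unrelated pairs")
    print(f"random walks: flagged {rejection_rate(random_walk):.0%} of unrelated pairs")
```

The test machinery is the same in both cases. What changed is whether its built-in assumption matches the data, and the p-value alone cannot tell you that.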

It’s a good read; I recommend it.