See here, which is the response to the arXiv article here.

While the Facebook data scientists refer to their post as a debunking, using an irrelevant metric (enrollment vs Google rank? and the theory behind this is … what?), the paper points out something quite important.

Social networking success has been largely ephemeral, and not sustainable. It's a transient phenomenon. Anyone remember Friendster? MySpace?

More to the point, the internet entities that dominated 15 years ago are largely gone. New entities emerged in their place. Is anyone still using AltaVista?

The point raised in the arXiv paper is quite valid, and has a theory from which testable hypotheses can be drawn. Trying to take it down with irrelevant statistical analysis, with no theoretical basis behind it, does little to advance the concept of science, never mind data science.

Moreover, this echoes a point I’ve made many a time. It’s a bias of mine, but it should be one of all theoretical scientists. If your entire theory consists of a statistical model derived from training data, you don’t have a theory from which you can draw insight or meaningful hypotheses. All you have is a set of introspective analyses, none of which provides a fundamental reason as to why your analyses, or your choice of analytical tools, is or should be considered correct.

In other words, don’t replace a real theory (as in the arXiv paper) with a statistical analysis (as in the takedown). The statistical analysis isn’t what offers the insight; it’s the theory that does. And if you’ve built your theory correctly, you can compare it with the statistics to see how close to or far from being representative of nature it is. You can provide counterarguments to the model’s predictions with the statistical analysis. You cannot provide counterarguments to the statistical analysis from the theory. That is, the statistical analysis is not “disprovable”, and is therefore not a theory.

This is a very important distinction, and one I see run roughshod over every day.

I should also note that the criticism that the paper used Google searches rather than actual engagement trends is a perfectly valid and realistic criticism of its results. But they should have run the engagement data back through the models to see if they gave the same answer. That would have been a good takedown.
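To make that last suggestion concrete, here is a rough sketch of what "run the data back through the models" could look like. Everything in it is an assumption for illustration: the bell-shaped `curve` is a hypothetical stand-in and not the model from the arXiv paper, the two series are synthetic, and the names `fit` and `decline_time` are made up. The only point is the workflow: fit the same model to both proxies and check whether the predicted decline agrees.

```python
import math

# Hypothetical stand-in: a bell-shaped rise-and-fall curve.
# This is NOT the model from the arXiv paper; it only illustrates the workflow.
def curve(t, peak, t_peak, width):
    return peak * math.exp(-((t - t_peak) / width) ** 2)

def fit(series):
    """Least-squares fit by brute force over a small grid of (t_peak, width)."""
    best = None
    for t_peak in range(0, 21):
        for width in range(2, 16):
            basis = [curve(t, 1.0, t_peak, width) for t in range(len(series))]
            # For a fixed shape, the best amplitude has a closed form.
            peak = sum(b * y for b, y in zip(basis, series)) / sum(b * b for b in basis)
            err = sum((peak * b - y) ** 2 for b, y in zip(basis, series))
            if best is None or err < best[0]:
                best = (err, peak, t_peak, width)
    return best[1:]

def decline_time(t_peak, width, fraction=0.2):
    """Time at which the fitted curve has fallen to `fraction` of its peak."""
    return t_peak + width * math.sqrt(math.log(1.0 / fraction))

# Synthetic, made-up series (12 time steps each); real data would replace these.
search_proxy = [curve(t, 100.0, 8, 4) for t in range(12)]
engagement = [curve(t, 100.0, 12, 9) for t in range(12)]

for name, series in [("search proxy", search_proxy), ("engagement", engagement)]:
    peak, t_peak, width = fit(series)
    print(name, "-> predicted decline at t =", round(decline_time(t_peak, width), 1))
```

If the two fits give materially different decline times, the choice of proxy is driving the conclusion, and that is a criticism of the result that comes from inside the theory. If they agree, the proxy objection loses most of its force.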