… and anyone promising you one is selling you something. This is true everywhere, though especially so in massively overhyped markets.
There are no secret incantations that will tease actionable insights out of a gargantuan bolus of data.
Yet, from all the “company X now has a hyper-optimized, purple-colored Hadoop distro, with a pony” announcements, one might think that Hadoop was a panacea … a panopticon with an infinite ability to extract the most profound and profitable nuggets from mountains of steaming piles of bits.
At the moment, automated tools still can’t replace skull sweat, intuition, and the like. You need a combination of tools and very smart people with domain knowledge to tease useful information out of large information stores. Those people help you understand how to filter out noise and establish baseline signals. I am not sure if you’d call them “data scientists”, but you would call them smart, expert, and so on. And you’d hand them good tools, or give them enough of a runway to develop what they need. Hadoop is just one of many tools in the bandoleer of such smart people, and I’d argue that it’s not even the most important one. Environments like R, kdb+/q, MATLAB/Octave, and others for crunching data, building and testing models, and then running those models are more important to the process than the underlying data store.
Believing in silver bullets, or in purple elephants as saviors, may not be the best of all possible strategies.