Last year, Intel started building its own distro of Hadoop. Their argument was that they were optimizing it for their architecture (as compared to, say, ARM). Today came word (via InsideHPC.com) that they are switching to Cloudera’s distribution.
This makes perfect sense to me. Intel couldn’t really optimize Hadoop with compiler options that target new instructions (part of their selling point), as Hadoop is a Java thing. Java has its own VM, and many performance touch points that have nothing to do with processor architecture. Indeed, it’s very hard to optimize Java for a particular microarchitecture, as Java does its utmost to hide the details of that microarchitecture from you and push you up the stack. Fine for apps, not so fine for hard-core high performance.
There is a bigger thread here, that big data is not synonymous with Hadoop, but we don’t need to touch that now. Hadoop is one of the tools used in large-scale analytics. Optimizing it is more a function of IO/network design and higher-level job distribution/layout than it is of processor microarchitecture. Thus this tie-up makes perfect sense: Intel can continue to do what it does best, and the Cloudera folks can look at making better use of the microarchitecture in the core (which, as I noted, is very hard on a system that tries to hide it from you).