The day job lives at a crossroads of sorts. We design, build, sell, and support some of the fastest hyperconverged (aka tightly coupled) storage and computing systems on the market. We’ve been talking about this model for more than a decade, and interestingly, the market for it has really taken off over the last 12 months.
The idea is very simple. Keep computing, networking, and storage very tightly tied together, and enable applications to leverage the local (and distributed) resources at the best possible speed. Provide scale out storage and compute capability, and the fastest possible communication infrastructure.
Make it so that people with ginormous data and computing needs have a fighting chance of actually being able to do their work in a reasonable period of time.
This is really what tightly coupled is all about. Hyperconvergence is bringing all these aspects together, and enabling the software to make effective use of it.
To distill the essence, this is about reducing the barriers to performance at every level, and designing systems for higher performance efficiency (e.g. more cost effective to run at scale), while increasing the density (e.g. reducing the number of systems you need to get performance).
But this isn’t the only thing changing. People are enamored of Big Data. Though, if you read various analyses, it appears that a significant share of big data systems are self designed/built rather than vendor packaged. And more to the point, the footprint of big data systems is on the order of 10^3 systems. I don’t know what the distribution function is for this, but even 100% growth in these wouldn’t be terribly large in terms of system footprint.
Which, to a degree, raises the question as to why vendors are chasing such a ‘small’ market so hard.
I know, I know … it’s all the rage and Wikibon indicates that Hadoop is huge.
The estimate for 2013 was $2B USD, and with a 58.2% CAGR, that predicts a market of approximately $3.16B USD in 2014. This is the complete market, not just the software side.
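For anyone who wants to check the arithmetic, here is a minimal sketch of the compounding. It assumes the quoted 58.2% rate simply applies for one year on top of the $2B USD base; the function name `project_market` is mine, purely for illustration, and is not anything from the Wikibon report.

```python
def project_market(base_usd_billions, cagr, years):
    """Compound a base market size forward at a fixed annual growth rate."""
    return base_usd_billions * (1.0 + cagr) ** years


estimate_2013 = 2.0  # $B USD, the 2013 estimate quoted above
cagr = 0.582         # 58.2% compound annual growth rate, as quoted

estimate_2014 = project_market(estimate_2013, cagr, 1)
print(f"Projected 2014 market: ${estimate_2014:.2f}B USD")
```

Raising `years` shows how quickly a rate like that compounds, assuming it holds, which is a big assumption for any forecast.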
What Hadoop represents is a change in thought processes on how to gain insight from pools of data. How to build better data driven models.
This is not to say that Hadoop is alone. SAS, SPSS, and many other statistical analytics packages have been used, for decades, to construct and test models. What has changed has been the leveraging of new technologies to store and query data at effectively arbitrary scale.
This is, IMO, the fundamental genius of these tools. And this is in part where the value proposition sits.
To distill this to the essence, it’s about lowering the friction between data storage, modeling, and testing.
Journalists are using Hadoop to mean the whole data analytics market, and that is an unfortunate conflation of the two. I am pretty sure that Kx, SAS, etc. are all well represented in the analytics market. Specifically, I am wondering if the 10^3 number is badly undercounting the real size of the market.
Have a look at this poll from KDNuggets. It shows where respondents (a self selecting group, so significant biases are likely) say they are spending their time for analytics. As you can see, Hadoop is pretty low on the list and growing slowly. It and SAS appear to dominate the growth (again, self selecting data, so there is definite bias).
But chances are there are many people using Hadoop on the back end. Things like the R Connector, and for that matter, many other Hadoop connectors, suggest it is being used as an analytic back end. Indeed, the push for a SQL interface, and now Spark (for in-memory distributed analytics), suggests that there is far more interest in using this than is represented by the number reported.
Big data is all about being able to use this data at whatever scale you need, with as little friction and as few barriers as possible.
This is also the essence of tightly coupled computing. Bring computing, storage, and networking together in such a way as to reduce friction.
This convergence is interesting to say the least …