Yeah, this show had lots of folks talking storage. Obviously we did too. Nicole from Datanami (she had a terrible cold at the time, I hope she is feeling better) asked me to give a short set of non-advertising interviews. Below is what I did, given no prep, no forewarning, and about 30 seconds to mentally prepare (and that might be generous).
Part 1: Big Data in Media and Entertainment
and Part 2: Accelerating Big Data using Flash and caching with siFlash
What we are seeing, in general, is that there is far more data out there, and it’s growing faster, than you might think. We talk to current and potential customers and end users about petabyte-sized data stores on a regular basis. Everyone is worrying about data growth rates. Everyone is worrying about having enough storage.
A few years ago, 100TB was a large storage system. Now an order of magnitude larger is a medium-sized system.
IOP rates are critical. With enough IOs from different nodes coming down the channel, even sequential workloads will start to look like random workloads. Specific industries have hard requirements for sustained minimum bandwidths and delivery latencies.
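To make the "sequential starts to look random" point concrete, here is a toy sketch in Python. It is not a model of any real storage stack: the function names, the one-block request size, and the random arrival order (standing in for network and scheduling jitter) are all assumptions made for illustration. Each client reads its own contiguous region front to back, and we count how often a request arriving at the server starts exactly where the previous one ended.

```python
import random


def interleave_streams(num_clients, requests_per_client, block=1, seed=0):
    """Merge per-client sequential request streams into one arrival order.

    Each client reads its own contiguous region front to back. Which client's
    request arrives next is picked at random (an assumption standing in for
    network and scheduling jitter).
    """
    rng = random.Random(seed)
    region = requests_per_client * block
    next_offset = [c * region for c in range(num_clients)]
    remaining = [requests_per_client] * num_clients
    active = list(range(num_clients))
    arrivals = []
    while active:
        c = rng.choice(active)
        arrivals.append(next_offset[c])
        next_offset[c] += block
        remaining[c] -= 1
        if remaining[c] == 0:
            active.remove(c)
    return arrivals


def sequential_fraction(offsets, block=1):
    """Fraction of requests that start exactly where the previous one ended."""
    if len(offsets) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur == prev + block)
    return hits / (len(offsets) - 1)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        frac = sequential_fraction(interleave_streams(n, 1000))
        print(f"{n:3d} clients: {frac:.3f} of requests look sequential")
```

With one client every request is sequential as seen by the server. As the client count goes up, the fraction of back-to-back contiguous requests drops toward roughly one over the number of clients, even though every individual client is still doing a purely sequential read.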
Yeah, things are changing.
Had about 1/2 the visitors talking about Lustre. They divided into two camps. First camp was: Lustre is the only possible solution, what are you doing with it? Second camp was: we are tired of dealing with it, what are the alternatives? The split was about 25% in the first camp, and 75% in the second. Interesting. We support Lustre, and we build siClusters with it. With Whamcloud backing it, I am hoping it (Lustre) eventually is renamed OpenSFS (seems obvious to me). Whamcloud will probably need to go for an acquisition by a larger company (Cray?) to continue growing quickly, as government contracts are something of a risk with the state of the budgets as they are now in the US. Hopefully Brett and team will get those things secured (remember Thinking Machines and their effectively one customer? Don’t want a redux of that). Not sure if the VCs will do another round (though they might).
Spoke to many people about Gluster (which we support), and about FhGFS (which we do support and will make a formal offering for on siCluster). We’ve supported (and still support) PVFS2 and obviously OrangeFS. Will have to play more with the latter. There is some interest in Sector/Sphere, Tahoe-LAFS, and other projects. But most customers were interested in Gluster and FhGFS for new deployments (if they weren’t in the Lustre camp).
In the case of Gluster, their interest is in the fact that Red Hat (note the spelling Jeff!) owns Gluster now and many people think it will get much better over time. It’s good now; the primary complaint from at least one customer is, at this moment, documentation. There is also some breakage we are working through, but I am hoping it will be fixed (and I’ll be diving into the code next week to see if I can trace the breakage). But there is a lot of interest in Gluster, and it is growing.
FhGFS is being looked at by a number of people as well. We are seeing more and more interest. Customers are open to it, especially as the FhGFS team considers open sourcing the full stack. They already have hard core users, and some good installs. Not to mention a small and very bright/efficient group of developers. It’s a different target market (currently) than Gluster, aiming more towards the Lustre market. And the things they are working on are things it would be nice to see in the Lustre roadmap … but aren’t.
Can’t and won’t forget Ceph. Ceph is targeted at an overlapping set of what Lustre and Gluster do. It’s somewhat complementary to both, and it does many things right that some of the others either don’t do or punt on, or hand wave/provide lip service toward. Some of the best features depend upon btrfs, which is still not quite ready for production use without serious backups … it won’t eat your data, but if you have a file system go belly up, you still can’t repair it. Ceph can also sit atop XFS and ext4, which might be how we start deploying Ceph storage clusters. Stay tuned.
We see lots of customers talking to us about ZFS. Everyone doing so is telling us they are using OpenSolaris or Solaris for this. When asked about Nexenta, every one of them said “too expensive”. It’s not cheap, so I understand some of the sentiment. At least one prospect, who came by to talk, discussed storage appliances with Solaris preloaded. He didn’t quite get the licensing and support model (was sold a bill of goods by a sales person, and bought it hook, line, and sinker … I hate seeing this, but it happens, even to the smart consumers in this industry). No one openly talked about zfsonlinux … ok, no one talked about using it. Apart from the government, which could make a specific argument as to why they can’t/shouldn’t be sued for doing this, pretty much no one else is touching that with a 10 meter pole. The risk of litigation is simply too high, and this is what we heard from the folks willing to say something. Even FreeBSD clusters or systems caused some of these folks great consternation.
When btrfs is ready, this will be far less of an issue. But it’s an issue now. Probably for the next year or so.
Many other observations. Coming soon.