"New" File systems worth watching
By joe
The day job currently has siClusters in the field with GlusterFS, Lustre, and a few other “older” parallel file systems.
GlusterFS is a distributed file system with a very interesting and powerful design concept. It is under active development by a venture-backed company, Gluster, Inc. I can’t say enough good things about it, and the company behind it. The day job is in a relationship with them, so you may take this information for what it’s worth, and weigh it accordingly. Our view is, generally, that they have something very close to “the right design” going forward. There are occasional issues that pop up, usually connected with InfiniBand, that we can’t necessarily fault Gluster for, but they do bear the brunt of errors in the transport stacks. We’ve seen this derail installs at one location, during what was effectively corner-case testing … these weren’t Gluster issues per se, they were pretty definitively IB stack issues, but ones that couldn’t be easily worked around.
The day job uses Lustre as well. Well, we are getting back into using Lustre after a several-year hiatus. Where Gluster is neat and simple in overall design and implementation, Lustre is a complex beast, with many moving parts. This is a recipe for problems, unfortunately, and we have encountered our fair share during bringup. Lustre’s design is an older one, with several centralized servers. This effectively rate limits scalability, and makes stability a function of the least stable centralized resource. This is a concern for us, as we have seen customers blame file systems for every problem they encounter, regardless of the merits of such blame. Having a system with a designed-in SPoF (single point of failure) is IMO a very bad idea. Sort of like putting permanent storage on RAID0. Yeah, it’s fast; no, you really don’t want to do this. It looks like Lustre 2.0 will remove the SPoF of the MDS/MGS device. But its kernel dependencies will again limit its utility across lots of installations.
Adding to our concerns has been the recent acquisition of Sun by Oracle. Cute/funny derisive versions of their combined name aside, we have very real business dependencies upon several pieces of their HPC stack, and while Marc Hamilton (now VP of HPC sales at Oracle) indicates that they have a long life, we are … concerned … about some of these. VirtualBox is IMO an excellent tool. GridEngine … we’ve got a love/hate relationship with it … it works well when it works, and when it fails, it can be really annoying. Lustre for siCluster is definitely an option and something we can offer (same hardware, select the parallel file system of your choice if you don’t want the default). Even for OpenSolaris, something we’ve not seen many requests for (though more than for Solaris itself), we have an interesting use case within siCluster. Needless to say, the changes make us (and our customers) nervous.
These are the historical systems. But what about the “new” systems? First, there is Ceph.
Ceph is a distributed object store done right. We have set up a few test systems with it, and will get more aggressively into it later this year, including (likely) hosting it as a test option on an internal siCluster for customers to play with. They have a clustered MDS, and will use btrfs as the backend data store. Btrfs is something like a better ZFS than ZFS. Btrfs is part of the Linux kernel, and is being developed by Chris Mason and others at Oracle. Some might point out the “missing raidz*” as a reason ZFS is “better” than btrfs, but I’d not harp on that point too heavily, as btrfs will sit nicely upon the md/lvm/… bits, so it gets all the goodness of those as well.
But Ceph isn’t the only one of interest for siCluster. We are also looking at Tahoe-LAFS and Twisted Storage, among others. Which one makes the most sense is very application dependent. With Twisted, we see whole new vistas of possible offerings opening up … some nice business models we can enable. Tahoe-LAFS is interesting in that it provides something akin to provably secure distributed data storage, something we think is going to become tremendously important in any cloud storage scenario, where data can span multiple legal regimes, some of which might not be friendly to the content stored. This latter issue, spanning legal regimes for cloud storage, is one that hasn’t been tested in any court case that I am aware of (chime in if you know otherwise). Moreover, being able to deal with a loss of data from a legal regime’s confiscation of servers is going to become just as important.
I won’t go into what Twisted will let us do right now. The day job had an initial, very good call with them, and we have some very interesting ideas on this. Of course, some of those ideas require a bit of cash to make happen, and this isn’t a friendly time to be raising capital (long story here).
These are some of the options that we are working on for storage going forward.
We think that some of these concepts could be quite interesting to the market going forward.