Lustre's future, part 1 of a few
By joe
[update] Jeff said substantially the same thing last year. Go figure :O

I haven’t written up my thoughts after seeing the slides, speaking with some of the support team, and seeing John West and John Leidel’s discussion of Lustre 2.0 on InsideHPC … but I need to. So here is the first (very brief) comment.

Here is a set of slides (hat tip to Chris S) which neatly summarizes what we see customers thinking. Ignoring their relatively low performance for a moment … (7GB/s writes? We were seeing 1.5GB/s average per OSS across our 8 OSSes … but that’s for a later discussion), their concerns are what we hear quite frequently.

Customers do not want vendor lock-in. Period. They would like to avoid anything proprietary. Fear of bricking is huge. Anything that is proprietary is (permanently) brickable, taking lots of data/time/effort with it. This increases risk.

Oracle’s strategy around Lustre, as John noted in the summary, and from what I gleaned from the slides, is to make it more proprietary, with features/functionality being decidedly non-GPL, and to use it as a lever to sell Oracle hardware rather than as a stand-alone product.

There are organizations that will benefit from this (not Oracle). Clusterstor does Lustre support, and I can vouch that they know what the heck they are doing. They set us straight very quickly (when we couldn’t even get the time of day out of Oracle). Their take is that 2.0 is GPL, so it is supportable, and there is a strong future to it. I don’t doubt their enthusiasm, as I believe that Oracle just significantly increased the demand for Clusterstor’s services.
But the other aspects of the announcement are what trouble me. Oracle is looking to use Lustre to drive Oracle hardware. So Terascala is basically over then. As are the products at other Lustre appliance providers who are not using Oracle hardware. This is a risk when you base your product around something that can go away. Your business strategy can be bricked … and in the case of Terascala, if your entire strategy is around this … well …

I could go into more depth, but the BP presentation is the important thing. This is what customers think. They tried CXFS, but it (like other systems of similar design) isn’t that good for NFS or clusters, as it turns out. They are getting comfortable with Lustre, but they have deep and grave concerns over its direction. They like and use GPFS (we’ve heard this from other customers), and they use Panasas. Going forward, it wouldn’t surprise me to see them play with GlusterFS (and we’d be happy to help them with this :) ), Fraunhofer GFS, and a few others.

I think customers like options, and don’t like getting boxed into a single monolithic stack. How do I know? It’s stated, multiple times, in that presentation.

GPFS isn’t open, and it isn’t easy to get on non-IBM hardware yet, but it is possible. PANFS is completely tied to a small startup, and is proprietary. Lustre is open source, but it is very hard to rebase on anything other than the supported OS/kernels, which makes things like following update streams extremely hard for customers. Which is why the support aspect is so critical for it. GlusterFS is open source, and very easy to move to new kernels; it only requires a working network stack. And, as we have discovered, it is very easy to have a broken IB stack, which you only find out by running a parallel file system atop it. Fraunhofer is not open source, but the licensing costs are very reasonable, and it doesn’t have the rebasing problem Lustre has. It also seems to have better performance.
The emerging file systems, such as Ceph, now in the kernel, may effectively obviate the need for other object-based storage systems. It has so much goodness in its roadmap … It is well worth paying attention to.

Customers want choice and flexibility, and don’t want to be boxed in or locked in. Unfortunately, my read on the direction of Lustre is that it is headed in the lock-in direction. Which is likely to cost it support.

We have multiple customers, active and proposed, with Lustre dependencies, and concerns about how to manage going forward. Thanks to Clusterstor, we can support the legacy and new installs. But what of the future? Where is a roadmap that will not cause concern? I don’t see one at the moment. And this worries me.

FWIW: We have used, and we sell, GlusterFS, Lustre, PVFS2, … based systems atop our siCluster, JackRabbit, and Delta-V offerings. We are pleased that we are showing some of the highest sustained performance density in the market, and we know we are capable of significantly more than that. We are actively working with Gluster, and speaking with several other groups about making sure that, no matter which file system our customers choose, they won’t be left high and dry.

I’ve had companies evaporate on me after I sold/deployed their systems. I’ve seen changes of direction from other companies, or changes in business conditions, cause them to alter their offerings in a manner incompatible with our customers’ needs. This is the danger of bricking, and anything wholly proprietary carries it. It represents an increased risk, and customers are sensitive to that risk. We want to avoid deploying risk. So do our customers.

For the risk equation, ask yourself this question: If a bus/train/whatever demolishes the company that provides your subsystem X, and all the people within it, can you get support for it? If the answer is “no”, then this represents an ongoing risk to your business that you need to mitigate.
If the answer is yes, then things aren’t bad, but the direction that they are moving in is important to note. For Lustre, for the moment, the answer is yes, you can get support. You can pay people other than Oracle for support. The question is, for how long? Will Lustre fork? Will it go completely closed? I don’t know. And that concerns me, and it concerns my customers.

I am not worried about GlusterFS going completely proprietary, nor am I worried about Ceph doing this. Fraunhofer is proprietary, as are PANFS and GPFS. These do increase risk.

Do we have to worry about a future Lustre v2++ not being open source, and then Oracle demanding, in the name of copyright, that the forked project change its name? And how quickly would the projects diverge? Hence the concern.