Cluster file systems views

We’ve had a chance to compare and contrast GlusterFS and Lustre in recent months. Way back in the Lustre 1.4 time period, we helped a customer get up and running with it. I remember thinking that this was simply not something I felt comfortable leaving at a customer site without a dedicated file system engineer monitoring it and dealing with it 24×7. Seriously, it needed lots of hand-holding then.

We have a recent 1.8.2 installation … and I have the same indelible impression … I am concerned about whether the customer has the interest and manpower to really maintain this. Lustre is not for the faint of heart. It requires serious over-engineering of resources to keep its myriad issues from leaping up and interrupting you (yeah, we should be able to tune these issues away, but …). If you don’t have the luxury of over-engineering these resources, you’d better get ready to dedicate a person or more. It can easily become a full-time job for someone.

I don’t consider that a benefit, and I don’t see this problem improving soon.

We’ve also deployed a number of Gluster installations. The first was v2.09; subsequent ones were/are 3.0.x (x=3 right now). Some are for customers wanting to evaluate it as open source, while others are paying for it.

2.09 had problems. I won’t minimize those. We had some corruption issues, but really, they did seem tied to InfiniBand. We have partially fixed that aspect by switching some of the underlying IB technology, and we have also moved to the 3.0.x branch for current and future customers. We aren’t seeing the issues we saw with 2.09.

It seems that there are numerous InfiniBand-related issues, specifically memory leaks in the stack, and we had to take some … well … extreme measures … to keep them under control.

We’ve seen those issues elsewhere, but now we know how to control them and minimize their impact.

One vendor mentioned to us that they build their Lustre designs defensively, to minimize variables. Well … I am not sure about minimizing variables, but I do think they are more focused on building a specifically resilient architecture that will resist many of the problems Lustre brings up. The issue is that, like RAID’s purpose in life, their designs fundamentally increase the time available to the administration staff to handle problems, and decrease the relative severity of any single problem.

Take a step back and look at this holistically. Is this the right approach? Resilient designs are great. Very important to have in general … but … is it a good idea to start with something apparently inherently brittle as the core? That is, if you start with something that effectively reduces the resiliency of your system, so that you have to work to bring the system back to a reliable operational state … is this a good thing? I don’t think so.

The 3.0.x stack of GlusterFS is fairly simple, and works quite well. We are comfortable deploying it and using it. 2.09 wasn’t great. It had a number of issues … we’d suggest Gluster users move to the new code base soon. Problems we’d seen before with 2.09 (various buffer corruptions) seem to be gone in 3.0.x.

Of course, these aren’t the only cluster file systems out there. This past week I spoke with someone about another one, which we will look at again. Apparently my supposition about its state was in fact incorrect. They had an interesting take on the Gluster/Lustre/… scenario, specifically Lustre. Their point was well made, and resonated quite well with me.


2 thoughts on “Cluster file systems views”

  1. I have some hope for Ceph, preferably on btrfs, which it seems to be going to use.

    What I always disliked about Lustre is that, if I’m not mistaken, you need a custom kernel to start with. GlusterFS doesn’t need that; you might need a patched libfuse, but usually that all works.

    But if Ceph and btrfs are ready that sounds like a really nice combination.

    Ceph is fully distributed, with automatic replication and load balancing if I remember correctly, which Lustre doesn’t seem to have. For example, the metadata server in Lustre is central, and you always need a failover server.

    In Ceph, you can balance the load across metadata servers.
