Sounds like Lustre is getting something of a bad rap

Coverage from ISC09 in Hamburg has GPFS (from IBM) doing well, and Lustre being … well … Lustre.
From InsideHPC’s coverage in the sidebar …

* LSI: “Lustre does not handle very well things going bad!” Thank you Terascale for that very insightful Session! #ISC09
* Lots of GPFS users here praising the robustness and stability of GPFS #ISC09 please think about that before looking for the cheapest way 😉
* great quote from a user: “Lustre badly failed while GPFS was absolutely stable!” #ISC09
* Best practises for deploying parallel filesystems – interactive session, however with Lustre focus #ISC09

3 thoughts on “Sounds like Lustre is getting something of a bad rap”

  1. And the source of all of those tweets? mwhiegl: Deep Computing / HPC Sales Specialist at IBM.
    The company I work for has multiple clusters. The one I installed myself with Perceus and Lustre has been running production jobs for many months. The pre-packaged IBM xCAT/GPFS cluster that another part of the company purchased has yet to go into production almost 9 months after being acquired.
    That being said, I wouldn’t be using either filesystem as a highly available NAS replacement. But for HPC scratch space? I’ll take the open source sans-consulting-fees version please. Sun support is there for a fee if/when you need them (for the moment anyway…).

  2. @Todd
    Many of them were from him. I didn’t have much time to follow through and look.
    Perceus is good kit. We build high performance storage clusters using our JackRabbit kit and GlusterFS, Lustre, PVFS2, … . Had one customer request GPFS so far.
    I am surprised that you indicate the system is not in production 9 months later. Something sounds not quite right.
    The concerns I have with Lustre now all focus upon Sun and its future at Oracle. And it’s not just Lustre. Many of our customers have business dependencies upon GridEngine. Its future at Sun/Oracle, like Lustre’s, is, to put it mildly, in doubt. As open source projects, I think they will be fine, though I think projects like Lustre and SGE need a corporate backer.

  3. I can believe 9 months. We had to replace our IBM NFS server with non-IBM hardware because IBM were unable to fix it after it randomly started failing disks in software RAID months ago (failing as in orange lights coming up on the enclosures when you trigger an MD RAID6 self test).
    Now, 5 months after the initial failure, they’ve still not been able to fix it despite OS changes (they blamed Debian at first, but RHEL does the same thing), replacement SAS cards, new firmware, etc. Not happy.

