NetApp sues Sun over patents in ZFS

See this link. Not good for ZFS.

A day after someone posted an amusing and somewhat contradictory set of reasons for preferring the Sun x4500 to JackRabbit (including “if a RAID card fails you have to replace it, and this is bad” in close proximity to “the SATA controller failed and we had to replace the motherboard”, the first offered as a reason JackRabbit was not as good as the x4500, the second as a reason the x4500 was better … you can read this amusing gem on the beowulf list if you wish), we see ZFS being attacked on patent grounds.

There is a lot of hype (and that is the right word) around ZFS. It could be a good file system. As Linus and others have noted, it is, by design, a massive layering violation: it operates across, and has impact on, layers that should be independent. Without that independence, a failure in one layer may not be localizable to that layer, and can have impacts up and down the stack. This isn’t a minor caveat: think of a GUI that writes low-level bytes directly to a disk. That would be a layering violation as well, and just as egregious.

ZFS has captured significant interest (in part due to Sun’s massive over-hyping of it). It combines a low-level disk block manager, a volume manager, and a file system. It claims to be the first to do this, though if you look at the documentation (man pages) for xfs, you realize that xfs had hooks for this (without the layering violations) long, long ago on IRIX. That said, I don’t remember ever using this “feature” under IRIX.

ZFS’s big claim to fame from an admin’s point of view is that building file systems across lots of disks is easy. From a safety point of view, it calculates checksums on more things than most file systems. This is not a bad thing BTW.

The latter is a good idea, and all file systems should emulate it. The former is automatable with good tools: OpenFiler does this with a GUI, for example, and EVMS2 does it very nicely as well. That is, ZFS does what other tools do. I am sure others will argue that Sun has something really novel here (I am not quite sure I agree).
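To illustrate the point that building a file system across many disks is automatable, here is a minimal dry-run sketch. It assumes a Linux MD RAID (mdadm) plus XFS stack; it is not how OpenFiler or EVMS2 work internally, and `build_array_commands` and the device names are hypothetical. Note that it keeps the block/volume layer and the file system layer as separate steps, rather than folding them into one tool as ZFS does.

```python
import shlex


def build_array_commands(disks, md_device="/dev/md0", level=6):
    """Return (without running) the commands to build one Linux MD RAID
    array from `disks` and create an XFS file system on top of it."""
    if level == 6 and len(disks) < 4:
        raise ValueError("RAID 6 needs at least 4 member disks")
    # Layer 1: assemble the disks into a single block device.
    create_array = [
        "mdadm", "--create", md_device,
        f"--level={level}", f"--raid-devices={len(disks)}", *disks,
    ]
    # Layer 2: put a file system on that block device.
    make_fs = ["mkfs.xfs", md_device]
    return [create_array, make_fs]


# Hypothetical device names; print the plan instead of executing it.
for cmd in build_array_commands([f"/dev/sd{c}" for c in "bcde"]):
    print(shlex.join(cmd))
```

Printing the plan rather than executing it keeps the sketch safe to try on any machine.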

The interesting parts of ZFS have more to do with data safety than anything else. But now its future may be in question, as it appears to have trampled all over NetApp’s WAFL. That was not a good move.
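The data-safety idea can be sketched in a few lines. This is a toy model, not ZFS code: the class and method names are hypothetical, and SHA-256 from `hashlib` stands in for whatever checksum a real file system uses. Checksums are kept apart from the data (loosely like ZFS storing a block's checksum in its parent pointer), so silent corruption is caught on read. With a redundant copy the bad copy is repaired from a good one; with a single copy the error can only be detected, which matches the single-disk point made in the comments.

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum stand-in; real file systems use their own algorithms."""
    return hashlib.sha256(data).hexdigest()


class ChecksummedMirror:
    """Toy N-way mirror whose checksums live apart from the data."""

    def __init__(self, copies: int = 2):
        self.disks = [{} for _ in range(copies)]  # per disk: block id -> bytes
        self.checksums = {}                        # block id -> expected digest

    def write(self, block_id: int, data: bytes) -> None:
        for disk in self.disks:
            disk[block_id] = data
        self.checksums[block_id] = digest(data)

    def read(self, block_id: int) -> bytes:
        expected = self.checksums[block_id]
        good = [d[block_id] for d in self.disks if digest(d[block_id]) == expected]
        if not good:
            # Detected, but with no intact copy there is nothing to repair from.
            raise OSError(f"block {block_id}: checksum mismatch on every copy")
        for disk in self.disks:
            if digest(disk[block_id]) != expected:
                disk[block_id] = good[0]  # self-heal the bad copy
        return good[0]


mirror = ChecksummedMirror(copies=2)
mirror.write(7, b"experiment results")
mirror.disks[0][7] = b"experiment resulTs"  # silent corruption on disk 0
print(mirror.read(7))        # b'experiment results'
print(mirror.disks[0][7])    # b'experiment results' (repaired)

single = ChecksummedMirror(copies=1)
single.write(7, b"experiment results")
single.disks[0][7] = b"experiment resulTs"
try:
    single.read(7)
except OSError as err:
    print(err)               # block 7: checksum mismatch on every copy
```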

Hopefully we will be able to separate the hype from the reality, play with ZFS on Linux soon, and compare it to xfs and jfs. But the patent bit looks like it might put that on hold for a while.



4 thoughts on “NetApp sues Sun over patents in ZFS”

  1. I heard BTRFS also has a layering violation. Is this correct? If so, does that make BTRFS bad too?

    BTRFS seems to be “broken by design”? A Red Hat developer writes:

    “In the meanwhile I confirm that Btrfs design is completely broken: records stored in the B-tree differ greatly from each other (it is unacceptable!), and the balancing algorithms have been modified in insane manner. All these factors has led to loss of *all* boundaries holding internal fragmentation and to exhaustive waste of disk space (and memory!) in spite of the property “scaling in their ability to address large storage”.

    It seems that nobody have reviewed Btrfs before its inclusion to the mainline. I have only found a pair of recommendations with a common idea that Btrfs maintainer is “not a crazy man”. Plus a number of papers which admire with the “Btrfs phenomena”. Sigh.

    The first obvious point here is that we *can not* put such file system to production.”

    “The interesting parts of ZFS have more to do with data safety than anything else.”
    This is actually the only point in using ZFS. CERN did a major study on the valuable data from their large particle machines and found that all data goes bad over time. The hardware did not even notice this! CERN is now migrating from Linux hardware RAID to Sun ZFS machines.
    Read the very bottom here

    Also, computer science researchers have injected errors into XFS, JFS, ReiserFS, ext3, NTFS, etc., and none of the errors were corrected. They were not even detected. There is a link to the research paper here:

    In another computer science study, ZFS detected all errors, and would have corrected them all had it been using ZFS RAID. In the study only a single disk was used, so there was no redundancy with which to correct the detected errors.

    This data safety (which no other file system offers) is the ONLY reason to use ZFS. Never mind the fancy functionality: you don’t want to corrupt your valuable data. And getting checksums right is very, very, very difficult. Look at XFS, JFS, etc.: they neither detect nor correct errors, even though they compute checksums all the time.

  2. @kebabbert

    The BTRFS bits were covered in significant depth by LWN. What Edward found was ostensibly a bug in the implementation, not a design flaw. This has been discussed significantly on the btrfs lists, as well as other locations. The consensus is that the problem (leaf merging) is handleable with a workaround now, and Chris Mason is looking to get a bug fix in shortly.

    It’s obviously up to you to decide whether or not to use BTRFS. That said, I doubt that so many people could have had the figurative wool pulled over their eyes by a broken design. That simply doesn’t pass the sniff test. I’d suggest following the discussion on LWN; as usual, they do a very good job of getting to the heart of the issue.

    As for “nobody reviewing btrfs before inclusion” … er … I honestly don’t know how to respond to this. It was obviously reviewed, by the many contributors to BTRFS from Red Hat, IBM, Oracle, and others, as well as by the core Linux team itself. You may, of course, choose to believe what you wish on this, but this claim isn’t well supported by the facts or evidence. Sadly, it sounds more like the FUD we’ve heard in the past from various sources: the ones where people go on about any old random code being incorporated, and point to this as a security problem, or a trojan horse problem, or … in this case, a file system review problem. This neglects the fact that random code can’t get in, and moreover, code that does go in undergoes very close inspection from many directions. It’s hard to get code into the kernel. It’s pretty damn near impossible to get code in that others haven’t reviewed or criticized.

    I’ve seen lots of articles on fuzzing of APIs. Error injection and detection is not something the older file systems do a good job with; of this there is no doubt. But ZFS is newer than the other file systems … it’s somewhat disingenuous to rip on an older file system for lacking a capability developed in a newer one.

    But just to make sure you understand where all of this is going, Linux has been adding integrity-checking capability to its base. All block layer devices will be able to make use of it. Ext4 currently has journal checksumming. I’d expect to see the rest of the Linux file systems start to use the in-kernel functions once they are finalized.

    ZFS’s only real benefit is that checksumming, and this is not a benefit that will be exclusive to ZFS going forward. As for CERN switching to ZFS, that was 3 years ago. These days, the dudes seem to have got a Dell … DDN does the block checksumming in hardware.

    More to the point, the paper referenced points out that all hardware/software has issues, and there is nothing special or magical about ZFS. ZFS itself had some serious bugs a number of years ago, some very high profile, which caused massive data loss.

    So, marketing points aside, the focus on data integrity is a good thing, and I agree with that.

    But none of this changes the fact that ZFS is tied to Sun/Oracle, and is under legal threat from NetApp. If it ever does escape the legal cloud, it will have other rather extensive … and existential … issues to deal with.

  3. To add to what Joe said, most of the concerns over btrfs appear to have been overblown; Chris Mason has fixed the one bug that was found (fix in 2.6.35-rc6) and has repeatedly responded to Edward with comments like:

    # Again, the leaves and nodes both have balancing conditions.

    # The top-down balancing requires that we control the number
    # of upper level inserts. This is already done today.

    # I’m still trying to nail down if you’re worried about something
    # other than packed file data.

    Edward didn’t reply to any of those queries.

    Regarding layering violation, well yes, but the pragmatic part of me says “so what, it works” (as does ZFS).

    Regarding ZFS, I’m currently using ZFS under FUSE for backups, but that’s all. The future for OpenSolaris seems bleak with Oracle essentially ignoring it to death, hence the recent motion of the OGB to disband on August 24th if Oracle don’t appoint a liaison by August 16th.

    Meetings will be suspended until August 16, unless conversations occur between OGB and Oracle that enables a meeting.

    The ongoing litigation between NetApp and Sunacle over ZFS also doesn’t help things, especially with NetApp now taking action against third parties who use ZFS from OpenSolaris in their products.

  4. You should also keep in mind that all software has bugs (and is trying to kill you). Even ZFS is not immune to data corruption/loss bugs like this nasty one:

    Synchronous writes on ZFS file systems prior to Solaris 10 Update 8 are not properly committed to stable media prior to returning from fsync() call, as required by POSIX and expected by Oracle archive log writing processes.

    For what it’s worth, Sun support provided no useful assistance on this case. We dtrace’d Oracle log writes, replicated the problem using an Oracle database, and then – to prevent Sun from blaming Oracle or our storage vendor – replicated the data loss with a trivial “C” program on local spindles.

    It’s worth noting that they then followed up to say it’s not entirely solved with U8 either:

    DBA’s pay attention: Any Solaris 10 server kernel earlier than Update 8 + 142900-09 that is running any application that synchronously writes more than 32k chunks is vulnerable to data loss on abnormal shutdown.
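The fsync() bug quoted above is easier to appreciate with the application side in view. Below is a minimal Python sketch, assuming a POSIX system, of the write-then-fsync pattern that database log writers and similar programs rely on; `durable_write` and the file name are hypothetical, and this is not the commenter’s C test program. User-space code cannot work around a file system that returns from fsync() before the data is on stable media; the sketch only shows what the application is entitled to expect.

```python
import os
import tempfile


def durable_write(path, chunks):
    """Write all chunks, then fsync() so the data should be on stable
    storage before returning. Whether that promise actually holds is up
    to the OS and file system, which is exactly what the quoted bug broke."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:                      # os.write may write less than asked
                written = os.write(fd, view)
                view = view[written:]
        os.fsync(fd)                         # ask the OS to flush the file to disk
    finally:
        os.close(fd)
    # Sync the containing directory so a newly created file's entry persists.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


# Hypothetical log file; 64 KiB chunks exceed the 32k size in the advisory.
path = os.path.join(tempfile.mkdtemp(), "archive.log")
durable_write(path, [b"\0" * 65536] * 4)
print(os.path.getsize(path))  # 262144
```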

