The many joys of Redhat based linux distributions: part 1, filesystems and conary

A customer has a JackRabbit. They want to install Scientific Linux 4.4 (SL4.4) on it. Ok.

I will hold back on criticizing the positively ancient kernel in RHEL4 derived distributions, its weak NUMA support, and other issues.
Let’s look at file systems.
JackRabbit is a server. A 5U monster that can push tremendous amounts of data around; to disk, from disk, out onto the network. It needs a relatively modern kernel to make best use of its chipsets, which aren’t supported before 2.6.15 as far as I can tell, and are poorly supported in 2.6.16. IMO Linux does not have MSI support down pat. Until it is down pat, turn it off. Really. You won’t regret it.
This JackRabbit unit shipped with 28.5 TB of file system pre-configured. Two file systems: one was 25 TB of hardware RAID6, and the other was 3.5 TB of software RAID5. Customer told us “big and fast” for the unit. Fine, that’s what was delivered. Our RAID6 numbers would be the envy of other folks’ RAID0. Customer loved the numbers.
So much so that they wiped our disk config out, and created the one they wanted. Ok, not a problem, though their config won’t be as high in performance. As long as they can live with that, I am fine.
The next thing was the OS. We ship pre-loaded with OpenFiler 2.2 x86_64. It is a great distribution that does wonderful stuff out of the box. I am not all that happy with rPath, which it is based upon, as I am not what one might call, I dunno, impressed? with conary. Package management by Python? Given the (sheer lack of) robustness of other great python projects such as RPM, this bugs me. It is also ripe for abuse, as it would not be hard at all to ship a disruptive object as a package.
Don’t get me wrong, the idea behind rPath is great. It is one I had been thinking of for a while. It’s the implementation … well, an end user of a software appliance will never see this. And that is good. But to a developer … If I want to ship a software appliance, I have to start by building it. The rPath builder is anything but understandable. Worse, I have to build Python modules to create and install packages. I don’t normally write Python, and certainly not voluntarily if I can avoid it. I can use it, hack it, debug it (I spend way too much time on that with other people’s code).
But I wanted to write about file systems more than rPath’s conary.
The customer has this JackRabbit. And they want to load SL4.4. Which is great. Except that, being a rebuild of RedHat, it is missing important and essential features, which RedHat has decided not to support for business reasons, not technical ones. Likely due to its backing of other technologies.
Some of the missing features: file systems such as xfs. This is an important file system, as it easily handles huge file systems and files, large I/O, and other issues very nicely. It is fast, it is safe, you can recover from disasters. There are lots of good reasons to use it. The code base is large. It is not a simple small file system. There are lots of moving parts. It is also about as old as ext2, and much older than ext3. Unlike both of these, xfs doesn’t have little nasty 16TB file system limits.
As you might note, if you are shipping a 25TB file system product, a 16TB limit presents at least a small problem.
Unfortunately, the customer chose to use a RedHat based system, and given that RHEL5 is as likely to support xfs as RHEL4 is, we have to patch in xfs support. Fine, we will need to charge for this support (costs us time, effort, …) if customers elect to use RH based products. rPath is based upon a Fedora Core release as far as I can tell, and it has xfs and other goodness built in. Making it work on JackRabbit was not a stretch. SL4.4 likely will be.
Our kernel modules, xfsprogs, and other RPMs for SL4.4 x86_64 are here.
I spoke with a RH representative at SC06. Didn’t tear into him, but did let him know that we had to migrate customers off his company’s product due to lack of support of critical things like, I dunno, SATA? XFS, and others. He came back with the weak “well ext3 is as good as xfs”.
No, it’s not even close.
With JackRabbit, I can attach quite a few drives to a single unit for direct connections. More than 500 via SAS links. That’s about 375 TB raw. Precisely why would I want to run a file system that can only handle 16 TB at maximum and needs to fsck itself every 60 days or so, when I can run a file system where 375 TB is not even remotely near its upper limits (exabytes)? We are talking to people with Petabyte sized problems, and ext3 ain’t gonna be on the menu, because it can’t handle the big stuff. Yeah, JackRabbit can be an iSCSI target and initiator, and I can aggregate lots of iSCSI targets together behind a HA iSCSI system (no single points of failure in the hundreds of TB to PB range). We can build huge file systems for customers who ask for this, and many have. It doesn’t make sense to constrain yourself with a file system that can’t handle what you and your users need it to. Working around such limits represents a hack at best.
Note: I am aware that with 8kB pages, ext3 can get to 32TB volumes. Unfortunately, such machines aren’t prevalent, and most x86/x86_64 machines have 4kB pages, which limits ext3 to 16 TB. Regardless of that limit, the 2TB limit on file sizes is a problem as well. An HPC procurement benchmark two years ago (2004) was dealing with single file sizes of 2.5 TB and larger.
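For the curious, the arithmetic behind those limits can be sketched in a few lines. This is a back-of-the-envelope illustration, not a statement about any particular kernel: it assumes ext3 addresses blocks with a full unsigned 32-bit block number, and it treats the 25 TB and 375 TB figures above as decimal terabytes. The function names are made up for this example.

```python
# Rough ext3 volume-size limits, assuming 2**32 addressable blocks.
# Function names here are hypothetical, chosen only for illustration.
TIB = 2**40

def ext3_max_volume_bytes(block_size):
    """Largest volume reachable with 2**32 blocks of the given size."""
    return 2**32 * block_size

def volumes_needed(capacity_bytes, block_size=4096):
    """How many maximum-size volumes it takes to cover a given capacity."""
    limit = ext3_max_volume_bytes(block_size)
    return -(-capacity_bytes // limit)  # ceiling division

for block_size in (4096, 8192):
    max_tib = ext3_max_volume_bytes(block_size) // TIB
    print(f"{block_size} byte blocks: {max_tib} TiB max volume")

print(volumes_needed(25 * 10**12))    # the 25 TB RAID6 -> 2
print(volumes_needed(375 * 10**12))   # ~375 TB raw behind SAS -> 22
```

With 4 kB blocks, even the 25 TB array has to be carved into two pieces, and the 375 TB case into 22. That is the "hack at best" in numbers.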
If you use an RHEL4 based distribution you have to accept all of the baggage and decisions that go along with it. Some of them, as indicated, can be really painful. RHEL is an excellent and stable distribution; I just don’t agree with some of its choices, and neither do most other Linux vendors and many Linux users.

2 thoughts on “The many joys of Redhat based linux distributions: part 1, filesystems and conary”

  1. I found it to be an 8TB limit. But what’s worse is that you have no dynamic inodes, so on very large filesystems you are bound to run out of inodes before you have used up all your space. And lastly, most relevant for testing or recreating clean filesystems: mkfs.xfs finishes in

  2. Hi Michael:
    There are some special patches you can get to get to 16 TB. It isn’t worth it though if you have a machine that can provide more than 16 TB.
    This is an issue for Redhat. It appears that they have chosen long ago to market ext3 as “good enough”, and “as good as xfs”. Unfortunately, it is not, neither in terms of capacity, nor performance. Now ext4 will be developed. It is an “extents based file system”. Hmmm… where have we heard that before? xfs maybe? I think what we have is a severe case of Not-Invented-Here syndrome. The arguments about being hard to support are specious IMO.
    But they made their choice, and our large boxes can’t use their kernel without xfs or jfs. Kind of silly, but hey, we never really needed more than 640 kB RAM anyway, right? Same argument, different decade, just as wrong.
