A peek within the ΔV kimono …

What you see below is from a $5,400 USD list price machine on its initial run-through, pre-tuning. Please remember that as you look at these numbers. This is less than $1 USD per usable GB.
It will be formally introduced and announced soon. You can see one live at Ohio Linux Fest this weekend (this exact machine, as it turns out).
This is a RAID6 unit. We could go RAID5 and increase performance, though that would mean running in a configuration we do not recommend.
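
As a back-of-envelope check on usable capacity (a sketch: the 16-drive count and single hot spare come from the comment thread below, while 500 GB per drive is my assumption, not a stated spec):

```shell
# Back-of-envelope usable capacity for RAID6 vs RAID5.
# 16 drives and 1 hot spare are mentioned in the comments below;
# 500 GB per drive is an assumption, not a published spec.
drives=16; spares=1; size_gb=500
raid6_gb=$(( (drives - spares - 2) * size_gb ))   # RAID6 reserves 2 drives for parity
raid5_gb=$(( (drives - spares - 1) * size_gb ))   # RAID5 reserves 1 drive for parity
echo "RAID6 usable: ${raid6_gb} GB, RAID5 usable: ${raid5_gb} GB"
```

Under those assumptions, 6,500 GB is about 5.9 TiB, which lines up with the df output below, and $5,400 over 6,500 GB is the quoted under-$1/GB.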

root@dV3:~# df -h /big
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              5.9T   21G  5.9T   1% /big
root@dV3:~# bonnie++ -u root -d /big -f
Version 1.03b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
dV3             16G           291317  52 160346  29           499531  35 589.6   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  6199  31 +++++ +++  4391  17  5649  31 +++++ +++  4290  23

buffered writes (2x system ram):

root@dV3:~# dd if=/dev/zero of=/big/big.file bs=1664k count=10k
10240+0 records in
10240+0 records out
17448304640 bytes (17 GB) copied, 51.5349 s, 339 MB/s

buffered reads (2x system ram):

root@dV3:~# dd if=/big/big.file of=/dev/null bs=1664k
10240+0 records in
10240+0 records out
17448304640 bytes (17 GB) copied, 30.8129 s, 566 MB/s
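
The 2x-system-RAM sizing used above generalizes to any box; a minimal sketch of deriving the dd parameters from /proc/meminfo (the /big mount point is from the runs above; this only prints the command rather than running it):

```shell
# Print a dd write-test command sized at 2x physical RAM, so the Linux
# page cache cannot absorb the whole file and hide disk performance.
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
count=$(( ram_kb * 2 / 1024 ))   # number of 1 MiB blocks = 2x RAM
echo "dd if=/dev/zero of=/big/big.file bs=1M count=${count}"
```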

6 thoughts on “A peek within the ΔV kimono …”

  1. What would you say if I built a server with an 8-TB ZFS raidz2 pool (10 x 1TB with dual parity), with 8 GB RAM and a decent 4-core CPU, and achieved the same sustained read/write throughput or better for only $2000 (3.3x better TB/$ than yours)?
    Because based on one of my lower-spec projects [1], I think it is possible 🙂
    [1] http://opensolaris.org/jive/message.jspa?messageID=216434

  2. @mrb
    ΔV’s focus is price and capacity. This is (relatively speaking) the mid-range of the units. That we are getting the numbers that we are, immediately after a build, with zero tuning (not kidding at all), for bonnie++ and dd tests (greater than 2x memory) is quite suggestive of good things to come.
    When we are done with the engineering bits on ΔV, it will run Linux (of course), Windows Server 2008, and OpenSolaris. Though from all the customers we have spoken with, very few are interested in that last one. We will have it and support it, but we don’t expect it to recoup our investment in getting it onto the system.
    All this said, I did read your post and have a number of comments. The obvious one is that you used dd without telling us how large the file you read and wrote was.
    One of the frustrating aspects of dealing with people running benchmarks on ZFS-based systems is that, quite often, they run benchmarks that fit entirely in RAM. So all you ever test is cache, not physical disk or disk-to-OS performance. It’s not obvious to me from reading the post on the other site whether you actually tested disk or just cache. We often run into what I call “the marketing numbers” from Sun and ZFS fans … you know, the 2 GB/s rates. Ask people what they measure, and it is quite different for realistic configurations. This goes back to your claim of achieving the same results. We did this with a RAID6 with 1 hot spare. Your results:

    I did a quick test with a 7-disk striped pool too:
    – 330-390 MByte/s seq. writes
    – 560-570 MByte/s seq. reads (what’s really interesting here is that the
    bottleneck is the platter speed of one of the disks at 81 MB/s: 81*7=567, ZFS
    truly “runs at platter speed”, as advertised, wow)
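
    The cache-versus-disk question raised above can be checked mechanically: compare the benchmark file’s size against RAM before trusting the numbers. A sketch (the 2x-RAM threshold matches the dd runs in the post; check_benchmark_size is a made-up name, not an existing tool):

```shell
# Warn if a benchmark file is small enough that the Linux page cache
# could have served most of it (sketch; assumes /proc/meminfo exists).
check_benchmark_size() {
  file_bytes=$(stat -c %s "$1") || return 1
  ram_bytes=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 1024 ))
  if [ "$file_bytes" -lt $(( ram_bytes * 2 )) ]; then
    echo "WARNING: $1 is smaller than 2x RAM; numbers may reflect cache"
  else
    echo "OK: $1 exceeds 2x RAM"
  fi
}
```

    For example, check_benchmark_size /big/big.file after a run.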

    Ummm … apples (our RAID6) and oranges (your RAID0). Would you like me to test with our units striped? These disks (individually) read and write more than 20% faster than yours. You probably would not like to see this comparison. Might do it next week for laughs, but I do not know anyone who would be so unwise as to run critical infrastructure with a RAID0. Ok, scratch that, I do know such people. I certainly wouldn’t want to be them. Zfs with all of its purported advantages can’t save you from bad decisions.
    So back to your RAID6-to-RAID6 comparison (though you failed to indicate the file size):

    – 220-250 MByte/s sequential write throughput (dd if=/dev/zero of=file)
    – 430-440 MByte/s sequential read throughput (dd if=file of=/dev/null)

    We are ~50% faster on writes and about 25% faster on reads.
    You are claiming you can deliver this lower performance for $2000 more?
    mrb, we are going to get OpenSolaris up on it as well, running off of our CF cards (we have most of our Linux installs running off CF cards). If you don’t like the OS, you can pop the other one in. Your choice.
    You should also know that your tuned results are being compared to our untuned results. And initial preliminary tuning suggests a bit more headroom for us.
    If you are in Columbus, OH this Saturday, come by our booth at Ohio Linux Fest. I’ll show you the unit, and I’ll run whatever benchmark you ask for that we can reasonably do on a show floor.
    But, if you want to spend $2000 more for a server that is 25-50% slower, by all means. Go for it.

  3. @Joe: no no no, you didn’t understand my post, let me clarify 🙂
    (1) The link I gave you is just one unrelated raidz/RAID5 SATA+ZFS data point; it’s not the raidz2/RAID6 (!= raidz) system I am speculating about.
    (2) BTW, don’t compare these RAID0/RAID5 numbers (you seem to think it was RAID6) with your RAID6 numbers. Of course this is apple/orange.
    (3) You say “spend $2000 more for a server that is 25-50% slower” but the same raidz system actually costs $900 today (270+90*7). This is 6x CHEAPER than $5400 (not “$2000 more” !)
    (4) My benchmarks weren’t cached: I let dd read/write 5-10 GB of data while the system had only 512 MB of RAM.
    (5) My system is NOT tuned at all. This is a basic snv_82 install (I reverted max_payload_size to its default setting).
    @Chris: I haven’t built the system. I just *know* how to build one that I *think* would perform as I described. It all boils down to FIS-based controllers (AHCI, SiI3124), a good max_payload_size, multithreaded checksum/parity computation, 334-GB platters, HT throughput between SB and NB, no reallocated sectors, and ZFS. I will try to post a list of parts this weekend. My current raidz system is 80% full, so I’ll soon have the occasion to build the raidz2 one and verify my theoretical numbers 🙂

  4. @mrb
    Your numbers/specs seem to keep changing. On the page you state $1320 for the system, yet here you claim somewhat less.
    It seems obvious that a $270 server isn’t going to have a set of hot-swap drives, or a hot-swap backplane, or a rail mount, or a redundant PS, or … . This is a pure low-end desktop box. In which case, it needs to be compared with the Buffalo units and the like, which are much less expensive and have similar issues to yours. I am not aware of many admins who would look kindly on a non-hot-swap drive scenario.
    Also, you now claim no tuning, yet you have a large section of your post devoted to, of all things, how you tuned it.
    I guess I am unsure of what you are posting or why you are posting it. You state that I can’t compare raidz2 to RAID6 (you can), and you weren’t comparing a stripe to a RAID6 (you were).
    This ΔV has 16 drives, 8 GB ram, and 4 cores, not to mention dual gigabit NICs, rack mount case and rails, hot swap drives, redundant ps, and IPMI/kvm over IP built in. It boots from a CF device.
    Now place your components into a similar case, with similar size memory/motherboard, drive count, etc. What happens to the cost? You can always build a deskside box, but some things you are not going to want to give up.
    ΔV scales down as well as up. We have 1, 2, 3, and 4U units, ranging from 1 TB usable through 33 TB usable. We are OS-agnostic on it: you can run Linux, Windows 2008, or Solaris if you want. Tomorrow (er, today) on the Ohio Linux show floor, we will be running a Linux base with several VMware sessions running other OSes over iSCSI connections to the disk. Some will be local on it, some will be remote.
    Stop by if you wish, and say hi.

  5. I would love to take a look at this box, unfortunately I am not in Ohio.
    Regarding $1320 vs. $900: $1320 was how much it cost me 6 months ago; $900 is how much it would cost me today (because 750 GB disks have dropped from $150 to $100).
    No, the $270 server doesn’t have hot-swap drive cages (but it does support hot swap: I can manually disconnect/reconnect the drives while the server is running, and OpenSolaris supports it, so I guess I could buy a chassis if I wanted/needed hot-swap drive cages).
    Yes, I maintain I didn’t tune the system. Even though I spent some time talking about and playing with max_payload_size, as I explained in my post in the end it didn’t affect the ZFS performance. (It only affected dd benchmarks, because independent instances of dd to read/write from/to the drives themselves (to bypass ZFS) were the only benchmarks able to stress the system close to its limits.)
    To clarify one more time: the system I linked to is raidz/RAID5. The system I am speculating about (but did not build) would be raidz2/RAID6. You compared my system (raidz) against RAID6. This is apples vs. oranges.
    Put me, an engineer, in charge of designing the ΔV, and I can cut your hw costs by at least 2x or 3x. That’s all I am saying =)

Comments are closed.