A plan to work around the rPath issues in OpenFiler

So I was thinking about how to work around the issues I found in OpenFiler. Basically the inability to upgrade drivers is the critical issue.

Well, if OpenFiler never touches the hardware, this is much less of an issue.

Bear with me.


Basically, we sell JackRabbit as a server or an appliance, and pre-configure it so that our customers can pull it out of the box, stick it into the rack or on the floor, turn it on, and start working. We want this to be the case no matter what … that there is no installation effort required for them. In this sense, we want the hardware to be an appliance.

Ok.

OpenFiler is pretty good kit, as long as you don’t need to add/update drivers to handle the hardware.

So why should we let it?

Run it as a VM atop our normal base load. Have the VM start when the OS starts. Have the base system export its disk(s) via iSCSI or similar to the virtualized OpenFiler. You get all the benefits of OpenFiler, with none of the pain of making sure the system has drivers for the disks. It no longer needs them.

JackRabbit has more than enough power to run this in a VM. Will explore this on DeltaV as well.

If this works as well as I think it will, we should be able to support OpenFiler without worrying that its RAID card drivers are ancient, or that it has specific kernel issues dealing with other cards.

Run it in a VM.

Keep it away from the bare silicon.
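
The layout above, where the base OS owns the disks and hands them to the OpenFiler VM over iSCSI, could be sketched as follows. This is a minimal sketch under assumptions the post does not make: it assumes the base system uses the Linux tgt target daemon (the post only says "iSCSI or similar"), the helper name `iscsi_target_stanza` is hypothetical, and the IQN and device path are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: emit a tgt-style target stanza for one backing device.
# Usage: iscsi_target_stanza IQN DEVICE
iscsi_target_stanza() {
    printf '<target %s>\n' "$1"
    printf '    backing-store %s\n' "$2"
    printf '</target>\n'
}

# Example values only: the IQN and /dev/sdb are placeholders,
# not taken from a real JackRabbit configuration.
iscsi_target_stanza iqn.2001-04.com.example:jackrabbit.disk0 /dev/sdb
```

The printed stanza is the kind of entry that would go into the target daemon's configuration on the base OS. The virtualized OpenFiler would then see the exported disk as an ordinary iSCSI device, so it never needs a driver for the RAID card underneath.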


6 thoughts on “A plan to work around the rPath issues in OpenFiler”

  1. Drivers are definitely a challenge for any system which is prebaked, which software appliances certainly are. Linux has always had device driver limitations and nothing about an appliance approach makes that problem go away.

    Virtual appliances are definitely a good approach to resolve this problem by moving the driver requirements into the hypervisor. They’re also easier to install and manage in many ways.

    If you really want to build your own drivers, however, installing a build environment on top of OpenFiler isn’t particularly hard. You’ll need to have some passing familiarity with our packaging system, Conary. It’s documented at wiki.rpath.com. Basically, installing group-devel from conary.rpath.com@rpl:2 will give you most of what you need.

  2. Sorry to respond to myself here…

    Conary always makes it really simple for you to take the OpenFiler system definition (the group), replace the kernel or add drivers, and generate new images based on your modified system. It would be quite easy to use this to create JackRabbit images which are derived from the OpenFiler system but include the right drivers for your hardware.

  3. @Erik

    Sadly, I haven’t seen this in the documentation (simple, worked examples). I’ve personally found the documentation for rPath to be quite opaque.

    This said, I have tried what you suggested, with a fresh pull of the OpenFiler vm image. Just pulled it a few minutes ago, running it on my development machine.

    [root@dhcp1 ~]# conary update group-devel
    The following dependencies would not be met after this update:

    httpd:runtime=2.2.6-2.7-1 (Already Installed) requires:
    soname: ELF64/libapr-1.so.0(SysV x86_64)
    soname: ELF64/libaprutil-1.so.0(SysV x86_64)
    which was provided by:
    apr:lib=1.2.12-0.2-1 (Updated to 0.9.7-1-0.1) or apr-util:lib=1.2.12-0.1-1 (Updated to 0.9.7-1-0.1)

    So that, unfortunately, doesn’t work, due to an incorrect or unsolvable dependency.

    Expand it a bit. Let’s try including apr:lib and apr-util:lib

    [long session not posted]

    Similar errors. Ok, spend an hour googling to see if we can find out how to work around/force the issue. RTFM doesn’t seem to help here.

    Found the magical formula that got this to install:

    conary update group-devel apr:lib apr-util:lib --replace-managed-files

    And now, let’s use our fio test case to see if we can build anything.

    [root@dhcp1 ~]# wget http://brick.kernel.dk/snaps/fio-1.26.tar.gz
    --13:57:24-- http://brick.kernel.dk/snaps/fio-1.26.tar.gz
    => `fio-1.26.tar.gz'
    Resolving brick.kernel.dk... 93.163.65.50
    Connecting to brick.kernel.dk|93.163.65.50|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1,479,901 (1.4M) [application/x-gzip]

    100%[=========================================================================>] 1,479,901 194.50K/s ETA 00:00

    13:57:34 (180.88 KB/s) - `fio-1.26.tar.gz' saved [1479901/1479901]

    [root@dhcp1 ~]# tar -zxvf fio-1.26.tar.gz
    fio/
    fio/README
    fio/flist.h
    fio/stat.c
    fio/options.c
    fio/fio.c
    fio/smalloc.c
    fio/parse.h
    fio/mutex.h
    fio/log.h
    fio/debug.h
    fio/fifo.c
    fio/rbtree.c
    fio/engines/
    fio/engines/net.c
    fio/engines/guasi.c
    fio/engines/posixaio.c
    fio/engines/cpu.c
    fio/engines/null.c
    fio/engines/sync.c
    fio/engines/solarisaio.c
    fio/engines/sg.c
    fio/engines/libaio.c
    fio/engines/skeleton_external.c
    fio/engines/splice.c
    fio/engines/mmap.c
    fio/engines/syslet-rw.c
    fio/fio.1
    fio/fio.h
    fio/hash.h
    fio/filesetup.c
    fio/fio_generate_plots
    fio/time.c
    fio/memory.c
    fio/crc/
    fio/crc/crc32c.h
    fio/crc/crc16.c
    fio/crc/md5.h
    fio/crc/crc32c-intel.c
    fio/crc/crc32c.c
    fio/crc/sha256.c
    fio/crc/sha512.c
    fio/crc/crc7.c
    fio/crc/crc64.c
    fio/crc/crc32.c
    fio/crc/crc64.h
    fio/crc/sha512.h
    fio/crc/crc32.h
    fio/crc/crc7.h
    fio/crc/sha256.h
    fio/crc/md5.c
    fio/crc/crc16.h
    fio/mutex.c
    fio/Makefile.FreeBSD
    fio/smalloc.h
    fio/os/
    fio/os/kcompat.h
    fio/os/os-solaris.h
    fio/os/os.h
    fio/os/os-linux.h
    fio/os/syslet.h
    fio/os/indirect.h
    fio/os/os-freebsd.h
    fio/rbtree.h
    fio/HOWTO
    fio/ioengines.c
    fio/COPYING
    fio/log.c
    fio/verify.c
    fio/.gitignore
    fio/filehash.c
    fio/REPORTING-BUGS
    fio/compiler/
    fio/compiler/compiler.h
    fio/compiler/compiler-gcc3.h
    fio/compiler/compiler-gcc4.h
    fio/lib/
    fio/lib/bswap.h
    fio/lib/ffz.h
    fio/lib/fls.h
    fio/lib/strsep.c
    fio/lib/strsep.h
    fio/fifo.h
    fio/io_u.c
    fio/filehash.h
    fio/diskutil.c
    fio/init.c
    fio/eta.c
    fio/blktrace_api.h
    fio/blktrace.c
    fio/Makefile
    fio/Makefile.solaris
    fio/parse.c
    fio/arch/
    fio/arch/arch-s390.h
    fio/arch/arch-x86_64.h
    fio/arch/arch-sparc.h
    fio/arch/arch-alpha.h
    fio/arch/arch.h
    fio/arch/arch-ppc.h
    fio/arch/arch-ia64.h
    fio/arch/arch-sparc64.h
    fio/arch/arch-x86.h
    fio/examples/
    fio/examples/tiobench-example
    fio/examples/netio
    fio/examples/surface-scan
    fio/examples/fsx
    fio/examples/aio-read
    fio/examples/1mbs_clients
    fio/examples/iometer-file-access-server
    fio/examples/disk-zone-profile
    fio/examples/ssd-test
    fio/.git/
    fio/.git/packed-refs
    fio/.git/description
    fio/.git/index
    fio/.git/objects/
    fio/.git/objects/pack/
    fio/.git/objects/pack/pack-aa997395542a4cfff1d4f02fe9e5c9158f0348ed.idx
    fio/.git/objects/pack/pack-aa997395542a4cfff1d4f02fe9e5c9158f0348ed.pack
    fio/.git/objects/info/
    fio/.git/objects/info/packs
    fio/.git/config
    fio/.git/logs/
    fio/.git/logs/refs/
    fio/.git/logs/refs/heads/
    fio/.git/logs/refs/heads/master
    fio/.git/logs/refs/remotes/
    fio/.git/logs/refs/remotes/origin/
    fio/.git/logs/refs/remotes/origin/HEAD
    fio/.git/logs/HEAD
    fio/.git/refs/
    fio/.git/refs/heads/
    fio/.git/refs/heads/master
    fio/.git/refs/tags/
    fio/.git/refs/remotes/
    fio/.git/refs/remotes/origin/
    fio/.git/refs/remotes/origin/HEAD
    fio/.git/hooks/
    fio/.git/hooks/update.sample
    fio/.git/hooks/pre-applypatch.sample
    fio/.git/hooks/pre-commit.sample
    fio/.git/hooks/applypatch-msg.sample
    fio/.git/hooks/pre-rebase.sample
    fio/.git/hooks/post-update.sample
    fio/.git/hooks/post-commit.sample
    fio/.git/hooks/commit-msg.sample
    fio/.git/hooks/prepare-commit-msg.sample
    fio/.git/hooks/post-receive.sample
    fio/.git/branches/
    fio/.git/HEAD
    fio/.git/info/
    fio/.git/info/exclude
    fio/.git/info/refs
    fio/gettime.c
    [root@dhcp1 ~]# cd fio
    [root@dhcp1 fio]# make
    CC gettime.o
    gettime.c:5:20: unistd.h: No such file or directory
    gettime.c:6:22: sys/time.h: No such file or directory
    In file included from gettime.c:8:
    fio.h:4:19: sched.h: No such file or directory
    In file included from /usr/lib64/gcc/x86_64-unknown-linux/3.4.4/include/syslimits.h:7,
    from /usr/lib64/gcc/x86_64-unknown-linux/3.4.4/include/limits.h:11,
    from fio.h:5,
    from gettime.c:8:
    /usr/lib64/gcc/x86_64-unknown-linux/3.4.4/include/limits.h:122:61: limits.h: No such file or directory
    In file included from gettime.c:8:
    fio.h:6:21: pthread.h: No such file or directory
    fio.h:8:26: sys/resource.h: No such file or directory
    fio.h:9:19: errno.h: No such file or directory
    fio.h:10:20: stdlib.h: No such file or directory
    fio.h:11:19: stdio.h: No such file or directory
    fio.h:13:20: string.h: No such file or directory
    fio.h:14:20: getopt.h: No such file or directory
    fio.h:15:22: inttypes.h: No such file or directory
    fio.h:16:20: assert.h: No such file or directory
    In file included from fio.h:19,
    from gettime.c:8:
    flist.h: In function `flist_del’:
    flist.h:97: error: `NULL’ undeclared (first use in this function)
    flist.h:97: error: (Each undeclared identifier is reported only once
    flist.h:97: error: for each function it appears in.)
    In file included from fio.h:21,
    from gettime.c:8:
    rbtree.h: In function `rb_link_node’:
    rbtree.h:148: error: `NULL’ undeclared (first use in this function)
    In file included from os/os.h:5,
    from fio.h:23,
    from gettime.c:8:
    os/os-linux.h:4:23: sys/ioctl.h: No such file or directory
    os/os-linux.h:5:21: sys/uio.h: No such file or directory
    os/os-linux.h:6:25: sys/syscall.h: No such file or directory
    os/os-linux.h:8:19: fcntl.h: No such file or directory
    In file included from os/syslet.h:4,
    from os/indirect.h:4,
    from os/os-linux.h:14,
    from os/os.h:5,
    from fio.h:23,
    from gettime.c:8:
    os/kcompat.h:4:20: stdint.h: No such file or directory
    In file included from os/os.h:5,
    from fio.h:23,
    from gettime.c:8:
    os/os-linux.h: At top level:
    os/os-linux.h:40: error: parse error before “os_cpu_mask_t”
    os/os-linux.h:40: warning: type defaults to `int’ in declaration of `os_cpu_mask_t’
    os/os-linux.h:40: warning: data definition has no type or storage class
    os/os-linux.h:71: error: parse error before ‘*’ token
    os/os-linux.h: In function `fio_cpuset_init’:
    os/os-linux.h:73: warning: implicit declaration of function `CPU_ZERO’
    os/os-linux.h:73: error: `mask’ undeclared (first use in this function)
    os/os-linux.h: At top level:
    os/os-linux.h:77: error: parse error before ‘*’ token
    os/os-linux.h: In function `ioprio_set’:
    os/os-linux.h:86: warning: implicit declaration of function `syscall’
    os/os-linux.h: At top level:
    os/os-linux.h:113: warning: “struct iovec” declared inside parameter list
    os/os-linux.h:113: warning: its scope is only this definition or declaration, which is probably not what you want
    os/os-linux.h: In function `async_exec’:
    os/os-linux.h:132: warning: cast to pointer from integer of different size
    os/os-linux.h: In function `blockdev_invalidate_cache’:
    os/os-linux.h:180: warning: implicit declaration of function `ioctl’
    os/os-linux.h:180: warning: implicit declaration of function `_IO’
    os/os-linux.h: In function `blockdev_size’:
    os/os-linux.h:185: warning: implicit declaration of function `_IOR’
    os/os-linux.h:185: error: parse error before “size_t”
    os/os-linux.h:188: error: `errno’ undeclared (first use in this function)
    os/os-linux.h: In function `os_phys_mem’:
    os/os-linux.h:195: warning: implicit declaration of function `sysconf’
    os/os-linux.h:195: error: `_SC_PAGESIZE’ undeclared (first use in this function)
    os/os-linux.h:196: error: `_SC_PHYS_PAGES’ undeclared (first use in this function)
    os/os-linux.h: In function `os_random_seed’:
    os/os-linux.h:205: warning: implicit declaration of function `srand48_r’
    os/os-linux.h: In function `os_random_long’:
    os/os-linux.h:212: warning: implicit declaration of function `lrand48_r’
    os/os-linux.h: In function `fio_lookup_raw’:
    os/os-linux.h:221: warning: implicit declaration of function `major’
    os/os-linux.h:227: warning: implicit declaration of function `open’
    os/os-linux.h:227: error: `O_RDONLY’ undeclared (first use in this function)
    os/os-linux.h:234: warning: implicit declaration of function `minor’
    os/os-linux.h:236: warning: implicit declaration of function `close’
    In file included from fio.h:23,
    from gettime.c:8:
    os/os.h:15:20: libaio.h: No such file or directory
    os/os.h:19:17: aio.h: No such file or directory
    os/os.h:24:21: scsi/sg.h: No such file or directory
    In file included from fio.h:24,
    from gettime.c:8:
    mutex.h: At top level:
    mutex.h:7: error: parse error before “pthread_mutex_t”
    mutex.h:7: warning: no semicolon at end of struct or union
    mutex.h:8: warning: type defaults to `int’ in declaration of `cond’
    mutex.h:8: warning: data definition has no type or storage class
    mutex.h:13: error: parse error before ‘}’ token
    mutex.h: In function `fio_mutex_getval’:
    mutex.h:31: error: dereferencing pointer to incomplete type
    In file included from fio.h:25,
    from gettime.c:8:
    log.h: At top level:
    log.h:4: error: parse error before ‘*’ token
    log.h:4: warning: type defaults to `int’ in declaration of `f_out’
    log.h:4: warning: data definition has no type or storage class
    log.h:5: error: parse error before ‘*’ token
    log.h:5: warning: type defaults to `int’ in declaration of `f_err’
    log.h:5: warning: data definition has no type or storage class
    log.h:18: error: parse error before ‘*’ token
    log.h:18: warning: type defaults to `int’ in declaration of `get_f_out’
    log.h:18: warning: data definition has no type or storage class
    log.h:19: error: parse error before ‘*’ token
    log.h:19: warning: type defaults to `int’ in declaration of `get_f_err’
    log.h:19: warning: data definition has no type or storage class
    In file included from gettime.c:8:
    fio.h:123: error: field `iocb’ has incomplete type
    fio.h:126: error: field `aiocb’ has incomplete type
    fio.h:129: error: field `hdr’ has incomplete type
    fio.h:138: error: field `start_time’ has incomplete type
    fio.h:139: error: field `issue_time’ has incomplete type
    fio.h:375: error: field `stat_sample_time’ has incomplete type
    fio.h:380: error: field `ru_start’ has incomplete type
    fio.h:381: error: field `ru_end’ has incomplete type
    fio.h:489: error: parse error before “os_cpu_mask_t”
    fio.h:489: warning: no semicolon at end of struct or union
    fio.h:532: error: parse error before ‘}’ token
    fio.h:540: error: field `o’ has incomplete type
    fio.h:542: error: parse error before “pthread_t”
    fio.h:542: warning: no semicolon at end of struct or union
    fio.h:554: error: field `next_file_state’ has incomplete type
    fio.h:555: warning: unnamed struct/union that defines no instances
    fio.h:564: error: ‘ioprio_set’ redeclared as different kind of symbol
    os/os-linux.h:85: error: previous definition of ‘ioprio_set’ was here
    fio.h:571: error: parse error before ‘*’ token
    fio.h:571: warning: type defaults to `int’ in declaration of `iolog_f’
    fio.h:571: warning: data definition has no type or storage class
    fio.h:662: error: parse error before ‘}’ token
    fio.h: In function `fio_ro_check’:
    fio.h:724: warning: implicit declaration of function `assert’
    fio.h:724: error: dereferencing pointer to incomplete type
    fio.h: In function `should_fsync’:
    fio.h:736: error: dereferencing pointer to incomplete type
    fio.h:738: error: dereferencing pointer to incomplete type
    fio.h:740: error: dereferencing pointer to incomplete type
    fio.h:740: error: dereferencing pointer to incomplete type
    fio.h:740: error: dereferencing pointer to incomplete type
    fio.h: At top level:
    fio.h:786: error: field `time’ has incomplete type
    fio.h:845: error: ‘ramp_time_over’ redeclared as different kind of symbol
    fio.h:629: error: previous declaration of ‘ramp_time_over’ was here
    fio.h:856: warning: “struct option” declared inside parameter list
    fio.h:949: error: ‘io_u_queued’ redeclared as different kind of symbol
    fio.h:592: error: previous declaration of ‘io_u_queued’ was here
    fio.h:968: warning: “struct timespec” declared inside parameter list
    fio.h:992: warning: “struct timespec” declared inside parameter list
    fio.h: In function `fio_file_reset’:
    fio.h:1040: warning: implicit declaration of function `memset’
    fio.h: In function `clear_error’:
    fio.h:1045: error: dereferencing pointer to incomplete type
    fio.h:1046: error: dereferencing pointer to incomplete type
    fio.h: In function `dprint_io_u’:
    fio.h:1054: warning: implicit declaration of function `getpid’
    fio.h:1054: warning: implicit declaration of function `fprintf’
    fio.h: In function `fio_fill_issue_time’:
    fio.h:1070: error: dereferencing pointer to incomplete type
    fio.h:1071: error: dereferencing pointer to incomplete type
    fio.h:1071: error: dereferencing pointer to incomplete type
    fio.h:1071: error: dereferencing pointer to incomplete type
    gettime.c: In function `fio_gettime’:
    gettime.c:125: warning: implicit declaration of function `memcpy’
    gettime.c:125: error: dereferencing pointer to incomplete type
    gettime.c:129: warning: implicit declaration of function `gettimeofday’
    gettime.c:131: error: storage size of ‘ts’ isn’t known
    gettime.c:133: warning: implicit declaration of function `clock_gettime’
    gettime.c:138: error: dereferencing pointer to incomplete type
    gettime.c:139: error: dereferencing pointer to incomplete type
    gettime.c:131: warning: unused variable `ts’
    gettime.c:147: error: dereferencing pointer to incomplete type
    gettime.c:147: error: invalid use of undefined type `struct timeval’
    gettime.c:148: error: dereferencing pointer to incomplete type
    gettime.c:148: error: invalid use of undefined type `struct timeval’
    gettime.c:149: error: invalid use of undefined type `struct timeval’
    gettime.c:149: error: dereferencing pointer to incomplete type
    gettime.c:150: error: dereferencing pointer to incomplete type
    gettime.c:150: error: invalid use of undefined type `struct timeval’
    gettime.c:151: error: dereferencing pointer to incomplete type
    gettime.c:151: error: invalid use of undefined type `struct timeval’
    gettime.c:154: error: dereferencing pointer to incomplete type
    gettime.c: In function `fio_gtod_init’:
    gettime.c:159: error: invalid application of `sizeof’ to incomplete type `timeval’
    gettime.c: At top level:
    fio.h:577: error: storage size of `bsrange_state’ isn’t known
    fio.h:578: error: storage size of `verify_state’ isn’t known
    fio.h:604: error: storage size of `lastrate’ isn’t known
    fio.h:619: error: storage size of `random_state’ isn’t known
    fio.h:621: error: storage size of `start’ isn’t known
    fio.h:622: error: storage size of `epoch’ isn’t known
    fio.h:623: error: storage size of `rw_end’ isn’t known
    fio.h:624: error: storage size of `last_issue’ isn’t known
    fio.h:625: error: storage size of `tv_cache’ isn’t known
    fio.h:634: error: storage size of `rwmix_state’ isn’t known
    fio.h:661: error: storage size of `file_size_state’ isn’t known
    gettime.c:14: error: storage size of `last_tv’ isn’t known
    make: *** [gettime.o] Error 1

    Short version? No.
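
The first errors in the log above are all "No such file or directory" for standard C headers (unistd.h, stdio.h, stdlib.h and so on), which suggests the C library development headers are simply not installed, rather than anything being wrong with fio itself. A quick check along these lines can confirm that before running make. This is a minimal sketch: the helper name `missing_headers` is hypothetical, and `/usr/include` as the default header location is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print each header that is absent from an include directory.
# Usage: missing_headers INCLUDE_DIR HEADER...
missing_headers() {
    dir=$1
    shift
    for h in "$@"; do
        # Print the header name only when the file is not there.
        [ -f "$dir/$h" ] || printf '%s\n' "$h"
    done
}

# A few of the headers the fio build could not find, taken from the log above.
# /usr/include is an assumed location; override INCLUDE_DIR for other layouts.
missing_headers "${INCLUDE_DIR:-/usr/include}" \
    unistd.h stdio.h stdlib.h pthread.h sys/time.h libaio.h
```

If this prints most of the list, the problem is the missing development packages, not the source being built.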

    For laughs, I also pulled down the specific driver. And, many error lines later (missing headers and all that), it is fairly obvious that this doesn’t work.

    Ok. So how does one actually get a full-fledged development environment into an rPath appliance? This is the $64k question.

    Without a good answer (and no, calling them an appliance and denying the need for this is decidedly not a good answer), this effectively relegates rPath appliances to pure virtual operation. Which was my original speculation as to the right way to deploy them.

    Fundamentally, this is fine, as long as it is indicated a priori: “don’t install this appliance on bare metal, as you cannot add drivers, compile security updates, etc.”

    Our Delta-V environment is more than powerful enough to host one or more VMs, our JackRabbit even more so.

    This is part of the reason that closed appliance models really worry me. We can’t support anything but what the appliance builder chooses for us to support. So if they choose to support an old driver, or no driver, or use an insecure ssh, or … then we are SOL unless we either build a complete build environment for rPath, or beg the appliance maker to fix this.

    All rPath has to do is add a complete-build-environment rpl. That’s it. Would solve the problem.

    They don’t have to though. I suspect they won’t.

  4. Hi Joe,

    I think you’re pointing the finger in the wrong direction. Certainly if you’re a (software) appliance builder, you can’t get any better development workflow than with conary/rbuilder. The problem here is that the Openfiler group is built from a set of components pulled from different branches, with a smattering of our own individual packages rolled in. From a standing start to a sane build environment there would be a number of hoops to jump through. OTOH if you were to take a general purpose rpl-based distro, such as Foresight, getting stuff to build would be as easy as apple pie.

    As for a complete-build-environment rpl — that’s incumbent on the appliance builder to provide – since each appliance build environment will differ from the next based on the package set that makes up the appliance.

    Documentation on wiki.rpath.com is *extensive*. That may be the problem. There’s so much of it, and several new concepts to grapple with, that it does not necessarily lend itself well to situations where you just need to get some single package built and move on to other issues of the day.

    If you need specific drivers to support a hardware configuration drop us a line and we’ll work something out :).

    R.

  5. @Rafiu

    I did figure out how to do some builds. But I can’t build drivers. So some of the bug fixes suggested by RAID vendors, or updated network drivers, become effectively impossible to incorporate into the system.

    I understand Erik’s point on this being a function of the appliance vendor. I understand your point about openfiler being an assemblage of different troves. The issue is that we have now run into several cases where, in order to solve a customer issue, we have had to build drivers or update kernels.

    If we can’t do this (and for the moment, we cannot), then we can’t let OF control the hardware. We can run it as a service, and I will be testing this next week. If it works as well as I hope, we should be able to solve all the issues by insulating the needed changes from OF. OF can hook into our system through iSCSI. We can have OF seem to run on it, but manage the underlying system separately. Not an ideal situation, but at least a workable one.
