NyBLE

So there I am updating my new repository to enable using zfs in the ramboot images. This is a simplification and continuation of the previous work I did a few years ago, with some massive code cleanups. And sadly, no documentation yet. Will fix soon, but for now, I am trying to hit the major functionality points.

NyBLE is a linux environment for hypervisor hosts. It builds on the old open source SIOS work, and extends it in significant ways. Basically, we want to be able to PXE boot (or locally boot if you prefer, but managing OS or boot drives is something you never want to consider at scale) a full working OS into a ramdisk.

I have this working nicely for Debian 9, 8 and others, CentOS 7 (and RHEL 7 by extension). Eventually I’ll add others.

The purpose of NyBLE is to build what we want to use, not what the distro or image builder wants us to use. It is not a linux-from-scratch. It starts with a base distribution, so as to make questions about “but it’s X and we like Y, so why are you forcing us to change?” largely irrelevant.

OSes are, generally, a detail of an implementation. Distros are packagings of OSes in particular ways, that in theory, should impart some value to the users and consumers.

In the past, this may have been quite true. Currently, the only real values distros provide are centralized bug databases, security patches, and approximately working configurations.

Distros carry significant baggage. Ancient toolchains, often EOLed for years, are being shipped with “modern” distros. Just remember they are more “stable”. Or something.

Distros often have unfortunate dependency radii. You can’t simply install package X, as it also depends on some functionality in Y and Z, which themselves depend upon … . Which means building small footprint installations with all the functionality you need, say, ideal for memory booting, becomes fraught with the peril of unmet dependencies.

While this is all … well … a fun challenge, think of it as fitting together puzzle pieces, where you only approximately know the shape of the pieces and the picture they display. And of course, each iteration at fitting things together could take minutes to hours …

… now add to this a temporal dimension. Suppose you want to compile support for a subsystem in the kernel. This support requires that the kernel-devel tree exist for a number of reasons. And suppose the kernel-devel tree is very large. Which makes PXE booting impossible.

So … now what you need to do is to temporarily load the tree, build and install the modules, and then remove the tree.
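A minimal sketch of that install, build, remove sequence, assuming a Debian image where the headers come from the `linux-headers-<version>` package and the out-of-tree modules are built through DKMS. The function name, the kernel version string, and the choice of DKMS are illustrative assumptions, not the NyBLE code itself.

```python
import subprocess


def build_modules_with_temporary_headers(kernel_version, run=subprocess.check_call):
    """Install the kernel headers, build modules, then always remove the headers."""
    headers = f"linux-headers-{kernel_version}"  # Debian package naming
    run(["apt-get", "install", "-y", headers])
    try:
        # Assumption: the modules we need (zfs, for example) are registered with DKMS.
        run(["dkms", "autoinstall", "-k", kernel_version])
    finally:
        # Runs even if the build fails, so the image never keeps the large tree.
        run(["apt-get", "purge", "-y", "--auto-remove", headers])
        run(["apt-get", "clean"])


# Dry run: print the commands instead of executing them.
build_modules_with_temporary_headers(
    "4.9.0-6-amd64", run=lambda cmd: print(" ".join(cmd))
)
```

The `try`/`finally` is the point: the cleanup happens whether or not the build succeeds, so a failed build cannot leave a bloated image behind.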

Sort of like cooking, where you need essence of kernel. But not the thing providing the essence as it would make the image too large.

This is the joy I am working through now. Works great BTW, in Debian. Not so well in CentOS. Huge tree size.

The things we do to make end users’ lives easier and friction free …


Dealing with disappointment

In the last few years, I’ve had major disappointments professionally. The collapse of Scalable, some of the positively ridiculous things associated with the aftermath of that, none of which I’ve written about until they are over. Almost over, but not quite. Waiting for confirmation. My job search last year, and some of the disappointment associated with that.

Recently I’ve had a different type of disappointment, without getting into details. The way I’ve dealt with these things in the past has been to try to understand, if there was a conflict, what I could have done better. Regardless of whether I was in the right or the wrong, to try to look at things from a different viewpoint.

Dealing with the collapse of Scalable has been a challenge, personally and professionally. I can’t get into all the reasons why right now, as we are awaiting conclusions on a few things. Dealing with what came next, from the groups that thought they would cherry pick the remains, and tie me into their organizations on the cheap, was tremendously frustrating.

Value for me would have been help with relief of the personal monetary burden imposed. Not indentured servitude, which is what was effectively proposed. Doing due diligence on some of the folks who reached out to me suggested that this would be their behavior. I naively hoped against hope that this time would be different, that their past behavior would not be reflected in our dealings.

Yeah, I’ve got lots to write about there someday. This I am dealing with, and I’ll eventually write about it.

How to deal with disappointment … where there is conflict, I try to work with people to understand the nature of the conflict. I’ve found in the past that keyboards and asynchronous messaging systems tend to amplify conflict, but personal contact and phone calls enable you to more accurately express thoughts and listen.

As long as I am conversing with a person who wants to get to a point of resolution or even agree to disagree, I can manage. What I find disappointing is, when this is not the case. When the people involved seem more focused upon the argument, or a narrative that doesn’t mesh with reality. I take people to task for stuff like that. On my teams, on other teams.

As Mark Twain may have once written

A lie can travel halfway around the world before the truth can get its boots on

Information asymmetry is a tool of those who wish to manipulate. If you inject falsehoods into discussions, you can get your opponents to waste their energy countering a massive flow of accusations. This is basically how political agitprop channels work here in the US.

It is sometimes called “throw stuff at the wall and see what sticks” but in parallel, with many throwers. Usually done to protect ideologies. The instigators of such know they are on shaky ground. Many are not thinking about the big picture, or about “be careful what you wish for, you just might get it.”

I try to turn conflict into a way to improve myself. This way, disappointment that a conflict exists becomes a mechanism for self reflection.

But when that conflict is being seeded with nefarious motives, and people I otherwise genuinely respect and like are actively engaged in it … yeah, not so much I can do there other than to disengage, let their fire cool.

Later on, when they are ready to re-engage, I can decide if I really should or not, as past unprofessional behavior may indicate significant issues with potential future unprofessional behavior. I am too old for drama, and intrigue. I don’t have time to waste on stupid arguments, and ridiculous narratives.

Attack the premise. Not the person. Reflect deeply before attacking the person. Accusations have a way of metastasizing. Asynchronous message systems have a positive feedback loop on these, while phone calls have a negative feedback loop.

Someone offers to call you to try to understand your issues, take em up on it. Chances are they’ve dealt with bad stuff before and know that this is the better route to take.


Late Feb 2018 update

Again, many apologies over the low posting frequency. Several things are nearing completion (hopefully soon) that I want finalized first.

That said, the major news is that this site is now on a much improved server and network. I’ve switched from Comcast Business to WOW business. So far, much better speed, more consistent performance, far lower cost per bandwidth.

I do have lots to write about, and have been saving things up until after this particular objective is met, so I can work/write distraction free.

More soon.


Apologies on the slow posting rate

Many things are going on simultaneously right now, and I have little time to compose thoughts for the blog. I anticipate a bit of a letup in the next week or two as the year comes to a close.


Cool bug on upgrade (not)

WordPress is an interesting beast. Spent hours working through issues that I shouldn’t have needed to on an upgrade, as some functions were deprecated.

In an interesting way. By removing them, and throwing an error. Which I found only through looking at a specific log.

So out goes that plugin. And the site is back.


#SC17

I’ve had numerous requests from friends and colleagues about whether I will be attending #SC17 this year. Sadly, this is not to be the case. $dayjob has me attending an onsite meeting that week in San Francisco, and the schedule was such that I could not attend the talks I was interested in.

I’d love for there to be a way to listen to the talks remotely. Maybe I’ll simply buy the DVD/USB stick of the talks if there is an online store for them.

Next year at #SC18 in Dallas if possible.

Enjoy, have fun @BeowulfBash, and please tweet/post what you see and hear.

And, for those who are not aware of some of the most awesome hardware out there for big data analytics and deep learning, have a look at @thedeadline and Basement Supercomputing. Best in market, designed and built by people who know how to use the machines, what they are used for and why.

(unpaid/uncompensated endorsement … get out and support the small HPC guys, the ones who actually know what they are doing).


Disk, SSD, NVMe preparation tools cleaned up and on GitHub

These are a collection of (MIT licensed) tools I’ve been working on for years to automate some of the major functionality one needs when setting up/using new machines with lots of disks/SSD/NVMe.

The repo is here: https://github.com/joelandman/disk_test_setup . I will be adding some SAS secure erase and formatting tools to this.

These tools wrap other lower level tools, and handle the process of automating common tasks you worry about when you are setting up and testing a machine with many drives. Usage instructions are in the code at the top … I will eventually add better documentation.

Here is the current list of tools (note: they aren’t aware of LVM yet, let me know if you would like this):

  • disk_mkfsxfs.pl : takes every disk or SSD that is not mounted or part of an MD RAID, and creates a file system on the disk and a mount point under a path of your choosing (/data by default). This is used with the disk_fio.pl code.
  • disk_fio.pl : generates simple disk fio test cases, with which you can probe the IO performance to the file system for many devices. The file names it generates include case type (read, write, randread, randwrite, randrw), blocksize and number of simultaneous IOs to each LUN. This will leverage the mount point you provide (defaults to /data) and will create test directories below this so you don’t get collisions. To run the test cases, you need fio installed.
  • disk_wipefs.pl : removes any trace of file system metadata on a set of drives if they are not mounted and not part of an existing MD RAID.
  • ssd_condition.pl : runs a conditioning write on a set of SSDs if they are not mounted or part of an MD RAID. If you are setting up an SSD based machine, you are, of course, running something akin to this before using the SSDs … right? Otherwise, you’ll get a nasty performance and latency shock after you transition from the mostly unallocated block scenario to the completely allocated scenario. Especially painful when the block compression/garbage collection passes come through to help the FTL find more space to write your blocks. You can tell you are there if you see IO pauses after long sequences of writes. Also, conditioning helps improve the overall life of the drive. See this presentation around slide 7 and beyond for more info.
  • sata_secure_erase.pl : completely wipes the SSD (will work with rotational media as well).
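To give a feel for the kind of test case matrix disk_fio.pl produces, here is a rough Python sketch that builds fio job files for one device. It is not the Perl tool's code: the file naming scheme, the `sdb` device name, the 1g test size, and mapping "simultaneous IOs" to fio's `iodepth` are all assumptions made for illustration. The job file keys themselves (`ioengine`, `direct`, `size`, `directory`, `rw`, `bs`, `iodepth`) are standard fio options.

```python
from itertools import product

# The five case types named in the list above.
CASES = ("read", "write", "randread", "randwrite", "randrw")


def fio_job(case, block_size, iodepth, mount_root="/data", device="sdb"):
    """Return (file name, job file text) for one device and one test case."""
    # Hypothetical naming: case type, block size and IO depth in the file name.
    name = f"{device}_{case}_bs{block_size}_io{iodepth}.fio"
    text = "\n".join([
        "[global]",
        "ioengine=libaio",
        "direct=1",
        "size=1g",
        # One test directory per device so runs do not collide.
        f"directory={mount_root}/{device}/test",
        "",
        f"[{case}]",
        f"rw={case}",
        f"bs={block_size}",
        f"iodepth={iodepth}",
        "",
    ])
    return name, text


# Build the full matrix for two block sizes and two IO depths.
jobs = [fio_job(c, bs, qd) for c, bs, qd in product(CASES, ("4k", "1m"), (1, 32))]
print(len(jobs), "job files, for example", jobs[0][0])
```

Each generated file can then be run with `fio <file name>` once the test directory exists.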

The user interface for these tools is admittedly spartan and documented at the top of the code itself. This will improve over time. Have at them, MIT licensed, and please let me know if you use them or find them useful.


Aria2c for the win!

I’d not heard of aria2c before today. Sort of a super wget as far as I could tell. Does parallel transfers to reduce data motion time, if possible.

So I pulled it down, built it. I have some large data sets to move. And a nice storage area for them.

Ok.

Fire it up to pull down a 2GB file.

Much faster than wget on the same system over the same network. Wow.

Then the rest of the ML data set. About 120GB in all.

Yeah, this is a good tool. Need to make sure it is on all our platforms.

Sort of like gridftp but far more flexible.

Definitely a good tool.
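For reference, a small sketch of how a parallel download could be invoked, written as a Python helper that assembles the aria2c command line. The helper name, the URL, the destination directory and the default of 8 connections are placeholders, not what was actually run here. The options used (`--dir`, `--split`, `--max-connection-per-server`, `--continue`) are documented aria2c options, and aria2c limits connections per server to the range 1 to 16.

```python
def aria2c_command(url, dest_dir, connections=8):
    """Build an aria2c command that splits one download across several connections."""
    if not 1 <= connections <= 16:
        raise ValueError("aria2c allows 1 to 16 connections per server")
    return [
        "aria2c",
        f"--dir={dest_dir}",                           # where the file lands
        f"--split={connections}",                      # pieces fetched in parallel
        f"--max-connection-per-server={connections}",  # connections to one host
        "--continue=true",                             # resume a partial download
        url,
    ]


# Placeholder URL and path, printed rather than executed.
print(" ".join(aria2c_command("https://example.com/dataset.tar", "/data/sets")))
```

Passing the resulting list to `subprocess.run` would start the transfer.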


Working on benchmarking ML frameworks

Nice machine we have here …

root@hermes:/data/tests# lspci | egrep -i '(AMD|NVidia)' | grep VGA
3b:00.0 VGA compatible controller: NVIDIA Corporation GP100GL (rev a1)
88:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XTX [Radeon Vega Frontier Edition]

I want to see how tensorflow and many others run on each of the cards. The processor is no slouch either:

root@hermes:/data/tests# lscpu | grep "Model name"
Model name:            Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz

Missing a few things for this, as Amazon is a bit late on shipping some of the needed parts, but hopefully soon, I’ll be able to get everything in there.

Looking at the integrated TensorFlow benchmarks, which require ImageNet, as well as others. Feel free to point more out to me … happy to run some nice baseline/direct comparisons. I’d prefer open (sharable/distributable) benchmarks (alas, ImageNet isn’t precisely this; I have put in my request to download it).

Everything else is fair game though. Planning on publishing what I find.
