When you eliminate the impossible, what is left, no matter how improbable, is likely the answer

This is a fun one.

A customer has quite a collection of all-flash Unison units. A while ago, they asked us to turn on LLDP support for the units. It has some value for a number of scenarios. Later, they asked us to turn it off. So we removed the daemon. Unison ceased generating/consuming LLDP packets.

Or so we thought.

Fast forward to last week.

We are being told that LLDP PDUs are being generated by the kit. I am having trouble believing this, as we removed the LLDP daemon from the OS load, and there is nothing in the OS or driver stack consuming or producing them.

We worked back and forth, and I got a packet trace clearly showing something that should not be possible. Something highly improbable.

So then I looked deeper. Really, no LLDP daemon on there at all.

If there was, I should see LLDP packets being passed into the ring buffer, and visible in packet captures.

So I started capturing packets.

Lo and behold … nothing. Nada. Zippo. Zilch.

No LLDP packets passed up the stack.
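
For anyone repeating this check, here is a minimal sketch of how LLDP frames can be picked out of raw captured Ethernet frames. It relies only on the LLDP EtherType (0x88CC) and an optional 802.1Q VLAN tag; the function names are illustrative, not part of any tool used above, and how you obtain the raw frames (pcap file, raw socket) is left open.

```python
import struct

LLDP_ETHERTYPE = 0x88CC   # EtherType assigned to LLDP
VLAN_ETHERTYPE = 0x8100   # 802.1Q tag; the real EtherType follows the 4-byte tag


def is_lldp_frame(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame carries an LLDP PDU."""
    if len(frame) < 14:            # too short for dst MAC + src MAC + EtherType
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == VLAN_ETHERTYPE:
        if len(frame) < 18:        # tagged frame must hold the inner EtherType
            return False
        (ethertype,) = struct.unpack("!H", frame[16:18])
    return ethertype == LLDP_ETHERTYPE


def count_lldp(frames) -> int:
    """Count LLDP frames in an iterable of raw Ethernet frames."""
    return sum(1 for f in frames if is_lldp_frame(f))
```

The equivalent capture filter for tcpdump is `ether proto 0x88cc`. A count of zero on the host while the switch counters keep climbing is exactly the mismatch described here.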

Customer reset counters, we tried again. They saw the packets. I didn’t.

So, here are some impossible things I can eliminate.

  1. The OS is generating/consuming LLDP packets. It is not. This is provable.
  2. The switch is lying about LLDP packets. It is not. This is provable.
  3. There is no 3.
  4. The hardware is failing. It is not. This is provable.
  5. Russian hackers? No … not possible.

What I am left with, however unlikely, must be a possibility.

That the NIC, without passing this information back up the stack, is generating and consuming LLDP PDU broadcast packets, or the switch is misbehaving.

As much as I don’t like the first, it is possible. The second is also possible, but I only have control over the first, so let me work on that.

Normally, spurious packets don’t bug me. Transient “ghost daemon in the machine” phenomena need to be looked at, and traced down, but rarely do they have an impact. In this case, the daemon may be in hardware, outside of the control plane (via the driver), and not on the same data plane.

This phenomenon is causing the switch to shut down ports after not receiving more LLDP packets. So it is spurious. Transient.

And there is a failure cascade after this. The switch shutting down ports takes a metadata server for a parallel file system offline. After which, the wrong type of hilarity ensues.

Yes, we can likely have them configure the switch so as to ignore LLDP packets. But that is beside the point, in that the system shouldn’t be generating/consuming them by default on its own, without kernel or user space control over it. And they should be propagated up the stack.

One possible solution is to replace the NIC. We may pursue this, but it wouldn’t be a bad thing to also try to isolate and solve this problem. We have to weigh the impact of either course and decide what to do. Until then, the temporary workaround is to shut off the LLDP port toggling here.


Virtualized infrastructure, with VM storage on software RAID + a rebuild == occasional VM pauses

Not what I was hoping for. I may explain more of what I am doing later (less interesting than why I am doing it), but suffice it to say that I’ve got a machine I’ve turned into a VM/container box, so I can build something I need to build.

This box has a large RAID6 for storage. Spinning disk. Fairly well optimized, I get good performance out of it. The box has ample CPU, and ample memory.

The VM bulk storage points over to the spinning disk RAID6, not the SSD RAID10.

I noted a failing drive, so I ejected it and swapped it out for a working one. RAID rebuild started, and now I’ve got another couple of hours before it finishes. 6 VMs are consuming maybe 25% of the CPU cycles when busy, and about 25% of the RAM in total. The machine is otherwise idle.

And when I log into one of the VMs, I am getting dramatic pauses, while there is no real load going on. Nothing in the process table. Yet the load average is wound up a little, which usually happens when IO is paused.
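
On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which is why stalled IO inflates it with nothing visibly running. A minimal sketch for listing those tasks from `/proc` follows; the function names are illustrative, and the `proc` parameter exists only so the scan can be pointed elsewhere.

```python
import os


def parse_state(stat_line: str) -> str:
    """Extract the one-letter task state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def blocked_tasks(proc: str = "/proc"):
    """Return [(pid, comm)] for tasks currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                line = fh.read()
        except OSError:          # the process exited while we were scanning
            continue
        if parse_state(line) == "D":
            comm = line[line.find("(") + 1:line.rfind(")")]
            found.append((int(entry), comm))
    return found
```

Calling `blocked_tasks()` a few times during one of the pauses, on the host and inside the VM, should show whether anything is parked waiting on the rebuilding array.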

Sure enough … this looks like what is happening. I am going to explore these code paths somewhat more. Fairly modern 4.4.x kernel, so it’s not likely a long looming bug of the 3.10/3.16 variety.



A new #HPC project on github, nlytiq-base

Another itch I’ve been wanting to scratch for a very long time. I had internal versions of a small version of this for a while, but I wasn’t happy with them. The makefiles were brittle. The builds, while automated, would fail, quite often, for obscure reasons.

And I want a platform to build upon, to enable others to build upon. Not OpenHPC which is more about the infrastructure one needs for building/running high performance computing systems. That is a good effort, though it also needs .debs for Ubuntu/Debian, or even better, source and Makefiles.

What I wanted here was a set of analytical and programming tools for working with data. Specifically, up to date tools, modern … not end-of-life packaged tools that are so badly out of date, that you can’t install modern extensions to them, or use them to bootstrap the tools you need.

So the github repo is here. This is a very early release of the tool chain build environment. You can configure everything from base.config, and run make. It will take a while, but it will eventually result in a fully populated analytical tree.

One gotcha now will be the ATLAS build. I need to set up detection to see if there exists on machine blas/lapack/atlas, as ATLAS wants you to turn off processor throttling to build, or it fails in a strange way. I’ll add in some code to detect this. Specifically, I’ll see if I can force affinity for a specific processor and have it build on that. Not optimal, but better than failing. If this is not possible, I’ll look for the lapack/blas/atlas libs on the main unit. If they are there, great, we’ll use them. Otherwise, in the worst case, if we can’t do any of these, I’ll build the slow versions.
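
The fallback order described above could be sketched roughly as follows. This is not code from the repo: the function names and the strategy labels are made up for illustration, the affinity probe uses the Linux-only `os.sched_setaffinity`, and whether pinning to one core is enough to keep the ATLAS build happy is still an open question.

```python
import ctypes.util
import os


def can_pin_to_one_cpu() -> bool:
    """Probe whether this process may pin itself to a single CPU."""
    if not hasattr(os, "sched_setaffinity"):   # not available off Linux
        return False
    try:
        original = os.sched_getaffinity(0)
        os.sched_setaffinity(0, {min(original)})
        os.sched_setaffinity(0, original)      # restore; this was only a probe
        return True
    except OSError:
        return False


def system_blas_libs():
    """Names of BLAS/LAPACK/ATLAS shared libraries the loader can find."""
    return [n for n in ("atlas", "lapack", "blas") if ctypes.util.find_library(n)]


def choose_strategy(pinnable: bool, found_libs) -> str:
    """Pick a build strategy in the order laid out in the text."""
    if pinnable:
        return "build-atlas-pinned"      # hypothetical label
    if found_libs:
        return "use-system-libs"         # hypothetical label
    return "build-reference-blas"        # the slow versions, worst case
```

A Makefile could call something like `choose_strategy(can_pin_to_one_cpu(), system_blas_libs())` once up front and branch on the result.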

I certainly would like to get feedback from people on what they might want in this, what additional R/Python/Go/Node/Perl packages they want embedded. And whether or not they want a mountable compressed file system image, a docker image, or whatnot else.

My plan is to use this as a base for something else I’ve been wanting to build.

More later, but it’s a start.


There are real, and subtle differences between su and sudo

Most of the time, sudo just works. Every now and then, it doesn’t. Most recently was with a build I am working on, where I got a “permission denied” error for creating a directory.

The reason for this was non-obvious at first. You “are” superuser after all when you sudo, right? Aren’t you?

Sort of.

Your effective user ID has been set to the superuser. Your real user ID still is yours. This means things like your temp directory are not necessarily yours … er … the real user ID of the temp directory owner might be different from the effective user ID you are building as. And if you have a root_squash on an NFS mount, or your system uses one or the other security mechanisms to prevent privilege escalation … here be dragons.

So it seems, during a build of rust 1.14.0, I ran head first into this. I will freely admit that my mouth was agape for a bit. I will not admit to drool falling out, and have rapidly deleted any such webcam video.

Ok, more seriously, it was a WTF moment. Took me a second to understand, as my prompt says # when I sudo -s. The make was run as sudo. The make command failed under sudo with a permission (!!!@@!@@!) error. Then a fast ‘su’ and off to the races we went.
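
When this kind of thing bites, it helps to print what identity and environment the build actually sees rather than trusting the prompt. A minimal sketch, with illustrative function names; which of these values differ between `sudo make`, `sudo -s` and `su` depends on the local sudo configuration, so treat the output as evidence, not as a rule.

```python
import os


def describe_identity(environ=None):
    """Collect the IDs and environment bits that commonly differ under sudo/su."""
    environ = os.environ if environ is None else environ
    return {
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
        "sudo_user": environ.get("SUDO_USER"),   # set by sudo, absent under plain su
        "home": environ.get("HOME"),
        "tmpdir": environ.get("TMPDIR"),
    }


def can_create_dir_in(path: str) -> bool:
    """Check write+search permission on a directory.

    os.access() tests with the REAL uid/gid, not the effective ones.
    """
    return os.access(path, os.W_OK | os.X_OK)


for key, value in describe_identity().items():
    print(f"{key:14} {value}")
```

Running this once under `sudo` and once after `su`, from the same build directory, shows what changed between the failing and the working case.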


While I want to dig into this more, my goal here was building rust in a reliable and repeatable manner. I don’t have that going quite yet. Very close, but I’ve now run into LLVM/clang oddities, and have switched back to gcc for it. Build completes now, but install is still problematic because of this issue.

I could just build as root user, and other build environments I’ve built do that. I’ve been trying to get away from that, as it is a bad habit, and an errant make file could wreak havoc. But the converse is also true, in that during installation, often you need to be root to install into specific paths.

I can change that assumption, and create a specific path owned by a specific user, and off to the races I go. I prefer that model, and then let the admins set up sudo access to the tree.


Combine these things, and get a very difficult to understand customer service

In the process of disconnecting a service we don’t need anymore. So I call their number. Obviously reroutes to a remote call center. One where English is not the primary language.

I’m ok with this, but the person has a very thick and hard to understand accent. Their usage and idiom were not American, or British English. This also complicates matters somewhat, but I am used to it. I can infer where they were from, from their usage. It was very common in my dealings with other people there.

Of course, this isn’t bad enough.

The call center is busy, and you can hear lots of background noise.

Of course, this isn’t bad enough.

Now add a poor VOIP connection. I was doing this over a cell phone, and my connection is generally quite good … I’ve been on many hour long con calls over this phone, headset, etc. from this location. It’s not an ultra busy part of the day. So I am not getting dropped connections. I have a major US carrier for the cell. So it’s not a tower congestion problem.

Likely a backhaul problem shipping the voice bits halfway around the world and back, on a congested/contended for link. Noticeable delays in response. Ghosting/echoing. All manner of artifacts.

Of course, this isn’t bad enough.

Finally, add a crappy mic on the remote person’s head set.

End result was, I had to struggle to understand the person. Really struggle. Some of it was guessing what they were saying. Some was not.

I have to wonder aloud whether companies in search of cost reduction think it’s a good idea to make it hard to understand the support staff, through a combination of language usage, poor equipment, substandard networking, etc.

I guess it is amusing that this is a large “business ISP” here in the US.

At bare minimum, they should have the headsets upgraded, the network (ha!) upgraded, and the work area more noise isolated so that you get fewer of these issues to deal with. Hiring people who speak with less of a thick accent is also recommended, or conversely, training them on how to adapt their elocution so as to be more understandable.

I, as an escaped New Yorker, probably shouldn’t be answering phones myself (Hey, wassamadda for you?) … but seriously … at least make an effort on this.


SSD/flash/memory shortage, day N+1

There has been a huge demand for SSD/Flash/memory components from a number of end users. Sadly not the day job’s customers … but enough to deplete the market of supply.

Watching basic economics at work is fascinating.

Supply is highly constrained, while demand is rising. Coupling that with a (mis)expectation of continuously falling prices across the board leads to interesting conversations with customers.

We’ve tried to set expectations appropriately, but we’ve been bitten in the past by doing just this. That is, by being honest and up-front with our customers that some things will take more time to get, and cost more, we’ve watched customers go to different vendors, hear a different message, and then be screwed over as we weren’t being dishonest … while the other vendor was.

In another post, I said this was getting to me.

We’ve been advising customers to place orders 2+ months in advance for some specific sets of parts in very short supply. It does take some time for manufacturing to ramp up, and OEMs are in no hurry to flood a market and lower the effective purchase price (and their profits).

Yet, I am still seeing people think that parts are available with a quick phone call. For a large enough order (more than 1 or 2 systems worth), you need to get an allocation, and you need to get in queue for that allocation. That queue can be long. Other larger orders can and will bump you in queue. And the direct customers for the OEMs that bought all the product last time might just do it again. I’d call this highly likely. These aren’t the Dells, HPEs, etc. of the world. Go ahead and guess who might be doing this. And note that shortages in the broader market serve to underscore a portion of their message.

Many folks are building out their backlog from inventory, though their inventories aren’t deep, as the products age fast, and become obsolete in short time intervals. Many do just-in-time building. For those of us doing that, this is becoming painful at best.

Yeah, this is getting to me.


A new (old) customer for the day job

Our friends at MSU HPCC now are the proud owners of a very fast/high performance Unison Flash storage system, and a ZFS backed high performance Unison storage spinning disk unit. Installed first week of Jan 2017.

As MSU is one of my alma mater institutions, I am quite happy about helping them out with this kit.

They’ve been a customer previously; they had bought some HPC MPI/OpenMP programming training in the dim and distant past.


Architecture matters, and yes Virginia, there are no silver bullets for performance

Time and time again, the day job had been asked to discuss how the solutions are differentiated. Time and time again, we showed benchmarks on real workloads that show significant performance deltas. Not 2 or 3 sigma measurements. More often than not, 2x -> 10x better.

Yet … yet … we were asked, again and again, how we did it. We pointed to our architecture.

But, they complained, isn’t it the same as X (insert your favorite volume vendor here)?

No, we pointed out. It isn’t. We described the differences. Showed them precisely how the differences manifested. Showed them that the results are normal, repeatable, and generally different from what others made claims about.

Often in the past, we’ve heard that (insert random vendor here) has comparable systems. And when the real world measurements come out, we hear a very different message. Or when a customer eschewed our solution, went for the brand name version (at much higher cost), they rapidly discovered that engineering by spec sheet is a very … very bad thing to do.

It doesn’t work (engineering by spec sheet).

Yet, we heard this all the time.

In the past, I’ve railed against the notion of silver bullets. A silver bullet is a magical component, hardware or software, that will suddenly make something go much faster, and you know, give a competitor an unfair competitive advantage.

Marketing people love their silver bullets. They don’t work, but hey, they are fun to talk about.

How do we know they don’t work? Easy. Decades of benchmarking against them. Running real applications against them and our kit.

We designed and built something quite good. It enabled us to build parallel IO engines, tune the heck out of the engines. Move tremendous amounts of data between process/memory complex, storage, and RAM, without bottlenecks of other designs. Despite protestations and spec sheets to the contrary, measurements I and many others have done have demonstrated sustained and profound advantages of superior architectures over the silver bullet enhanced architectures.

I see spec sheets and marketing blurbs on products proclaiming them to be “the fastest” stuff in the west, with numbers that are lower … lower … than numbers we surpassed more than 3 years ago. Yet, we are told by some that these products are comparable.

Or, even more (unwittingly) humorously, that there really is no difference, even though, in a number of cases, we had just demonstrated a profound (nearly order of magnitude) difference.

It astounds me. No, confounds me … this may be a better way to articulate it.

We’ve not simply created a better mousetrap, we’ve tried to tell the world about it. And been ignored.

And we tried to get folks to invest in this. And been ignored.

All the while the market is going on validating our ideas (dense and high performance systems), and we see VCs investing in things like, I dunno … Secret?

This gets to you after a while. You start questioning a number of premises you had held to be truth.

So here we are, with (what I’d argue is) a fantastic architecture second to none. And despite the simplicity and obviousness of our message, our (many and repeatable and sustained) measured results … we get people reading off a marketing spec sheet telling us we are not all that different. Though we are.

This is one of those inflection points in a company’s existence.

I’ve been asked multiple times in recent months to estimate what we could do if we took our stack to other hardware. Apart from the significant performance (and likely stability) loss, such that we’d be like everyone else, not much.

I’ve also been asked multiple times to “divulge our secrets”, though our architecture is open, our kernels are available online.

As I said, it gets to you.

I am thinking hard about this battle, and whether or not I want to keep fighting it.

Our kit is obviously, objectively better. And not by a little bit.

But it doesn’t matter if we can’t sell it, because people read spec sheets and think the numbers printed on them are what they will get in normal operational states, versus the best case scenarios that are specifically set up for those parts.

A friend noted the fallacy of engineering by spec sheet a while ago. They are right.



#Perl on the rise for #DevOps

Note: I do quite a bit of development in Perl, and have my own biases, so please do take this into consideration. It is one of many languages I use, but it is by and large, my current go-to language. I’ll discuss below.

According to TIOBE (yeah, I know), Perl usage is on the rise. The linked article posits that this is for DevOps reasons. The author of the article works at a company that makes money from Perl and Python … they build (actually very good) tools. Tools that I personally use (Komodo).

The rationale is that Perl is very powerful, quite fast, extremely flexible, and ubiquitous to boot. They compare Python and Perl performance, noting some of the differences, and speculating why.

Generally, I don’t like saying “language X is better than Y”. Languages have domains of applicability, strengths, and weaknesses. Moreover, if you have to justify your choice by making the point “Y is better than X because of Z” then you’ve largely not understood the point of the languages in the first place. I’ve made this point before, but “delegitimizing” a language (such as, I dunno, the line noise meme? or use of sigils … where the latter seems to be only applied to a single language …) isn’t a good language advocacy path.

So put that aside, and let’s talk DevOps. DevOps at the core is about turning processes and hardware into larger portions of an algorithmic application delivery and support. To make automation simpler, to wrap applications that aren’t services into something that looks/acts like a service. To enable composable systems, or if you prefer the moniker used today, Software Defined systems. I’m going to focus less on the container side here, and more on the process side.

There are many tools to help enable this. Some are fairly new and undergoing rapid development. Some are more mature, others are ossified.

Generally, you need a few specific features to build elements for inclusion in a DevOps pipeline. You need the ability to build API endpoints for services that you will be running. You need the ability to link these API end points to specific functional elements. Like running a non-service based program with specific arguments. You need the ability to ingest data in common formats, and output in common formats. You need the ability to easily send signals in or out of band (depending upon how your architecture is built).
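
The core of that list, an endpoint tied to a non-service program with fixed arguments and a common output format, fits in a few dozen lines in most scripting languages. Here is a minimal sketch in Python using only the standard library; the route table, the paths and the port are hypothetical, and the same shape carries over directly to the Perl tooling discussed in this post.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical route table: URL path -> fixed argv of a non-service program.
ROUTES = {
    "/uptime": ["uptime"],
    "/pyversion": [sys.executable, "--version"],
}


def run_as_json(argv, timeout=10):
    """Run a program with fixed arguments and wrap the result as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return json.dumps({
        "argv": argv,
        "rc": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    })


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        argv = ROUTES.get(self.path)
        if argv is None:                      # only whitelisted programs run
            self.send_error(404, "unknown endpoint")
            return
        body = run_as_json(argv).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocking call: serve the routes on localhost only."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and then fetching `http://127.0.0.1:8080/uptime` returns the program's exit code and output as JSON. Keeping the argv fixed per route, rather than taking arguments from the request, is a deliberate choice to avoid handing callers a shell.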

This “glue” functionality is, to a very large extent, what Perl excels at. Ok, I am talking Perl5 here. Perl6 is (literally) a new language with a similar though not identical syntax … but from what I have seen, it can do this, and far far more. But that is a topic for another time.

You can create endpoints trivially in Perl using standard modules. You can set up simple servers, or restful APIs fairly trivially without much boilerplate code (see Mojolicious on CPAN). It has significant capability in metaprogramming via various modules such as Class::MOP/Moose, and others. It has amazing multi-language capabilities with the Inline:: series (Inline::C, Inline::Python, …). It interfaces quite trivially to external libraries written in any language (FFI::Platypus). It has the ability to run external code via a tremendously powerful interface, IPC::Run, as well as with simple back ticks. It can run multi-threaded, multi-process code using threads::shared and MCE amongst many others. Its database connectivity is excellent, and it is easy to hook into (No|New)SQL DBs. It has event loops for async processes.

I could keep going, but the point is that it is fairly trivial to build responsive services for DevOps using Perl and a smattering of these tools.

This said, some of the distribution providers (I am looking at you Red Hat) are still shipping not merely ancient tool sets, but tools that have been end of lifed for years … as their current supported tools. Like the ancient Go, Python, and other tools, Perl on these distributions is so woefully out of date that some of the modules (Mojolicious and a few others) may not work properly. This is on them; they need to decide if they want developers who need modern tools, or not.

What I’ve been doing has been building my own tree of tools. I’ll be refreshing this soon, and putting the refreshed tree up on github shortly. These are modern versions of Perl5, Perl6, Python3, Rust, Octave, R, Julia, Node, Jupyter, and a few others, along with my build environment. These tools make DevOps and analytics generally quite easy. All batteries included as far as I can tell based upon my usage, but happy to learn of more tools we need to include.

This environment is not yet set up for containerized deployment, as it is more of an add-in to an environment, than providing a specific service. We are looking at ways of packaging/using this in a more “traditional” container scenario.

But back to Perl and DevOps. The majority of Scalable Informatics code is Perl based DevOps code, and has been for more than a decade. The code is simple, fast, well debugged. Handles very intense loads.

Tastes great, and less filling.

I’ve not felt that Perl5 was dying as a language. I’ve thought that there are many tools out there, and some of them are pretty good. Perl is just one of them. Though for the moment, it is my go-to language.

Personally I am a polyglot, and I try to use the system that gets in my way the least; allowing me to express what I need the most, with the greatest simplicity and accuracy. I think that if there is a signal in the TIOBE data, that it likely reflects this to some degree. People rediscovering solutions to problems.

Creating something new for the sake of creating something new versus using a powerful system that exists and solves problems correctly now may not be the best pattern to follow for DevOps. We’ve seen this time and again in this industry though. Some of the patterns are fads, some have longevity (even if there is no valid reason for their existence).

DevOps going forward will continue to push hard on toolchains, and those who enable the greatest functionality with the least pain will likely be the winners. Similarly with analytics …

Make it easy to use and adopt. Make it ubiquitous. Perl has this now. Which is why I think the article might be on to something, even if the assumptions on the data are not valid.


Another itch scratched

So there you are, with many software RAIDs. You’ve been building and rebuilding them. And somewhere along the line, you lost track of which devices were which. So somehow you didn’t clean up the last build right, and you thought you had a hot spare … until you looked at /proc/mdstat … and said … Oh …

So. I wanted to do the detailed accounting, in a simple way. I want the tool to tell me if I am missing a physical drive (e.g. a drive died), or if a disk thinks it is part of a raid, even though the OS doesn’t agree.

And yes, this latter bit can happen, if you re-build the array, and omit one of the devices for whatever reason.

Like I did.

So …

root@usn-t60:/opt/scalable/sbin# ./lsswraid --raid=md23
N(OS)	= 14
N(disk)	= 15
More Physical disk RAID elements than OS RAID elements, likely you have a previously built element which has not been cleared.
The extra devices are: sdz

root@usn-t60:/opt/scalable/sbin# grep sdz /proc/mdstat

And to add this particular device back in as a hot spare …

/dev/sdz: 4 bytes were erased at offset 0x00001000 (linux_raid_member): fc 4e 2b a9
root@usn-t60:/opt/scalable/sbin# mdadm /dev/md23 --add /dev/sdz
mdadm: added /dev/sdz

root@usn-t60:/opt/scalable/sbin# grep sdz /proc/mdstat
md23 : active raid6 sdz[16](S) sdap[14] sdar[13] sdas[12] sdau[11] sdat[10] sdaf[9] sdag[8] sdai[7] sdah[6] sdaj[5] sdak[4] sdam[3] sdal[2] sdaa[15]
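
The accounting shown above boils down to a set comparison between what the kernel lists for an array and which disks carry a superblock claiming membership. The source of lsswraid is not shown here, so the following is only a sketch of that idea under stated assumptions: the function names are made up, and the on-disk set is taken as an input (it could be gathered with something like `mdadm --examine` on each candidate device).

```python
import re

# Matches member entries such as "sdz[16](S)" or "sdap[14]" and captures the name.
MEMBER = re.compile(r"(\w+)\[\d+\]")


def os_members(mdstat_text: str, array: str) -> set:
    """Devices the kernel lists for `array` in /proc/mdstat-style text."""
    for line in mdstat_text.splitlines():
        if line.startswith(array + " :"):
            return set(MEMBER.findall(line))
    return set()


def compare(os_view: set, disk_view: set) -> dict:
    """Compare the kernel's member list with the disks' own claims."""
    return {
        "n_os": len(os_view),
        "n_disk": len(disk_view),
        # Disks that still think they belong, but the OS disagrees.
        "stale_on_disk": sorted(disk_view - os_view),
        # Members the OS expects for which no superblock was found.
        "missing_disks": sorted(os_view - disk_view),
    }
```

With 14 devices in the kernel's view and the same 14 plus `sdz` in the on-disk view, `compare` reports `sdz` under `stale_on_disk`, matching the N(OS) = 14, N(disk) = 15 case above.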
