An article on the Rust language for astrophysical simulation

It is a short read, and you can find it on arxiv. They tackled an integration problem, basically using the code to perform a relatively simple trajectory calculation for a particular N-body problem.

A few things leapt out at me during my read.

First, the example was fairly simplistic … a leapfrog integrator, and while it is a symplectic integrator, this particular algorithm is not of quite high enough order to capture all the features of the N-body interaction they were working on.
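For anyone who has not met it, the kick-drift-kick form of leapfrog is only a few lines. Here is a minimal, illustrative sketch in Rust, with toy code units (G = 1) and a direct-summation force loop; it is a sketch of the algorithm, not the paper's code:

```rust
// Minimal kick-drift-kick (leapfrog) sketch for a toy gravitational N-body
// system. Code units (G = 1), direct summation, no softening. Purely
// illustrative, not the paper's implementation.

#[derive(Clone, Copy)]
struct Body {
    pos: [f64; 3],
    vel: [f64; 3],
    mass: f64,
}

fn accelerations(bodies: &[Body]) -> Vec<[f64; 3]> {
    let mut acc = vec![[0.0f64; 3]; bodies.len()];
    for i in 0..bodies.len() {
        for j in 0..bodies.len() {
            if i == j {
                continue;
            }
            let mut dr = [0.0f64; 3];
            let mut r2 = 0.0f64;
            for k in 0..3 {
                dr[k] = bodies[j].pos[k] - bodies[i].pos[k];
                r2 += dr[k] * dr[k];
            }
            let inv_r3 = 1.0 / (r2 * r2.sqrt());
            for k in 0..3 {
                acc[i][k] += bodies[j].mass * dr[k] * inv_r3;
            }
        }
    }
    acc
}

// Second order and symplectic: kick a half step, drift a full step, kick again.
fn leapfrog_step(bodies: &mut [Body], dt: f64) {
    let acc = accelerations(bodies);
    for (b, a) in bodies.iter_mut().zip(acc.iter()) {
        for k in 0..3 {
            b.vel[k] += 0.5 * dt * a[k];
        }
    }
    for b in bodies.iter_mut() {
        for k in 0..3 {
            b.pos[k] += dt * b.vel[k];
        }
    }
    let acc = accelerations(bodies);
    for (b, a) in bodies.iter_mut().zip(acc.iter()) {
        for k in 0..3 {
            b.vel[k] += 0.5 * dt * a[k];
        }
    }
}

fn main() {
    // A bound pair of equal masses, evolved for a while.
    let mut bodies = vec![
        Body { pos: [-0.5, 0.0, 0.0], vel: [0.0, -0.5, 0.0], mass: 1.0 },
        Body { pos: [0.5, 0.0, 0.0], vel: [0.0, 0.5, 0.0], mass: 1.0 },
    ];
    for _ in 0..10_000 {
        leapfrog_step(&mut bodies, 1.0e-3);
    }
    println!("body 0 ended at {:?}", bodies[0].pos);
}
```

Being second order, a scheme like this conserves the phase-space structure nicely over long runs, but it still needs a small timestep to resolve fine features, which is the order concern above.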

Second, the statically compiled Rust was (for the same problem/inputs) about equivalent to, if not slightly faster than, the Fortran code in performance, much faster than C++, and even more so than Go. This could be confirmation bias on my part, but I’ve not seen many places where Go was comparable in performance with C++; it is usually a factor of 2 off or worse. There could be many reasons for this, up to and including poor measurement scenarios, allowing the Go GC to run in critical sections, etc. I know C++ should be running near Fortran in performance, but I’ve not seen that as the general case either. Usually it is a set of fixed cases.

The reason for Fortran’s general dominance comes from the 50+ years people have had to write better optimizers for it, and that the language is generally simpler, with fewer corner cases. That and the removal of dangerous things like common blocks and other global state. This said, I fully expect other codes to equal and surpass it soon on a regular basis. I have been expecting Julia to do this in fairly short order. I am heartened to see Rust appear to do this on this one test, though I personally reserve my opinion on this for now. I’d like to see more code do this.

Rust itself is somewhat harder to adapt to. You have to be more rigid about how you think about variables, how you will use them, and what mechanisms you can use to shuttle state around. You have to worry about this a bit more explicitly than in other languages. I am personally quite intrigued by its promises: zero-cost abstractions, etc. The unlikelihood of ever being bitten by a SEGV again is also quite nice … tracing those down can often be frustrating.
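As a tiny illustration of the sort of rigidity (and the payoff), the compiler refuses programs in which a value could be freed while it is still borrowed; a minimal sketch:

```rust
fn main() {
    let samples = vec![0.5_f64, 1.5, 2.5];
    let total: f64 = samples.iter().sum();

    let first = &samples[0]; // an immutable borrow of `samples`

    // Uncommenting the next line will not compile:
    //   drop(samples);
    // error[E0505]: cannot move out of `samples` because it is borrowed.
    // The class of bug that shows up as a SEGV (or silent corruption) in
    // C/C++ is rejected here at compile time instead.

    println!("first = {}, total = {}", first, total);
}
```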

My concern is the rate at which Rust is currently evolving. I started looking at it around 0.10.x, and it is (as of this writing) up to 1.14.x, with 1.15.x looming.

Generally, I want languages that get out of my way of doing things, not languages that smother me in boilerplate of marginal (at best) utility, and impose their (often highly opinionated, and highly targeted, often slightly askew) worldview on me. Anything with an explicit garbage collection blocking task which can interfere with calculation is a terrific example of this.

Simple syntax, ease of expression, ability to debug, high performance, accurate results, and freedom from crashing with (*&^*&^$%^&$ SEGVs are high on my list of languages I want to spend time with.


Brings a smile to my face … #BioIT #HPC accelerator

Way way back in the early aughts (2000s), we had built a set of designs for an accelerator system to speed up things like BLAST, HMMer, and other codes. We were told that no one would buy such things, as the software layer was good enough and people didn’t want black boxes. This was part of an overall accelerator strategy that we had put together at the time, and were seeking to raise capital to build. We were thinking that by 2013 or so, accelerators would become a noticeable fraction of the total computing power for HPC and beyond.

Fast forward to today. I saw this.

Yet another idea/concept/system validated. It looks like our only real big miss was the “muscular desktops” concept … big fast processors, memory, storage, accelerators next to desks. Everything else, from accelerators, to clouds, to … everything … we were spot on.

I hope they don’t mind my smiling at this, and tipping my hat to them. Good show folks, may you sell many many of them!


Another article about the supply crisis hitting #SSD, #flash, #NVMe, #HPC #storage in general

I’ve been trying to help Scalable Informatics customers understand these market realities for a while. Unfortunately, to my discredit, I’ve not been very successful at doing so … and many groups seem to assume supply is plentiful and cheap across all storage modalities.

Not true. And not likely true for at least the rest of the year, if not longer.

This article goes into some of the depth I’ve tried to explain to others in phone conversations and private email threads. I am guessing that they thought I was in a self-serving mode, trying to get them to pull in their business sooner rather than later to help Scalable, versus helping them.

Not quite.

At Scalable we had focused on playing the long game. On winning customers for the long haul, on showing value with systems we produced, with software we built, with services and support we provided, with guidance, research and discussions. I won’t deny that orders would have helped Scalable at that time, but Scalable would still have had to wait for parts to fulfill them, which, curiously, would have put the company at even greater exposure and risk.

The TL;DR of the article goes like this, and they are not wrong.

  1. There is insufficient SSD/Flash supply in market
  2. Hard disks, whose manufacturing rates have been slowly declining, are not quite able to take up the demand slack at the moment; there are shortages there as well
  3. 10k and 15k RPM drives are not long for this world
  4. New fab capacity needed to increase SSD supply is coming online (I dispute that enough is coming online, or that it is coming online fast enough, but more is, at least organically coming online)
  5. There is no new investment in HDD/SRD capacity … there is a drying up of HDD/SRD manufacturing lines

All of this conspires to make storage prices rise fast, and storage products harder to come by. Which means if you have a project with a fixed, unadaptable budget, and a sense that you can order later, well … I wouldn’t want to be in the shoes of anyone who had to explain to their management team, board, CEO/CFO, etc. why a critical project was suddenly going to be both delayed and far more expensive (really, I’ve seen the price rises over the last few months, and it ain’t pretty).

This isn’t 5, 15, 25 percent differences either. The word “material” factors into it. Sort of like the HDD shortage of several years ago with the floods in Thailand.

It is curious that we appear to not have learned, with the fab capacity located in similarly risky areas … but that would be a topic for another post some day.

Even more interesting to me personally in this article, is a repetition of something I’ve been saying for a while:

To deal with supply uncertainty, as we move from an industry based on mechanical hard drives (which has dedicated production facilities) to one based on commodity NAND, vertically integrated solutions will be optimal. Organizations that control everything from NAND supply to controllers to the software will be in a much better position to deliver consistently than those that don’t.

Vertical integration matters here. You can’t just be a peddler of storage parts, you need to work up the value chain. I’ve been saying this to anyone who would listen for the last 4 years or so.

Also

This may cause an existential crisis for the external storage array. Creating, validating and successfully marketing a new external storage array in a saturated market is difficult. It is unlikely today’s storage vendors will be trying to move up the value chain by reinventing the array.

Yeah, array as a market is shrinking fast. There are smaller faster startups and public companies all feasting on a growing fraction of a decreasing market (external storage arrays). And they stick to what they know, and try to ignore the rest of the world.

There is a component of the market which insists on “commodity” solutions, where the word “commodity” has a somewhat opportunistic definition, so the person espousing the viewpoint can make an argument for their preferred system. These arguments are usually wrong at several levels, but it seems to be a mantra across a fairly wide swath of the industry. It is hard to hold back a tide flowing the wrong way by shouting at it. Sometimes it is simply better to stop resisting, let it wreak its damage, and move on to something else. We can’t solve every problem. Some problem “solutions” are less focused upon analyses, and more focused upon fads, and the “wisdom” of the masses.

You may have seen me grouse, and shake my head recently, over “engineering by data sheet”. This falls solidly into this realm as well, where people compare data sheets, and not actual systems or designs. I see this happen far more often in our “software eats the world” view, where people who should know better, don’t.

Reminds me of my days at SGI, when I, with my little 90MHz R8k processor, was dealing with “engineering by spec sheet” from end users salivating over the 333MHz DEC Alpha processor. The latter was “obviously” faster; it had better specs.

I asked then, if this were true, how come our actual real-world (constructed by that same customer no less) tests showed quite the opposite?

Some people decide based upon emotions and things they “know” to be true. The rest of us want hard empirical and repeatable evidence. A spec sheet is not empirical evidence. Multiple independent real world benchmark tests? Yeah, that’s evidence.

Hyper-converged solutions, on the other hand, are relatively easy: there are a whole lot of smaller hyper-converged players that can be bought up cheaply and turned into the basis for a storage vendor’s vertically integrated play.

Well, the bigger players are rapidly selling: Nutanix is public, Simplivity was bought by HPE.

Smaller players abound. I know one very well, and it is definitely for sale … the owners are motivated to move quickly. Reach out to me at joe _at_ this domain name, or joe.landman _at_ google’s email system for more information.

For the small virtual administrator, none of this may be relevant. Our needs are simple and we should be able to find storage even if supplies become a little tight. If, however, you measure your datacenter in acres, by the end of next year you may well find yourself negotiating for your virtual infrastructure from a company that last year you would have thought of as just a disk peddler.

This article almost completely mirrors points I’ve made in the past to some of the disk vendors I’ve spoken to, about why they might want to pick up a scrappy upstart with a very good value prop, but insufficient capital to see their plans through. I’ve seen only Seagate take actions to move along the value chain, with the Xyratex purchase, and I thought originally they had done that specifically for the disk testing elements. Turns out I was wrong … they had designs on the storage appliance side as well.

All the disk vendors would do well to cogitate on this. The writing is definitely on the wall. The customers know this. The remaining runway is of limited length, and every single day burns more of it up.

Exactly what are they going to do, and when will they do it?

Customers want to know.


A nice shout out in ComputerWeekly.com about @scalableinfo #HPC #storage

See the article here.

Surely though, widespread adoption is just a matter of time. With its PCIe connectivity, literally slotting in, NVMe offers the ability to push hyper-converged utility and scalability to wider sets of use cases than currently.

There are some vendors that focus on their NVMe/hyper-converged products, such as X-IO (Axellio), Scalable Informatics, and DataON, but NVMe as standard in hyper-converged is almost certainly a trend waiting to happen.

They mention Axellio, and on The Reg article on their ISE product, they say “X-IO partners using Axellio will be able to compete with DSSD, Mangstor and Zstor and offer what EMC has characterised as face-melting performance.”

Hey, we were the first to come up with “face melting performance”. More than a year ago. And it really wasn’t us, but my buddy Dr. James Cuff of Harvard.

For the record, we started shipping the Forte units in 2015. We’ve got a beautiful design for v2 of them as well.

This said, DataON is in a different market, and X-IO is about a very different use case, more similar to Nutanix and now HPE’s Simplivity than the use case we imagined.

/sigh


when you eliminate the impossible, what is left, no matter how improbable, is likely the answer

This is a fun one.

A customer has quite a collection of all-flash Unison units. A while ago, they asked us to turn on LLDP support for the units. It has some value for a number of scenarios. Later, they asked us to turn it off. So we removed the daemon. Unison ceased generating/consuming LLDP packets.

Or so we thought.

Fast forward to last week.

We are being told that LLDP PDUs are being generated by the kit. I am having trouble believing this, as we removed the LLDP daemon from the OS load, and there is nothing in the OS or driver stack consuming/producing those packets.

We worked back and forth, and I got a packet trace clearly showing something that should not be possible. Something highly improbable.

So then I looked deeper. Really, no LLDP daemon on there at all.

If there was, I should see LLDP packets being passed into the ring buffer, and visible in packet captures.

So I started capturing packets.

Lo and behold … nothing. Nada. Zippo. Zilch.

No LLDP packets passed up the stack.
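For reference, LLDP frames are easy to pick out of a raw capture: they are link-local multicasts (typically to 01:80:c2:00:00:0e) carrying EtherType 0x88cc. A minimal sketch of that check, leaving out the capture plumbing (libpcap, AF_PACKET, and so on) itself:

```rust
// Identify an LLDP frame by destination MAC and EtherType. The frame bytes
// would come from whatever raw capture mechanism is in use.

fn is_lldp(frame: &[u8]) -> bool {
    const LLDP_DST: [u8; 6] = [0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e];
    const LLDP_ETHERTYPE: [u8; 2] = [0x88, 0xcc];
    frame.len() >= 14 && frame[..6] == LLDP_DST[..] && frame[12..14] == LLDP_ETHERTYPE[..]
}

fn main() {
    // A fabricated Ethernet header, just to exercise the check.
    let mut frame = [0u8; 64];
    frame[..6].copy_from_slice(&[0x01, 0x80, 0xc2, 0x00, 0x00, 0x0e]);
    frame[12] = 0x88;
    frame[13] = 0xcc;
    println!("looks like LLDP: {}", is_lldp(&frame));
}
```

The point being: if frames like these were traversing the host's network stack, a capture on the interface would show them.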

Customer reset counters, we tried again. They saw the packets. I didn’t.

So, here are some impossible things I can eliminate.

  1. The OS is generating/consuming LLDP packets. It is not. This is provable.
  2. The switch is lying about LLDP packets. It is not. This is provable.
  3. There is no 3.
  4. The hardware is failing. It is not. This is provable.
  5. Russian hackers? No … not possible.

What I am left with, however unlikely, must be a possibility.

That the NIC, without passing this information back up the stack, is generating and consuming LLDP PDU broadcast packets, or the switch is misbehaving.

As much as I don’t like the first, it is possible. The second is also possible, but I only have control over the first, so let me work on that.

Normally, spurious packets don’t bug me. Transient “ghost daemon in the machine” phenomena need to be looked at, and traced down, but rarely do they have an impact. In this case, the daemon may be in hardware, outside of the control plane (via the driver), and not on the same data plane.

This phenomenon is causing the switch to shut down ports after it stops receiving LLDP packets. So it is spurious. Transient.

And there is a failure cascade after this. The switch shutting down ports takes a metadata server for a parallel file system offline. After which, the wrong type of hilarity ensues.

Yes, we can likely have them configure the switch to ignore LLDP packets. But that is beside the point, in that the system shouldn’t be generating/consuming them by default on its own, without kernel or user space control over it. And they should be propagated up the stack.

One possible solution is to replace the NIC. We may pursue this, but it wouldn’t be a bad thing to also try to isolate and solve this problem. We have to weigh the impact of either course and decide what to do. Until then, the temporary workaround is to shut off the LLDP port toggling here.


Virtualized infrastructure, with VM storage on software RAID + a rebuild == occasional VM pauses

Not what I was hoping for. I may explain more of what I am doing later (less interesting than why I am doing it), but suffice it to say that I’ve got a machine I’ve turned into a VM/container box, so I can build something I need to build.

This box has a large RAID6 for storage. Spinning disk. Fairly well optimized, I get good performance out of it. The box has ample CPU, and ample memory.

The VM bulk storage points over to the spinning disk RAID6, not the SSD RAID10.

I noted a failing drive, so I ejected it and swapped it out for a working one. RAID rebuild started, and now I’ve got another couple of hours before it finishes. 6 VMs are consuming maybe 25% of the CPU cycles when busy, and about 25% of the RAM in total. The machine is otherwise idle.

And when I log into one of the VMs, I am getting dramatic pauses, while there is no real load going on. Nothing in the process table. Yet the load average is wound up a little, which usually happens when IO is paused.

Sure enough … this looks like what is happening. I am going to explore these code paths somewhat more. Fairly modern 4.4.x kernel, so it’s not likely a long-looming bug of the 3.10/3.16 variety.
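In the meantime, a quick hack to line the pauses up against the rebuild’s progress is to poll /proc/mdstat for the recovery line. A rough sketch (Linux md specific; the polling interval is arbitrary):

```rust
use std::{fs, thread, time::Duration};

// Poll /proc/mdstat and print only the rebuild/resync progress lines, so the
// VM pauses can be lined up against where the rebuild happens to be.
fn main() {
    loop {
        if let Ok(mdstat) = fs::read_to_string("/proc/mdstat") {
            for line in mdstat.lines() {
                if line.contains("recovery") || line.contains("resync") {
                    println!("{}", line.trim());
                }
            }
        }
        thread::sleep(Duration::from_secs(30));
    }
}
```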

Fun.


A new #HPC project on github, nlytiq-base

Another itch I’ve been wanting to scratch for a very long time. I had small internal versions of this for a while, but I wasn’t happy with them. The makefiles were brittle. The builds, while automated, would fail quite often, for obscure reasons.

And I want a platform to build upon, and to enable others to build upon. Not OpenHPC, which is more about the infrastructure one needs for building/running high performance computing systems. That is a good effort, though it also needs .debs for Ubuntu/Debian, or even better, source and Makefiles.

What I wanted here was a set of analytical and programming tools for working with data. Specifically, up-to-date, modern tools … not end-of-life packaged tools so badly out of date that you can’t install modern extensions to them, or use them to bootstrap the tools you need.

So the github repo is here. This is a very early release of the toolchain build environment. You can configure everything from base.config and run make. It will take a while, but it will eventually result in a fully populated analytical tree.

One gotcha right now is the ATLAS build. I need to set up detection to see whether BLAS/LAPACK/ATLAS already exist on the machine, as ATLAS wants you to turn off processor throttling for its build, or it fails in a strange way. I’ll add in some code to detect this. Specifically, I’ll see if I can force affinity to a specific processor and have it build on that. Not optimal, but better than failing. If this is not possible, I’ll look for the LAPACK/BLAS/ATLAS libs on the main unit. If they are there, great, we’ll use them. Otherwise, in the worst case, if we can’t do any of these, I’ll build the slow versions.
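For the throttling piece, one plausible check (a sketch only, assuming a Linux build host that exposes cpufreq under /sys; none of this is in the repo yet) is to look at the scaling governors before kicking off the ATLAS build:

```rust
use std::fs;

// Rough check for CPU frequency scaling before an ATLAS build: ATLAS's
// timing-based auto-tuning is unreliable (or aborts) when the clocks are
// being scaled. This reads the cpufreq governors Linux exposes under /sys;
// the path layout is an assumption about the build host, not part of ATLAS.
fn main() {
    let mut scaled = Vec::new();
    if let Ok(entries) = fs::read_dir("/sys/devices/system/cpu") {
        for entry in entries.filter_map(|e| e.ok()) {
            let gov_path = entry.path().join("cpufreq/scaling_governor");
            if let Ok(gov) = fs::read_to_string(&gov_path) {
                if gov.trim() != "performance" {
                    scaled.push(format!("{}: {}", entry.path().display(), gov.trim()));
                }
            }
        }
    }
    if scaled.is_empty() {
        println!("no active frequency scaling detected; proceed with the ATLAS build");
    } else {
        println!("frequency scaling is active; ATLAS tuning will be suspect:");
        for s in &scaled {
            println!("  {}", s);
        }
    }
}
```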

I certainly would like to get feedback from people on what they might want in this, and what additional R/Python/Go/Node/Perl packages they want embedded. And whether they want a mountable compressed file system image, a docker image, or whatever else.

My plan is to use this as a base for something else I’ve been wanting to build.

More later, but it’s a start.


There are real, and subtle differences between su and sudo

Most of the time, sudo just works. Every now and then, it doesn’t. Most recently was with a build I am working on, where I got a “permission denied” error for creating a directory.

The reason for this was non-obvious at first. You “are” superuser after all when you sudo, right? Aren’t you?

Sort of.

Your effective user ID (and, with sudo’s defaults, your real user ID too) has been set to the superuser, but much of your environment has not necessarily followed. Things like $HOME and your temp directories may still point at paths owned by your original user rather than the superuser you are building as. And if you have root_squash on an NFS mount, or your system uses one or another security mechanism to prevent privilege escalation … here be dragons.
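A quick way to see what does and does not change is to print the real and effective UIDs plus a few environment variables from the same binary, run plainly, under sudo, and from an su shell. A minimal sketch, with the libc crate assumed as a dependency:

```rust
use std::env;

// Print the real and effective UIDs plus a few environment variables.
// Comparing a plain run, `sudo ./this`, and a run from a root shell obtained
// with `su` makes the differences visible. getuid()/geteuid() come from the
// libc crate (libc = "0.2" in Cargo.toml).
fn main() {
    let (ruid, euid) = unsafe { (libc::getuid(), libc::geteuid()) };
    println!("real uid: {}   effective uid: {}", ruid, euid);
    for var in &["HOME", "USER", "LOGNAME", "SUDO_USER", "PATH"] {
        match env::var(var) {
            Ok(val) => println!("{:>10} = {}", var, val),
            Err(_) => println!("{:>10} is unset", var),
        }
    }
}
```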

So it seems, during a build of Rust 1.14.0, I ran head first into this. I will freely admit that my mouth was agape for a bit. I will not admit to drool falling out, and have rapidly deleted any such webcam video.

Ok, more seriously, it was a WTF moment. It took me a second to understand, as my prompt says # when I sudo -s. The make was run under sudo. The make command failed under sudo with a permission (!!!@@!@@!) error. Then a fast ‘su’ and off to the races we went.

Seriously.

While I want to dig into this more, my goal here was building rust in a reliable and repeatable manner. I don’t have that going quite yet. Very close, but I’ve now run into LLVM/clang oddities, and have switched back to gcc for it. Build completes now, but install is still problematic because of this issue.

I could just build as the root user, and other build environments I’ve built do that. I’ve been trying to get away from that, as it is a bad habit, and an errant makefile could wreak havoc. But the converse is also true, in that during installation, you often need to be root to install into specific paths.

I can change that assumption, and create a specific path owned by a specific user, and off to the races I go. I prefer that model, and then let the admins set up sudo access to the tree.


Combine these things, and get customer service that is very difficult to understand

In the process of disconnecting a service we don’t need anymore. So I call their number. Obviously it reroutes to a remote call center. One where English is not the primary language.

I’m ok with this, but the person has a very thick and hard to understand accent. Their usage and idiom were not American, or British English. This also complicates matters somewhat, but I am used to it. I can infer where they were from, from their usage. It was very common in my dealings with other people there.

Of course, this isn’t bad enough.

The call center is busy, and you can hear lots of background noise.

Of course, this isn’t bad enough.

Now add a poor VOIP connection. I was doing this over a cell phone, and my connection is generally quite good … I’ve been on many hour-long con calls over this phone, headset, etc. from this location. It’s not an ultra-busy part of the day. So I am not getting dropped connections. I have a major US carrier for the cell. So it’s not a tower congestion problem.

Likely a backhaul problem shipping the voice bits halfway around the world and back, on a congested/contended for link. Noticeable delays in response. Ghosting/echoing. All manner of artifacts.

Of course, this isn’t bad enough.

Finally, add a crappy mic on the remote person’s head set.

End result was, I had to struggle to understand the person. Really struggle. Some of it was guessing what they were saying. Some was not.

I have to wonder aloud whether companies, in search of cost reduction, think it’s a good idea to make it hard to understand the support staff, by a combination of language usage, poor equipment, substandard networking, etc.

I guess it is amusing that this is a large “business ISP” here in the US.

At bare minimum, they should have the headsets upgraded, the network (ha!) upgraded, and the work area better noise-isolated so that you get fewer of these issues to deal with. Hiring people who speak with less of a thick accent is also recommended, or conversely, training them on how to adapt their elocution so as to be more understandable.

I, as an escaped New Yorker, probably shouldn’t be answering phones myself (Hey, wassamadda for you?) … but seriously … at least make an effort on this.


SSD/flash/memory shortage, day N+1

There has been a huge demand for SSD/Flash/memory components from a number of end users. Sadly not the day job’s customers … but enough to deplete the market of supply.

Watching basic economics at work is fascinating.

Supply is highly constrained, while demand is rising. Couple that with a (mis)expectation of continuously falling prices across the board, and you get interesting conversations with customers.

We’ve tried to set expectations appropriately, but we’ve been bitten in the past by doing just this. That is, by being honest and up-front with our customers that some things will take more time to get, and cost more, we’ve watched customers go to different vendors, hear a different message, and then be screwed over as we weren’t being dishonest … while the other vendor was.

In another post, I said this was getting to me.

We’ve been advising customers to place orders 2+ months in advance for some specific sets of parts in very short supply. It does take some time for manufacturing to ramp up, and OEMs are in no hurry to flood the market and lower the effective purchase price (and their profits).

Yet, I am still seeing people think that parts are available with a quick phone call. For a large enough order (more than 1 or 2 systems worth), you need to get an allocation, and you need to get in queue for that allocation. That queue can be long. Other larger orders can and will bump you in queue. And the direct customers for the OEMs that bought all the product last time might just do it again. I’d call this highly likely. These aren’t the Dells, HPEs, etc. of the world. Go ahead and guess who might be doing this. And note that shortages in the broader market serve to underscore a portion of their message.

Many folks are building out their backlog from inventory, though their inventories aren’t deep, as the products age fast, and become obsolete in short time intervals. Many do just-in-time building. For those of us doing that, this is becoming painful at best.

Yeah, this is getting to me.
