A new (old) customer for the day job

Our friends at MSU HPCC are now the proud owners of a very fast Unison flash storage system and a ZFS-backed, high-performance Unison spinning-disk unit, installed the first week of January 2017.

As MSU is one of my alma maters, I am quite happy to help them out with this kit.

They’ve been a customer previously; they had bought some HPC MPI/OpenMP programming training in the dim and distant past.


Architecture matters, and yes Virginia, there are no silver bullets for performance

Time and time again, the day job has been asked to explain how our solutions are differentiated. Time and time again, we showed benchmarks on real workloads with significant performance deltas. Not 2 or 3 sigma differences; more often than not, 2x to 10x better.

Yet … yet … we were asked, again and again, how we did it. We pointed to our architecture.

But, they complained, isn’t it the same as X (insert your favorite volume vendor here)?

No, we pointed out. It isn’t. We described the differences. Showed them precisely how the differences manifested. Showed them that the results are normal, repeatable, and generally different from what others made claims about.

Often in the past, we’ve heard that (insert random vendor here) has comparable systems. And when the real-world measurements come out, we hear a very different message. Or when a customer eschewed our solution and went for the brand-name version (at much higher cost), they rapidly discovered that engineering by spec sheet is a very … very bad thing to do.

It doesn’t work (engineering by spec sheet).

Yet, we heard this all the time.

In the past, I’ve railed against the notion of silver bullets. A silver bullet is a magical component, hardware or software, that will suddenly make something go much faster and, you know, give its owner an unfair competitive advantage.

Marketing people love their silver bullets. They don’t work, but hey, they are fun to talk about.

How do we know they don’t work? Easy. Decades of benchmarking against them. Running real applications against them and our kit.

We designed and built something quite good. It enabled us to build parallel IO engines, tune the heck out of those engines, and move tremendous amounts of data between the process/memory complex, storage, and RAM without the bottlenecks of other designs. Despite protestations and spec sheets to the contrary, measurements I and many others have made demonstrate sustained and profound advantages of superior architectures over the silver-bullet-enhanced ones.

I see spec sheets and marketing blurbs on products proclaiming them to be “the fastest” stuff in the west, with numbers that are lower … lower … than numbers we surpassed more than 3 years ago. Yet, we are told by some that these products are comparable.

Or, even more (unwittingly) humorously, that there really is no difference, even though, in a number of cases, we had just demonstrated a profound (nearly order of magnitude) difference.

It astounds me. No, it confounds me … that may be a better way to articulate it.

We’ve not simply created a better mousetrap, we’ve tried to tell the world about it. And been ignored.

And we tried to get folks to invest in this. And been ignored.

All the while, the market goes on validating our ideas (dense and high-performance systems), and we see VCs investing in things like, I dunno … Secret?

This gets to you after a while. You start questioning a number of premises you had held to be truth.

So here we are, with (what I’d argue is) a fantastic architecture, second to none. And despite the simplicity and obviousness of our message, and our many, repeatable, sustained measured results … we get people reading off a marketing spec sheet telling us we are not all that different. Though we are.

This is one of those inflection points in a company’s existence.

I’ve been asked multiple times in recent months to estimate what we could do if we took our stack to other hardware. Apart from a significant performance (and likely stability) loss, which would make us just like everyone else: not much.

I’ve also been asked multiple times to “divulge our secrets”, though our architecture is open and our kernels are available online.

As I said, it gets to you.

I am thinking hard about this battle, and whether or not I want to keep fighting it.

Our kit is obviously, objectively better. And not by a little bit.

But it doesn’t matter if we can’t sell it, because people read spec sheets and think the numbers printed on them are what they will get in normal operation, rather than best-case scenarios specifically constructed for those parts.

A friend noted the fallacy of engineering by spec sheet a while ago. They are right.

smh


#Perl on the rise for #DevOps

Note: I do quite a bit of development in Perl, and have my own biases, so please do take this into consideration. It is one of many languages I use, but it is by and large, my current go-to language. I’ll discuss below.

According to TIOBE (yeah, I know), Perl usage is on the rise. The linked article posits that this is for DevOps reasons. The author of the article works at a company that makes money from Perl and Python … they build (actually very good) tools. Tools that I personally use (Komodo).

The rationale is that Perl is very powerful, quite fast, extremely flexible, and ubiquitous to boot. They compare the performance of Perl and Python, noting some of the differences and speculating about why they exist.

Generally, I don’t normally like saying “language X is better than Y”. Languages have domains of applicability, strengths, and weaknesses. Moreover, if you have to justify your choice by making the point “Y is better than X because of Z” then you’ve largely not understood the point of the languages in the first place. I’ve made this point in the past before, but “delegitimizing” a language (such as, I dunno, the line noise meme? or use of sigils … where the latter seems to be only applied to a single language …) isn’t a good language advocacy path.

So put that aside, and let’s talk DevOps. DevOps, at its core, is about turning processes and hardware into larger portions of an algorithmic application delivery and support pipeline. To make automation simpler; to wrapper applications that aren’t services into something that looks/acts like a service; to enable composable systems or, if you prefer the moniker used today, Software Defined systems. I’m going to focus less on the container side here and more on the process side.

There are many tools to help enable this. Some are fairly new and undergoing rapid development. Some are more mature, others are ossified.

Generally, you need a few specific features to build elements for inclusion in a DevOps pipeline. You need the ability to build API endpoints for the services you will be running. You need the ability to link these API endpoints to specific functional elements, like running a non-service-based program with specific arguments. You need the ability to ingest data in common formats and output it in common formats. You need the ability to easily send signals in or out of band (depending upon how your architecture is built).

This “glue” functionality is, to a very large extent, what Perl excels at. Ok, I am talking Perl5 here. Perl6 is (literally) a new language with a similar though not identical syntax … but from what I have seen, it can do this, and far far more. But that is a topic for another time.

You can create endpoints trivially in Perl using standard modules. You can set up simple servers, or RESTful APIs, fairly trivially without much boilerplate code (see Mojolicious on CPAN). It has significant capability in meta-programming via various modules such as Class::MOP/Moose and others. It has amazing multi-language capabilities with the Inline:: series (Inline::C, Inline::Python, …). It interfaces quite trivially to external libraries written in any language (FFI::Platypus). It has the ability to run external code via a tremendously powerful interface, IPC::Run, as well as with simple backticks. It can run multi-threaded, multi-process code using threads::shared and MCE, amongst many others. Its database connectivity is excellent, and it is easy to hook into (No|New)SQL DBs. It has event loops for async processes.

I could keep going, but the point is that it is fairly trivial to build responsive services for DevOps using Perl and a smattering of these tools.

This said, some of the distribution providers (I am looking at you, Red Hat) are still shipping not merely ancient tool sets, but tools that have been end-of-lifed for years … as their current supported tools. Like the ancient Go, Python, and other tools, Perl on these distributions is so woefully out of date that some of the modules (Mojolicious and a few others) may not work properly. This is on them; they need to decide whether they want developers who need modern tools, or not.

What I’ve been doing has been building my own tree of tools. I’ll be refreshing this soon, and putting the refreshed tree up on github shortly. These are modern versions of Perl5, Perl6, Python3, Rust, Octave, R, Julia, Node, Jupyter, and a few others, along with my build environment. These tools make DevOps and analytics generally quite easy. All batteries included as far as I can tell based upon my usage, but happy to learn of more tools we need to include.

This environment is not yet set up for containerized deployment, as it is more of an add-in to an environment, than providing a specific service. We are looking at ways of packaging/using this in a more “traditional” container scenario.

But back to Perl and DevOps. The majority of Scalable Informatics code is Perl based DevOps code, and has been for more than a decade. The code is simple, fast, well debugged. Handles very intense loads.

Tastes great, and less filling.

I’ve not felt that Perl5 was dying as a language. I’ve thought that there are many tools out there, and some of them are pretty good. Perl is just one of them. Though for the moment, it is my go-to language.

Personally, I am a polyglot, and I try to use the system that gets in my way the least, allowing me to express what I need with the greatest simplicity and accuracy. I think that if there is a signal in the TIOBE data, it likely reflects this to some degree: people rediscovering solutions to problems.

Creating something new for the sake of creating something new, versus using a powerful system that exists and solves problems correctly now, may not be the best pattern to follow for DevOps. We’ve seen this time and again in this industry, though. Some of the patterns are fads; some have longevity (even if there is no valid reason for their existence).

DevOps going forward will continue to push hard on toolchains, and those who enable the greatest functionality with the least pain will likely be the winners. Similarly with analytics …

Make it easy to use and adopt. Make it ubiquitous. Perl has this now. Which is why I think the article might be on to something, even if the assumptions on the data are not valid.


Another itch scratched

So there you are, with many software RAIDs. You’ve been building and rebuilding them. And somewhere along the line, you lost track of which devices were which. So somehow you didn’t clean up the last build right, and you thought you had a hot spare … until you looked at /proc/mdstat … and said … Oh …

So I wanted to do the detailed accounting, in a simple way. I want the tool to tell me whether I am missing a physical drive (e.g. a drive died), or whether a disk thinks it is part of a RAID even though the OS doesn’t agree.

And yes, this latter bit can happen if you rebuild the array and omit one of the devices for whatever reason.

Like I did.
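
The core of that check can be sketched in a few lines of shell. This is just a sketch of the idea, not the actual tool: `extra_members` is my own name, and in the sketch both lists come from plain files rather than the live system, so the comparison logic is easy to exercise anywhere.

```shell
#!/bin/bash
# Sketch of the lsswraid-style check (hypothetical helper; the real tool
# differs). List 1: members the OS reports for the array (from /proc/mdstat).
# List 2: devices whose superblock claims RAID membership, e.g. from
#   blkid -t TYPE=linux_raid_member -o device
# Both lists here are plain files of device names, one per line.

# Print devices present in disk_list ($2) but absent from os_list ($1).
extra_members() {
    os_sorted=$(mktemp)
    disk_sorted=$(mktemp)
    sort "$1" > "$os_sorted"
    sort "$2" > "$disk_sorted"
    # comm -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only lines unique to file 2: the stale claimants.
    comm -13 "$os_sorted" "$disk_sorted"
    rm -f "$os_sorted" "$disk_sorted"
}
```

With 14 members in /proc/mdstat and 15 disks carrying a RAID signature, this would print the one stale device, exactly the situation in the session that follows.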

So …

root@usn-t60:/opt/scalable/sbin# ./lsswraid --raid=md23
N(OS)	= 14
N(disk)	= 15
More Physical disk RAID elements than OS RAID elements, likely you have a previously built element which has not been cleared.
The extra devices are: sdz

root@usn-t60:/opt/scalable/sbin# grep sdz /proc/mdstat

And to add this particular device back in as a hot spare, first wipe the stale RAID signature, then re-add it …

root@usn-t60:/opt/scalable/sbin# wipefs -a /dev/sdz
/dev/sdz: 4 bytes were erased at offset 0x00001000 (linux_raid_member): fc 4e 2b a9
 
root@usn-t60:/opt/scalable/sbin# mdadm /dev/md23 --add /dev/sdz
mdadm: added /dev/sdz

root@usn-t60:/opt/scalable/sbin# grep sdz /proc/mdstat
md23 : active raid6 sdz[16](S) sdap[14] sdar[13] sdas[12] sdau[11] sdat[10] sdaf[9] sdag[8] sdai[7] sdah[6] sdaj[5] sdak[4] sdam[3] sdal[2] sdaa[15]


ClusterHQ dies

ClusterHQ is now dead. They were an early container play, building a number of tools around Docker/etc. for the space.

Containers are a step between bare metal and VMs. Flocker (ClusterHQ’s product) is open source, and they were looking to monetize it in a different way (not on acquisition, but on support).

In this space though, Kubernetes reigns supreme. So competing products/projects need to adapt or outcompete.

And it’s very hard to outcompete something like k8s.

Again, I feel for the folks kicked to the street. And this is likely just the beginning.


fortran for webapps

Use Fortran for your MVC web app. No, really

Here you are, coding your new density functional theory app, and you want to give it a nice shiny new web framework front end. Config files are so … 80s … Like in grad school, man … You want shiny new MVC action, with the goodness of Fortran mixed in.

Out comes Fortran.io.


Another fun bit of debugging

Ok … so here you are doing a code build.

Your environment is all set. You have ample space. Lots of CPU, lots of RAM. All packages are up to date.

You start your make.

You have another window open with dstat running, just to kinda, sorta watch the system, while you are doing other things.

And while you are working, you realize dstat has stopped scrolling.

Strange. Why would that be?

Ping the machine

Not responding.

Ok … hmmm … it crashed? Look in the BMC SEL (our kernel dumps panic messages there). Nothing.

Look at the system condition … overheating? Heck no, it’s actually running cool.

Hmmm….

Ok. Maybe something spurious. Connect up the SOL console, watch it finish booting.

Iterate. Log in 2 windows. Start dstat in one, build in another.

and …

bang …

Hmmm … nothing on the console …

Ok, hook up icl (ipmi console logger) to it. Capture the data. Let’s see what is really happening.

Rinse repeat.

Bang.

Look in the log (ipmi console log that is, it will have everything).

Nope, completely blank.

/var/log/{syslog,messages}

Nothing.

Only happens under load? Could I have a blown CPU? I did see an EDAC memory error crop up once … ok, let’s try something stupid. Something that should not work.

Drop the memory frequency to lowest speed.

Nope.

Turn off SMT (aka HT).

Nope.

Ok, let’s go full moron, and assume hardware is the culprit, and is somehow … somehow not triggering an MCE or the EDAC subsystem.

Let me remove 1/2 the memory.

Why not. Can’t hurt, easy to see if it works, right?

Start the build.

Works.

Do two intensive builds at once.

Works.

Do 3.

Works.

smh.

This is new memory, older board, older CPUs. Never given me a problem before.

Crashed with no message whatsoever.

I am going to assume something like a loading issue with the CPU. I can run this at 1/2 the RAM, though I’ll probably put 1/2 of what I took out back in to check, and see if it’s bad RAM or a loading problem. Bad RAM should have triggered EDAC/MCE. A loading problem … maybe not.
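
For what it’s worth, the EDAC counters are easy to poll from sysfs while a build runs; on a live system the corrected-error counts sit under /sys/devices/system/edac/mc/mc*/ce_count. A small sketch (my own helper name; the sysfs root is a parameter only so the accounting logic can be exercised on a box without EDAC hardware):

```shell
#!/bin/bash
# Sum corrected-error counts across all EDAC memory controllers.
# On a real system, call with no argument to read the live sysfs tree.
edac_ce_total() {
    root="${1:-/sys/devices/system/edac/mc}"
    total=0
    for f in "$root"/mc*/ce_count; do
        # Unmatched globs stay literal, so -r quietly skips them
        # and the total stays 0 on machines without EDAC.
        [ -r "$f" ] && total=$(( total + $(cat "$f") ))
    done
    echo "$total"
}
```

A nonzero delta between two runs during a build would at least point the finger at the RAM rather than CPU loading.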


Violin files for Chapter 11

This has been long in coming. I feel for the people involved.

Violin makes proprietary flash modules and chassis to provide an all-flash “array”. The performance is somewhat “meh”, and the cost is high. Like most of the rest of the companies in this space, their latest models are quite a bit below Scalable’s 4-year-old models, never mind the new stuff.

Since the IPO, they’ve been on something of a monotonic downward trend in share price. This is because the market is crowded with flash-array wannabes, and there is little real differentiation apart from actual performance (which precious few have).

Call them the first of many. I don’t see consolidation anymore; I see companies going bust.

Sad that this is happening to good people, but it is just the leading edge of the wave. You should expect to hear about others soon.


So it seems Java is not free

This article on The Register indicates that Oracle is now working actively to monetize java use.

Given the spate of Java hacks over the years, and the decidedly non-free nature of the language, I suspect we are going to see the use of replacement development languages skyrocket, as people develop in anything-but-Java going forward. Think about the risks … you have a massive platform that people have been using, with a fairly large number of compromises (client side, certainly) … and now you need to start paying for the privilege of using the platform.

We all knew this was coming.

Money quotes:

“Oracle has started marking this as an issue,” one expert told The Reg on condition of anonymity. Our source claimed there had been an upswing in enquiries in the last five months.

Craig Guarente, chief executive and founder of Palisade Compliance, told us Oracle’s not drawing the line at customers either, with partners feeling the LMS heat, too.

“Oracle is targeting its partners. That makes people angry because they are helping Oracle,” he told us. Partners want to know: “How could Oracle do this to me?”

“Java is something that comes up more and more with our clients because Oracle is pushing them more and more,” Guarente said.

The root cause seems to be the false perception that Java is “free”.

That perception dates from the time of Sun; Java under Sun was available for free – as it is under Oracle – but for a while Sun did charge a license fee to companies like IBM and makers of Blu-ray players, though for the vast majority, Java came minus charge. That was because Sun used Java as the thin end of the wedge to help sales of its systems.

Oracle has taken the decision to monetise Java more aggressively.

Further down in the article they claim that Java SE is free. I don’t expect that to be the case for long.

This is always the risk with non-open products, be they languages, cloud-based *aaS, whatever. The terms and conditions can be changed on you, in ways that upset your business model. Enough that you need to assume such elements increase risk, and the risk may not actually be worth the benefit.


She’s dead Jim

It looks like (if the rumor is true) Solaris will be pushing up the daisies soon.

Note: Solaris != SmartOS

This has been a long time coming. Combine this with Fujitsu dumping SPARC for headline projects … yeah … it’s likely over.

FWIW: I like SmartOS. The issue for it is drivers. We tried helping, and were able to get one group to update their driver set. But getting others to update (specifically Mellanox) will be even harder now (and it was impossible beforehand, for reasons that were not Mellanox’s fault). I’d like to use more SmartOS, but I keep running into things I can’t fix or work around. I can’t use my Mellanox 40+ Gb cards, or any InfiniBand stack. I can’t use 100Gb cards. I can’t use Intel OPA. CUDA is right out. I had hoped that Samsung would be throwing beaucoup money at Joyent to really solidify the platform after the purchase. Still hoping.

So our OS choices seem to be Linux based and BSD based going forward. We use BSD for specific functions, and Linux for many things.

The closest thing I’ve found to SmartOS on Linux is RancherOS. It is not identical, but darn it, it is close to what we need, and I can replace the kernel and add in a few things we need. Ubuntu is making a strong play for this as well, adding ZFS to its mix.

But again, SmartOS != Solaris. I played with recent Solaris a few months ago to see how it had progressed. Still not that impressive (especially compared to SmartOS and others).

So while Solaris is going away, I don’t think it will be missed greatly. If the licensing could be made to work to cross-pollinate between Linux and SmartOS, I’d bet we could solve the driver problem too.

/sigh
