sios-metrics code now on github

See the link for more details. It allows us to gather many metrics and save them cleanly in the database, which makes data collection very rapid and simple, even for complex data needs.

Solved the major socket bug … and it was a layer 8 problem

I’d like to offer an excuse. But I can’t. It was one single missing newline.

Just one. Missing. Newline.

I changed my config file to use port 10000. I set up an nc listener on the remote host.

nc -k -l a.b.c.d 10000

Then I invoked the code. And the data showed up.

Without a ()*&*(&%&$%*&(^ newline.

That couldn’t possibly be it. Could it? No. It’s way too freaking simple.

I went so far as to swap in Socket::Class in place of IO::Socket::INET. Ok, admittedly, that was not a hard changeover. Very easy in fact.

But it gave me the same results.

Then I got the idea of putting up the nc listener as above. And no freaking newline.

It couldn’t be that this was the difference … could it?

This is just too bloody simple. Really, no way on earth it could be that. The bug is subtle, damn it, not simple!!!

So, to demonstrate that I am not a blithering moron, I put the newline in the send method.

And it started working.

/sigh

I am a blithering moron.

The other code (accidentally) included a newline in the value. And that’s why it worked. In this one, I had happily removed the newline. And then things fell over.

A newline is like a semicolon at the end of a line in a programming language: some APIs require it and assume it. The socket API does not assume it, and will happily send whatever buffer you hand it. This is the correct behavior. The data collector, on the other hand, assumes that received data is terminated by a newline, and only then starts its parsing.
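
In other words, the fix in the send method amounts to a single character. A minimal sketch (the variable names are illustrative, not the actual module code):

# the collector only parses complete, newline-terminated records,
# so terminate the message before handing it to the socket
$msg .= "\n" unless $msg =~ /\n\z/;
my $rc = $sock->send($msg);
warn "send failed: $!" unless defined $rc;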

It’s the details that are killers.

Ok, it works now. On to more parallel monitor debugging, and making sure the data gets into InfluxDB correctly. Once that is finished, a major issue in monitoring/metrics I’ve been itching to solve correctly will be behind us.

I’ll put the code up for this as well. The monitoring code will work on *nix, MacOSX, and Windows, whether it runs in VMs, containers, or on physical servers. This is why it’s so important for us: we can monitor nearly anything with it.

New monitoring tool, and a very subtle bug

I’ve been working on coding up some additional monitoring capability, and had an idea a long time ago for a very general monitoring concept. Nothing terribly original, not quite nagios, but something easier to use/deploy. Finally I decided to work on it today.

The monitoring code talks to a graphite backend. It could talk to statsd, or other things. In this case, we are using InfluxDB’s graphite plugin. I wanted an insanely simple local data collector, and I wanted it controllable via a very simple config file. It runs on the client being monitored, and the data is pushed back to the database.

Here is the basic config file layout:

# config-file-type: JSON 1
{
   "global" : {
      "log_to_file" : "1",
      "log_name"    : "/tmp/metrics-$system.log"
   },

   "db" : {
      "host"  : "a.b.c.d",
      "port"  : "2003",
      "proto" : "tcp"
   },

   "metrics" : {
      "uptime" : {
         "command"  : "/home/landman/work/development/gui/plugins/uptime.pl",
         "interval" : 5,
         "timeout"  : 2
      }
   }
}

The db section points at the database, and the metrics are contained in the data structure as indicated. command can be a script or a command line, interval is the time in seconds between runs, and timeout is the maximum time in seconds before the child process is killed (preventing runaways and an accumulation of zombies).
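
IPC::Run (which the collector uses, as mentioned below) can handle this sort of timeout directly; a minimal sketch of the pattern, with $metric standing in for one entry from the metrics section:

use IPC::Run qw(run timeout);

my ( $in, $out, $err ) = ( '', '', '' );
eval {
    # run the plugin; IPC::Run throws if it exceeds the configured timeout
    run [ $metric->{command} ], \$in, \$out, \$err,
        timeout( $metric->{timeout} );
};
warn "metric timed out or failed: $@" if $@;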

The code reads this config and creates one thread per metric, each of which opens its own connection to the database (yeah, potentially problematic for large numbers of metrics; I will address that later). Each thread then takes the output of its command and does brain dead simple “parsing”.
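
The thread-per-metric structure is roughly this (a sketch; poll_metric is a hypothetical worker routine, not the actual function name):

use threads;

# one worker thread per configured metric; each worker opens its own
# socket to the database and loops on its own interval
my @workers;
for my $name ( keys %{ $config->{metrics} } ) {
    push @workers,
        threads->create( \&poll_metric, $name, $config->{metrics}{$name} );
}
$_->join for @workers;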

The scripts look like this (can use any language, we don’t care)

#!/usr/bin/perl

use strict;
use warnings;

# /proc/uptime holds "<uptime seconds> <idle seconds>"
my $rc;
chomp($rc = `cat /proc/uptime`);
if ($rc =~ /(\d+\.\d+)\s+(\d+\.\d+)/) {
   printf "uptime:%.2f\n",$1;
}

and the data they spit out is very simple as well.

landman@lightning:~/work/development/gui$ plugins/uptime.pl 
uptime:104612.03

A simple future optimization is to launch each plugin process once and have it wake up at a configurable interval, return its data, and go back to sleep, rather than being spawned fresh every interval. Potentially important on busy systems.

The metrics.pl code then pulls this data in, slightly reformats it for graphite, and sends it off.
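
The reformatting is essentially the graphite plaintext protocol: one line with a metric path, a value, and a timestamp. A sketch of the idea ($plugin_output, $hostname, and the metrics prefix are placeholders, not the actual code):

# plugin output "uptime:104612.03" becomes a graphite plaintext line:
#   <metric.path> <value> <epoch seconds>\n
my ( $key, $value ) = split /:/, $plugin_output, 2;
my $line = sprintf "%s.%s.%s %s %d\n", 'metrics', $hostname, $key, $value, time;
$sock->send($line);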

It, well, mostly … sort of … worked. I had to fix two bugs. Technically, I didn’t fix the bugs. I worked around them. They were not good bugs, and they are showing me I might need to rethink writing this code in Perl.

The first bug I caught was quite surprising. Using the IPC::Run module, which I’ve used for almost a decade, I create a run harness, and start the code running. Everything executes correctly. Output is generated correctly. Gets pulled into the program.

Notice how I didn’t say “gets pulled into the program correctly”. It took me a while to find this, and I had to resort to “first principles”.

 # I can't find where this bug is, but the last character of mvalue is wrong ...
 @c = split(//, $mvalue);
 pop @c;
 $mvalue = join('',@c);
 # ... so lop it off

For some reason, and I’ve not figured it out, we were getting a carriage return appended to the end of the output. chomp, the “smart” code that lops off the input record separator (a newline by default) from the end of a line, was unable to handle this.

I only saw it in my debugging output, when output lines were corrupted. Something that should never happen. Did.

Ok. So the code above splits the mvalue into its characters. I included a


$mvalue = join("|",@c);

in the code so I could see what it thought the value should be. And sure enough, that’s how I caught the bug that should not be.

The workaround is hackish, but I can live with it until I figure the rest out.
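
A slightly less blunt workaround, assuming the stray byte really is a carriage return, would be to strip trailing CR/LF explicitly rather than chopping the last character blindly:

$mvalue =~ s/[\r\n]+\z//;   # drop any trailing CR/LF, and nothing else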

It’s the next bug that is brutal. I have a workaround. And it’s evil. Which is why I am thinking about rewriting in another language, as this may point to a bug in part of the core functionality. I need this to work quickly, so I’ll use the hack in the near term, but longer term, I need something that works and is easy to work with.

I am using IO::Socket::INET to connect a client to a remote socket. I am doing this inside of the threads/threads::shared paradigm in Perl. For a number of reasons, Perl doesn’t have real threads … well, it does and it doesn’t. Long story, but ithreads with threads::shared is the best compromise: each thread gets its own copy of the interpreter, plus some “magic” for the variables that are explicitly shared. Sockets generally work well in that sort of environment … at least servers do. I am not sure about clients now.

Brain dead simple constructors, nothing odd. Checking return values. All appears to be well.
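
Something along these lines (a sketch, not the exact code from the module; $db holds the db section of the config):

use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr => $db->{host},
    PeerPort => $db->{port},
    Proto    => $db->{proto},
) or die "cannot connect to $db->{host}:$db->{port}: $!";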

Then I do a send($msg) and …

… not a bloody thing. It “succeeds” but the data never shows up in the database.

So, here comes the hack. The send($msg) call is logically equivalent to “echo $msg | nc $host $port”, so I replaced that one line, the send, with this external call, to see what happens.
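
In code, the hack is roughly this (a sketch; note that echo tacks a newline onto $msg, which is exactly the missing newline from the post above):

# replace $sock->send($msg) with an external nc invocation
system( qq{echo "$msg" | nc $host $port} ) == 0
    or warn "nc send failed: $?";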

Now data starts showing up.

Of course, the single-threaded version of the testing code, where I built the code that does the actual sends, works great. Data shows up.

But not when the identical code (both versions calling the same methods in the module) is running in the threads::shared environment.

Grrrrr….

I’ll figure it out later. But this is a subtle bug. Very hard to characterize, and then I have to chase it down.

New 8TB and 10TB drives from HGST, fit nicely into Unison

The TL;DR version: imagine 60x 8TB drives (480TB, about 1/2 PB) in a 4U unit, or 4.8PB in a rack. Now make those 10TB drives: 600TB in 4U, 6PB in a full rack.

These are shingled drives, great for “cold” storage, object storage, etc., which is one of the many functions Unison is used for. These aren’t really for standard POSIX file systems, as the read-modify-write length is on the order of a GB or so per drive. But they are absolutely perfect for very large streaming loads. Think virtual tape, streaming archives, or streaming objects.

The short version is that we will use them when they make sense for customers’ extremely dense systems. The longer version is that you should be hearing more about this soon.

Just remember, though, that the larger the single storage element, the higher the storage bandwidth wall … the time to read or write the entire element. The higher this wall is, the colder the data is. Which, for these drives, is their design point. But you still need sufficient bandwidth to drive these units, either over 10/40/100 GbE or IB of various flavors.
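
To put a rough number on that wall, assuming a sustained streaming rate in the neighborhood of 200 MB/s per drive:

10 TB / 200 MB/s = 50,000 s, or roughly 14 hours to read or write one drive end to end.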

The Haswells are (officially) out

Great article summarizing information about them here. Of course, everyone and their brother put out press releases indicating that they would be supporting them. Rather than add to that cacophony (ok, just a little: All Scalable Informatics platforms are available with Haswell architecture, more details including benchies … soon …) we figured we’d let it die down, as the meaningful information will come from real user cases.

Haswell is interesting for a number of reasons, not the least of which is 16 DP FLOPs/cycle per core, but fundamentally, it’s a more efficient/faster chip in many regards. The ring architecture may show some interesting artifacts in codes with high memory contention, so we might see a number of cases where lower core count (MCC) variants are faster at certain codes than the high core count (HCC) units.

DDR4 is a welcome change, and the 2133 LRDIMMs should be the DIMM of choice for most use cases.

Haswell should provide a serious uptick to siFlash performance, which is, as we occasionally remind people, the fastest single converged server storage device in the market, and not by a little bit. It will also give DeltaV a serious kick forward. Couple the faster processing with the massive 12g data rail guns we have …

Yeah, this should be an interesting next few months :D

Be sure to vote for your favorites in the HPCWire readers choice awards

Scalable Informatics is nominated in

  1. #12 for Best HPC storage product or technology,
  2. #20 Top supercomputing achievement, which could be for this, this on a single storage box, or this result,
  3. #21 Top 5 new products or technologies to watch, for our Unison,
  4. and #22 for Top 5 vendors to watch

Our friends at Lucera are nominated for #4, Best use of HPC in financial services

Please do vote for us and our friends at Lucera!

InfluxDB cli is up on github

I know there is a node version, and I did try it before I wrote my own. Actually, the reason I wrote my own was that I tried it and … well …

Link is here.

And yes, the readme is borked about 1/2 way through. It doesn’t show the formatting of the output quite right. I will try to fix it over the weekend, as I move this toward being a far more feature-complete tool.

Also, this is my first github based project. Most of our public projects are on our gitlab instance.

Time series databases for metrics part 2

So I’ve been working with InfluxDB for a while now, and have a working/credible cli for it. I’ll have to put it up on github soon.

I am using it mostly as a graphite replacement, as it’s a compiled app versus Python code, and Python isn’t terribly fast for this sort of work.

We want to save lots of data, and do so with 1-second resolution. Imagine I want to save a 64-bit measurement, and I am gathering, say, 100 per second; with the metric name and timestamp travelling along with the value, call it roughly 64 bytes per stored point. That’s about 6.4kB/s of data in real time. This is a mixture of high and low level bits. Some of it can be summarized over more than a 1-second interval, but I’d rather do that summarization on the query side.

That works out to about 553MB/day, per machine.

Take, say, 8 machines. This is 4.4 GB/day just for log storage.

Not really a problem, as 3 years is about 1096 days, or about 4.8TB.

Uncompressed, though compression would reduce this a bit.
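
Spelling the arithmetic out (using the roughly 64 bytes per stored point from above):

100 points/s x 64 B/point  ≈ 6.4 kB/s
6.4 kB/s x 86,400 s/day    ≈ 553 MB/day per machine
553 MB/day x 8 machines    ≈ 4.4 GB/day
4.4 GB/day x 1096 days     ≈ 4.8 TB over 3 years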

None of this is a problem. That is, until you try to query the data.

Then simple selects without summarization generate 2.2GB of real memory usage in the query tool. Using a 60s average results in a manageable 20MB CSV file from a single query, which I can build analytical tools around.

But those queries take a long time.

I need the graphite replacement aspect for the inbound data to reduce the amount of code I’d need to write. Or conversely, I could simply write a new output plug-in for the data collector (using collectl for the moment for major metrics and some of our own code which fires things to graphite/statsd).

The options for the database are obviously InfluxDB and a few others. InfluxDB works, but will require management of the data sets to work correctly. We’ll need paring queries, shard dropping, and other housekeeping to manage.

kdb+ is an option. There are many good things about it, and I think I could write a simple receiver for the graphite data to plug into the database. But … the free version of kdb+ is 32-bit. Note the database sizes I indicate above. I’d have to do a different sort of management with it. I’m not against that, I just have to compare the effort involved. This said, it’s quite likely kdb+ would simply be the fastest option.

There is Dalmatiner, which is crafted with performance in mind, but it looks to depend upon ZFS, which I can’t use on Linux (and which we can’t switch to an Illumos base for). Yes, I know about ZFS on Linux. Unfortunately, there are a few issues with that, not the least of which is the license clash, and our impression that this is something you should ask an attorney about rather than risking a very large corporation reading the situation differently from you and leveraging their considerable resources to enforce their viewpoint (fairly or unfairly).

Put another way, all the solutions I see in front of me come with some additional set of assumptions that would cause additional work or expense. I am still thinking about how to handle these, but will, at least for the moment, keep cranking on InfluxDB until I exhaust our capability with it.

We definitely need query performance to be very high, irrespective of the solution we use. I don’t mind adding storage capacity to handle additional data.

An article on Detroit that is worth the read

Detroit filed for bankruptcy protection a while ago. The rationale for this was simple: they did not have the cash flow to pay for all their liabilities. They had limited access to debt markets for a number of reasons, and they couldn’t keep cranking up the taxes on residents and businesses in the city to generate revenue.

They were between a rock and a hard place.

I have a soft spot in my heart for Detroit. I went to grad school there. I spent many years going back and forth to the Physics building, and later in our lives, to the various restaurants of Greektown, the theatres downtown, the museums where my wife worked after grad school … .

The city has character. Sort of like the Steve Rogers character in the recent Captain America movie, you can beat it up and its response is “I can do this all day.”

You root for the big D. You have to.

But, if you’ve lived here as long as I have (26 years now … sheesh!), you have been struck by the complete, unabashed mixture of political corruption, incompetence, and everything bad you could possibly imagine, rolled up into a city government. And it wasn’t just the city (and county, for that matter) government that was the problem.

That’s where this article comes in. In it, Mr. Williamson lays out the scope of the malfeasance. It’s not that hard to understand, and it shows the danger of allowing ideologically driven (city) business decisions to continue. This said, his article actually truncates the history of the malfeasance, which extends into the dim and distant past.

While some people of various ideological bents may cheer one group getting screwed over while demanding that another be made whole … the point about the bankruptcy is that it is an object lesson in what never, ever, to do when electing and running a city (or any other) government.

The article is a good read, and hopefully the big D can be revived. It’s a beautiful place, even with the ruins.

XKCD on thesis defense

I guess I did it wrong …

See here
