My vote for most awesome Mac OS X software

Karabiner. If you switch back and forth between Linux and Mac on the same keyboard, this is an absolute must-have.

From my perspective, the keys on a Mac are horribly borked. Home and End do not do what I expect. Control-anything doesn’t work except in exceptional cases. iTerm2 (also very good Mac software) largely does the right thing on its own, but the keyboard side of Mac OS X is basically borked. This lets you unbork it.

That is huge. I’ve been looking for this for years. The page that pointed me to it is here. My google-fu must not have been good in the past, as this is the first time I’d seen this …

What brought this about was sheer frustration at hitting the home key, expecting it to go to the beginning of the paragraph/line in Keynote, and watching it, insanely, go to the beginning of the file. And the same thing with end, though this time to the end of the file.

Seriously, this tool unborks that-which-was-borked.

Viewed 219 times by 155 viewers

Memory channel flash: is it over?

[full disclosure: day job has a relationship with Diablo]

Russell just pointed this out to me.

The short (pedestrian) version (I’ve got no information that is not public, so I can’t disclose something I don’t know anyway): Netlist filed a patent infringement suit against Diablo, and then added SanDisk after SanDisk bought Smart Storage, which had worked with Diablo prior to the acquisition. Netlist appears to have won an, at least temporary, injunction against Diablo.

Netlist makes fast DIMM chips and has IP in the fast DIMM interface. Yeah, highly simplified, but this is approximately correct. It’s definitely more involved than that, but this is the pedestrian version.

Netlist claimed, and apparently convinced a patent court, that it was being damaged by Diablo’s use of its IP. I know that part is in dispute by Diablo, and I cannot, and will not, comment on the merits of either the suit or any counter-suit.

It seems part of this injunction involved SanDisk not being allowed to sell/ship its inventory. That aspect was just lifted. But SanDisk cannot acquire any more.

So what does this portend for memory channel flash?

I liked the idea, but for different reasons than others had been talking about in public. I’ve always felt that IO-channel memory was a throwback to the old XMM/EMM PC days. What, you don’t remember those days? Putting a windowed RAM card in an expansion chassis, addressing it 64kB at a time. It had some utility, but it used up valuable IO space. And it was slower than memory near the CPU. This was cured by systems with bigger memory address spaces.

Similarly, I looked at memory channel flash as a way to get flash closer to the CPU and away from the valuable IO channel lines. It could never really be primary memory, or even primary storage (despite a number of pundits suggesting as much; this was a terrible idea). It would be fantastic as a temp space for paging, or for certain types of caching or persistence.

But that’s on hold now, as Diablo and Netlist fight it out.

I’m not happy with this, and had hoped that a nice cross-licensing deal would fix this quickly. Doesn’t look like that is happening though. And as Diablo is a startup, how long will they be able to hold out with revenues falling off? I am guessing they would be an acquisition target now for the likes of SanDisk or others (IBM?) who have more power to push a deal with Netlist.

Not a great situation, though I am still hopeful for the Diablo team and the product. It looks really good, and we have a great use case.

Netlist isn’t a patent troll; they are a legitimate technology company with interesting low-latency memory DIMM technology. They came to our attention a number of years ago when we had very focused HFT customers trying to eke out any advantage anywhere. Diablo has been making good things IMO. I do wish there was a way to make this work for all.


New all-flash-array: SanDisk’s InfiniFlash

Interesting development from SanDisk. Not quite an M&A bit, but an attempt at accelerating adoption of non-spinning storage by bringing out a proof of concept product in a few flavors. They are aiming at $2/GB for this system.

This is an array product though, so you need to attach it to a set of servers. Also, for something this large, the specs are kind of disappointing: 7GB/s maximum and 1M IOPS. Density up to 1/2 PB in 3U. We are currently at 1/4 PB in 4U, combined with a massive IO/compute/network capability, so that part is interesting. Our next gen will put that to shame though.

Not for nothing, but siFlash did 30+ GB/s at much more than 1M IOPS (in an end-user/real-world test) 2 years ago. Indeed, our new range of Cadence devices are … significant steps up from this … . Sadly, for that test, thanks to SanDisk’s acquisitive nature, our supplier for SSDs was bought, they jacked up the price of our drives, and drove the customer to seek other, lower performance and lower cost options. I don’t precisely know how they are doing, but I get the sense that they may realize you can’t fake performance, which is a problem if what you need is, well, performance.

What makes this interesting is that this is a shot across the bows of Violin, Kaminario, Pure, Skyera, and many others. We don’t see this as particularly competitive in our space (Big Data appliances), as it’s a pure storage array. Moreover, our spinning disk systems do 7GB/s sustained, have integrated computers, 10GbE, 40GbE, IB, and in very short order, something much faster.

Moreover, Wikibon and others predict that the SAN market (which this is very much a play for) is in decline. Building new SAN elements today probably isn’t a good long term strategy.

But, understand what SanDisk wants to do. They want to spur adoption of flash. They want to be able to generate sufficient demand so that they can build more flash, more flash fab lines (not cheap!).

There are many contenders for the next generation of non-volatile memory (NVM). All of these contenders may have interesting advantages or drawbacks relative to flash. Flash’s big one is the limited number of write cycles. This said, I don’t see flash going away any time soon. Industry momentum is built up by folks like SanDisk pushing hard on things like this.

If anything, this will likely spur other vendors to either build or buy their own version of this. Moreover, with the advent of Big Data, dumb arrays are basically on the way out, as Wikibon (and many others) have noted. This is part of why folks like EMC were looking for new things to freshen their business last year. They are arrays and filer heads. And other things in the federated company, but that’s the storage side.

So I expect this announcement to light fires under folks like WD/HGST (hey, look, they just bought Amplidata), Seagate (Xyratex), and SanDisk (Fusion-io). Toshiba still hasn’t gotten into this game.

But I expect things like this to drive more M&A.


M&A: HGST acquires Amplidata

This is closer to home. Amplidata is an erasure coded cold storage system atop “cheap” hardware. HGST makes, of course, storage devices.

This continues a trend in vertical integration of folks with systems experience, and folks who make the things that go into these systems. If you control more of the stack, you can create more value to your bottom line … up to a point.

The flip side to this is if you start competing with your customers. This is a good way to kill a channel, and drive customers to your competitors.

The only major tier-1 vendor I don’t see doing this now is Toshiba. HGST/WD, Seagate, and SanDisk are all building vertically, with integrated units of one sort or the other.

All these systems will compete with some segment of their customer base though. Finding and striking that balance is important. Where you can add value (cold storage, big data, massive performance storage) is where they could play nicely.

I do expect this to be fairly disruptive to a number of vendors in the space. Should be quite interesting.


M&A: Avago (the LSI acquirer) just bought Emulex

Ok, this is starting to look like someone is buying up the tech behind storage and storage networking on the hardware side. Avago acquired LSI in 2013, and now they’ve gone and grabbed Emulex.

Emulex has a large FC capability, but I can’t imagine that this is the only reason for this buy. They also have converged network adapters, RDMA and offload capability, and other bits. They are an OEM to many large vendors.

They also do ASICs, as does LSI. Their ASICs include storage, networking, and fabric controllers. Though their FC controllers peak at 8G, they still complement the LSI bits, which do 12G/6G SAS.

I had thought Emulex might be looking to sell at some point, but wasn’t sure who would grab them. This one seems to be making sense, and I get a picture of a larger Avago strategy emerging. If I were a betting man (founding a startup? nah, not a betting man …) I’d say they would be in the market fairly soon for a 40GbE player. Who is left in that camp: Chelsio, Solarflare, Mellanox, and now Intel. I’ve got a guess (won’t share it though; want to see if I am right).


influxdb cli queries now with regex

This is the way queries are supposed to work. Note the Perl regex in the series name:

unison> select * from  /^usn-ramboot.nettotals.kb(in|out)$/ limit 10
D[23261]  Scalable::TSDB::_generate_url; dbquery = 'select * from /^usn-ramboot.nettotals.kb(in|out)$/ limit 10'
D[23261]  Scalable::TSDB::_generate_url; query = 'p=XXXXXXXX&u=scalable&chunked=1&time_precision=s&q=select%20%2A%20from%20%2F%5Eusn-ramboot.nettotals.kb%28in%7Cout%29%24%2F%20limit%2010'
D[23261]  Scalable::TSDB::_generate_url; url = 'http://localhost:8086/db/unison/series?p=XXXXXXX&u=scalable&chunked=1&time_precision=s&q=select%20%2A%20from%20%2F%5Eusn-ramboot.nettotals.kb%28in%7Cout%29%24%2F%20limit%2010'
D[23261] Scalable::TSDB::_send_chunked_get_query -> reading 0.009837s 
D[23261] Scalable::TSDB::_send_chunked_get_query -> bytes_received = 530B 
D[23261] Scalable::TSDB::_send_chunked_get_query return code = 200
D[23261] Scalable::TSDB::_send_chunked_get_query cols = [time,sequence_number,usn-ramboot.nettotals.kbin]
D[23261] Scalable::TSDB::_send_chunked_get_query cols = [time,sequence_number,usn-ramboot.nettotals.kbout]
D[23261] Scalable::TSDB::_send_chunked_get_query -> mapping 0.001205s 
D[23261]; DB query 'select * from  /^usn-ramboot.nettotals.kb(in|out)$/ limit 10' took 0.011656s
D[23261]; output formatting took 0.000685s
|     results: query = 'select * from  /^usn-ramboot.nettotals.kb(in|out)$/ limit 10'     |
| time       | sequence_number | usn-ramboot.nettotals.kbin | usn-ramboot.nettotals.kbout |
| 1423495580 |               1 |                          1 |                           5 |
| 1423495579 |               1 |                          2 |                           6 |
| 1423495578 |               1 |                          1 |                           5 |
| 1423495577 |               1 |                          1 |                           5 |
| 1423495576 |               1 |                          1 |                           5 |
| 1423495575 |               1 |                          1 |                           5 |
| 1423495574 |               1 |                          1 |                           5 |
| 1423495573 |               1 |                          1 |                           5 |
| 1423495572 |               1 |                          1 |                           5 |
| 1423495571 |               1 |                          1 |                           5 |

D[23261]; outputting took 0.002347s
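For the curious, the URL in the debug output above is just a percent-encoded query against the InfluxDB 0.8 HTTP API. A hypothetical Python rendition of what the CLI’s `_generate_url` appears to build (function name and parameters here are illustrative, based only on the debug lines above):

```python
from urllib.parse import quote

# Sketch of the URL construction seen in the _generate_url debug output;
# the 0.8-era API exposes series queries at /db/<db>/series.
def generate_url(host, db, user, password, query):
    q = quote(query, safe="")   # encode everything, matching the log's %2A, %2F, ...
    return (f"http://{host}:8086/db/{db}/series"
            f"?p={password}&u={user}&chunked=1&time_precision=s&q={q}")

url = generate_url("localhost", "unison", "scalable", "XXXXXXXX",
                   "select * from /^usn-ramboot.nettotals.kb(in|out)$/ limit 10")
print(url)
```

The regex and its delimiters travel to the server as ordinary encoded characters; the server side does the matching against series names.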


InfluxDB cli ready for people to play with

The code is on github. Installation should be simple:

sudo make INSTALLPATH=/path/where/you/want/it

It will install any needed Perl modules for you. I’ve reduced the dependency set to LWP::UserAgent, Getopt::Lucid, JSON::PP, and some text processing modules. As much as I like Mojolicious, its UserAgent was 1/10th the speed of LWP for the same work.

Once it is done, point it over to an InfluxDB database instance:

landman@metal:~/work/development/influxdbcli$ ./ --user scalable --pass XXXXXXX --host --db unison --debug


list some series

unison> list series
D[713]  Scalable::TSDB::_generate_url; dbquery = 'list series'
D[713]  Scalable::TSDB::_generate_url; query = 'p=scalable&u=scalable&chunked=1&time_precision=s&q=list%20series'
D[713]  Scalable::TSDB::_generate_url; url = ''
D[713] Scalable::TSDB::_send_chunked_get_query -> reading 0.074382s 
D[713] Scalable::TSDB::_send_chunked_get_query -> reading 0.000197s 
D[713] Scalable::TSDB::_send_chunked_get_query -> reading 0.000173s 
D[713] Scalable::TSDB::_send_chunked_get_query -> bytes_received = 101922B 
D[713] Scalable::TSDB::_send_chunked_get_query return code = 200
D[713] Scalable::TSDB::_send_chunked_get_query tpos = 0
D[713] Scalable::TSDB::_send_chunked_get_query spos = 1
D[713] Scalable::TSDB::_send_chunked_get_query cols = [time,name]
D[713] Scalable::TSDB::_send_chunked_get_query -> mapping 0.138265s 
D[713]; DB query 'list series' took 0.219349s
D[713]; output formatting took 0.056791s
|      results: query = 'list series'      |
| series                                   |
| rsnode.sicloud.MHz.cpu0                  |
| rsnode.sicloud.MHz.cpu1                  |
| usn-ramboot.tcpinfo.icmperrs             |
| usn-ramboot.tcpinfo.iperrs               |
| usn-ramboot.tcpinfo.tcperrs              |
| usn-ramboot.tcpinfo.udperrs              |

D[713]; outputting took 0.109668s

The D[number] … bits are the debugging messages. You can turn debugging off if you wish by exiting and omitting the --debug option.

In short order, you’ll be able to toggle it from within the code itself.

Now let’s select some values.

unison> select * from
	message	= 'syntax error, unexpected '-', expecting $end
select * from
	rc	= '400'

Uh oh, we ran into an escaping/quoting issue. Try again:

unison> select * from "" limit 10
| results: query = 'select * from "" limit 10' |
| time                   | sequence_number            | value            |
|             1423495580 |                          1 |                0 |
|             1423495579 |                          1 |                0 |
|             1423495578 |                          1 |                0 |
|             1423495577 |                          1 |                0 |
|             1423495576 |                          1 |                0 |
|             1423495575 |                          1 |                0 |
|             1423495574 |                          1 |                0 |
|             1423495573 |                          1 |                0 |
|             1423495572 |                          1 |                0 |
|             1423495571 |                          1 |                0 |

Much better. Ok, what about querying multiple series …

unison> select * from "","usn-ramboot.swapinfo.used" limit 10
| results: query = 'select * from "","usn-ramboot.swapinfo.used" limit 10' |
| time                            | sequence_number                      | value                     |
|                      1423495580 |                                    1 |                         0 |
|                      1423495579 |                                    1 |                         0 |
|                      1423495578 |                                    1 |                         0 |
|                      1423495577 |                                    1 |                         0 |
|                      1423495576 |                                    1 |                         0 |
|                      1423495575 |                                    1 |                         0 |
|                      1423495574 |                                    1 |                         0 |
|                      1423495573 |                                    1 |                         0 |
|                      1423495572 |                                    1 |                         0 |
|                      1423495571 |                                    1 |                         0 |

Not too shabby. Why not do some computations on the values?

unison> select value/10 from "" limit 10
| results: query = 'select value/10 from "" limit 10' |
| time                     | sequence_number               | expr0              |
|               1423495580 |                             1 |                  0 |
|               1423495579 |                             1 |                  0 |
|               1423495578 |                             1 |                  0 |
|               1423495577 |                             1 |                  0 |
|               1423495576 |                             1 |                  0 |
|               1423495575 |                             1 |                  0 |
|               1423495574 |                             1 |                  0 |
|               1423495573 |                             1 |                  0 |
|               1423495572 |                             1 |                  0 |
|               1423495571 |                             1 |                  0 |

unison> select mean(value/10) from "" limit 10
| results: query = 'select mean(value/10) from "" limit 10' |
| time                                     | mean                                     |
|                                        0 |                                        0 |

unison> select max(value/10) from "" group by time(1h)
| results: query = 'select max(value/10) from "" group by time(1h)' |
| time                                             | max                                      |
|                                       1423494000 |                                        0 |
|                                       1423490400 |                                        0 |
|                                       1423486800 |                                        0 |
|                                       1423483200 |                                        0 |
|                                       1423479600 |                                        0 |
|                                       1423476000 |                                        0 |
|                                       1423472400 |                                        0 |

You can see some pretty nice query capabilities. You can also output to CSV and adjust the separator. And set an output file:

unison> select max(value/10) from "" group by time(1d)
#time time max
1423440000 0
1423353600 0
1423267200 0
1423180800 0
1423094400 0
1423008000 0

unison> \set
unison> select max(value/10) from "" group by time(1d)

landman@metal:~/work/development/influxdbcli$ cat 
#time time max
1423440000 0
1423353600 0
1423267200 0
1423180800 0
1423094400 0
1423008000 0

Nice, huh?
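Under the hood, the 0.8 API returns each series as a name, a list of columns, and a list of points; the CLI flattens that into the rows you see in the tables above. A sketch of that mapping (the payload and the series name "some.series" are made up for illustration, not captured from the session):

```python
import json

# Illustrative InfluxDB 0.8-style response: name / columns / points.
payload = json.dumps([{
    "name": "some.series",
    "columns": ["time", "sequence_number", "value"],
    "points": [[1423495580, 1, 0], [1423495579, 1, 0]],
}])

def to_rows(doc):
    # Flatten each series' points into per-row dicts keyed by column name.
    rows = []
    for series in json.loads(doc):
        cols = series["columns"]
        for point in series["points"]:
            rows.append(dict(zip(cols, point)))
    return rows

for r in to_rows(payload):
    print(r["time"], r["sequence_number"], r["value"])
```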

There are still a number of bugs, but this is ready for alpha. Please do feel free to start beating on it, and we’ll fix bugs as rapidly as possible.

We’ve selected InfluxDB as a graphite-replacement backend for Unison and, indeed, for all our FastPath appliances. We’ll be updating the sios-metrics plugins so we can remove the collectl dependency (massive overkill for what we need) from our monitoring.


So I finally figured it out

I’d been trying for a while in my spare time to understand why my incredibly simple Perl Mandelbrot test, inspired by the Julia benchmarks, was returning wrong numbers. Yeah, they were wrong. As in incorrect values.

So I figured it out this morning. The punchline: there is a bug (which I haven’t quite yet found) in the Math::Complex library, specifically in the pathway for the abs(z) function.

How did I find this?

Start with the C code:

int mandel(double complex z) {
    int maxiter = 80;
    double complex c = z;
    for (int n = 0; n < maxiter; ++n) {
        if (cabs(z) > 2.0) {
            return n;
        }
        z = z*z + c;
    }
    return maxiter;
}

Very simple loop. Convert this to Perl:

sub mandel {
    use Math::Complex;
    my $z = shift;
    my $c = $z;
    my $maxiter = 80;
    for (my $n = 0; $n < $maxiter; $n++) {
        if ( abs($z) > 2.0 ) { return $n }
        $z = $z * $z + $c;
    }
    return $maxiter;
}

Run both codes with their drivers.


landman@lightning:~/work/benchmarking/Julia$ ./mandel.exe 
sum: 14791


landman@lightning:~/work/benchmarking/Julia$ ./ 
 sum mandel = 14722

The sum is both a sanity check and an optimizer defeater. It should be the same. And it is, in every other language.
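As a cross-check, here is the same loop in Python, swept over the grid the Julia microbenchmark driver uses (re from -2.0 to 0.5, im from -1.0 to 1.0, step 0.1). The grid is an assumption on my part, since the drivers aren’t shown above:

```python
def mandel(z):
    # Same escape-time loop as the C version above.
    maxiter = 80
    c = z
    for n in range(maxiter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return maxiter

# Integer-derived grid points to match the benchmark's decimal steps exactly.
total = sum(mandel(complex((k - 20) / 10, (j - 10) / 10))
            for k in range(26) for j in range(21))
print("sum mandel =", total)
```

Python’s built-in complex abs agrees with C’s cabs here, which is what makes the broken Perl result stand out.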

So what is going on here?

I finally had time to think this through, and follow good debugging practice. Remove the impossible from consideration, until the only thing that remains is the answer.

Instrumenting everything that made sense, comparing results along the iteration pathway led me to believe that there is an issue with the Math::Complex code. So I started out replacing key aspects of it.

Finally, I swapped the abs($z) portion out for its equivalent, and simplified the expression:

sub mandel {
    use Math::Complex;
    my $z = shift;
    my $c = $z;
    my $maxiter = 80;
    for (my $n = 0; $n < $maxiter; $n++) {
        if ( (Re($z)**2 + Im($z)**2) > 4.0 ) { return $n }
        $z *= $z;
        $z += $c;
    }
    return $maxiter;
}

Basically, I removed the call to abs($z):

landman@lightning:~/work/benchmarking/Julia$ ./ 
 sum mandel = 14791

Now swapping the abs($z) code back into that one line …

sub mandel {
    use Math::Complex;
    my $z = shift;
    my $c = $z;
    my $maxiter = 80;
    for (my $n = 0; $n < $maxiter; $n++) {
        #if ( (Re($z)**2+Im($z)**2) > 4.0 ) { return $n  }
        if ( abs($z) > 2.0 ) { return $n }
        $z *= $z;
        $z += $c;
    }
    return $maxiter;
}

we get

landman@lightning:~/work/benchmarking/Julia$ ./ 
 sum mandel = 14722

Note how the sum is again wrong.

Ok. Looking at the abs function in Math::Complex, it basically overloads the core library abs, and decides between abs(real), abs(cartesian), and abs(polar):

sub abs {
        my ($z, $rho) = @_ ? @_ : $_;
        unless (ref $z) {
            if (@_ == 2) {
                $_[0] = $_[1];
            } else {
                return CORE::abs($z);
            }
        }
        if (defined $rho) {
            $z->{'polar'} = [ $rho, ${$z->_polar}[1] ];
            $z->{p_dirty} = 0;
            $z->{c_dirty} = 1;
            return $rho;
        } else {
            return ${$z->_polar}[0];
        }
}

Following through the code, it hits $z->_polar, which is

sub _polar     {$_[0]->{p_dirty} ?
		   $_[0]->_update_polar : $_[0]->{'polar'}}

and we’ll hit _update_polar

sub _update_polar {
	my $self = shift;
	my ($x, $y) = @{$self->{'cartesian'}};
	$self->{p_dirty} = 0;
	return $self->{'polar'} = [0, 0] if $x == 0 && $y == 0;
	return $self->{'polar'} = [CORE::sqrt($x*$x + $y*$y),
				   CORE::atan2($y, $x)];
}

where the actual computation is done.

Ok. Deep call stack for something that should be fast and called frequently.

This is bad design IMO. I understand the desire to handle r,θ type coordinates as well as cartesian. But this introduces many issues in the library. Unacceptable tradeoffs IMO.

Core library code should be simple, fast, and as bug-free as possible. Even more than this, there is a whole extra computation here that is not needed (note the atan2 call).
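To make the wasted work concrete, here is a minimal sketch (in Python, not the Math::Complex code itself) of the polar path versus the cartesian escape test; the function names are mine:

```python
import math

def escaped_via_polar(re, im):
    # What _update_polar effectively computes for abs(z)...
    rho = math.sqrt(re * re + im * im)
    theta = math.atan2(im, re)   # ...plus an angle the escape test never uses
    return rho > 2.0

def escaped_via_cartesian(re, im):
    # Same decision, no sqrt and no atan2.
    return re * re + im * im > 4.0

# The two tests always agree; only the cost differs.
for (re, im) in [(0.0, 0.0), (1.5, 1.5), (3.0, 4.0)]:
    assert escaped_via_polar(re, im) == escaped_via_cartesian(re, im)
```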

All I can say is … wow.

This one is bad enough that I might rewrite this library, stripping it down to the bare essentials for cartesian complex numbers.

But more frightening in this is that I am getting what appears to be a failure in a CORE:: function. Which means I probably have to look through that as well.

Enough for now.


love/hate relationship with new hardware

One of the dangers of dealing with newer hardware is that, often, it doesn’t work so well. Or the drivers get hosed in mysterious ways.

We’ve got some nice shiny new 10GbE cards for a set of Unison systems going into a customer site next week. We had some very odd issues with other 10GbE cards, so we rolled over to newer-design cards. Younger silicon, younger design. Newer kernel module.

I can’t say I am enjoying this experience thus far. When we burn things in for customers, we expect drivers to be able to load/unload correctly during setup and shut down. As often as not, drivers misbehave or the hardware is somehow semi-stupid, and during the udev settle phase … it … doesn’t … settle. In fact, we see all manner of soft hangs on CPUs: grabbing resources, then crashing threads (which eventually takes down the whole machine).

This is 2015, and I expect driver initialization to not be hard. In fact, it should be bloody simple at this stage: set hooks, register services, and prepare for initialization. Initialization itself should be very simple: a soft reset to the hardware, which should either come up, or not. And if it fails to initialize, the initialization code should note this and return control.

It should not loop forever.

I should not have to blacklist drivers from loading during system boot because they don’t know how to correctly initialize themselves. In our completely ramdisk-based OS, I do exactly this though. For the stateful systems, I am regretting not doing this. I can easily install the stateless system atop the stateful system and use overlays. This way I can force the issue, and not be beholden to borked driver initialization code.
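For reference, the blacklist itself is a one-line modprobe config fragment; the module name here is entirely hypothetical:

```
# /etc/modprobe.d/blacklist-badnic.conf  (hypothetical module name)
# Keep the misbehaving driver out of early boot; load it manually later,
# under supervision, once the OS is up.
blacklist badnic10g
```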

The OS should come up, period. Drivers should initialize even if the hardware doesn’t, so it can report the failure of the hardware to respond correctly.

This is not the way I want to spend my late evenings.


Real measurement is hard

I had hinted at this last week, so I figured I had better finish working on this and get it posted already. The previous bit on language choice wakeup was about the cost of Foreign Function Interfaces (FFIs), and how well they are implemented. For many years I had honestly not looked as closely at Python as I should have. I’ve done some work in it, but Perl has been my go-to language. For me, the brevity of the interface and the ease of use of the FFI were what made me rethink some things.

I look at languages as tools to an end, a way to implement an algorithm, which will not always be expressed in the same language. I’ve been doing one manner or the other of FFI (not called that back then) since the mid-to-late 1980s. Usually Fortran and C, but as often as not, assembler, BASIC, Fortran, C, etc. One of the first codes I wrote for hire in 1984 was a terminate-and-stay-resident driver for an experiment control card. The main code was in BASIC (I kid you not), and I exposed interrupt service routines to handle control functions. It was a fun thing to do, and later the experiment became famous. Regardless of that, the ability to understand where code was spending its time, and what it was doing, became very tightly ingrained in me from that experience. Not that BASIC was fast … it wasn’t. I had to do all the hard/fast stuff in assembler. If you didn’t understand your code, chances are your measurements of the code, and your ability to effect changes, would be … limited.

I look at rapid development languages (Perl, Python, etc.) as being high level mechanisms to bridge a variety of “fast” lower level bits (libraries, etc.) together. These tools often have good internal support for things that I frankly really don’t want to waste my time implementing.
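As a flavor of what measuring FFI cost looks like, here is a minimal sketch using Python’s ctypes against libm’s fabs; the loop counts and timing approach are illustrative, not a rigorous benchmark:

```python
import ctypes
import ctypes.util
import time

# Bind libm's fabs through the FFI; declaring argtypes/restype avoids
# silent truncation of the double arguments.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.fabs.restype = ctypes.c_double
libm.fabs.argtypes = [ctypes.c_double]

N = 100_000
t0 = time.perf_counter()
for _ in range(N):
    libm.fabs(-1.0)          # cross the FFI boundary each call
ffi_time = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    abs(-1.0)                # stay inside the interpreter
builtin_time = time.perf_counter() - t0

print(f"ctypes fabs: {ffi_time:.4f}s  builtin abs: {builtin_time:.4f}s")
```

The gap between the two loops is (mostly) marshalling overhead, which is exactly the kind of cost that differs between languages’ FFI implementations.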

