#SC14 day 2: @LuceraHQ tops @scalableinfo hardware … with Scalable Info hardware …

Report XTR141111 was just released by STAC Research for the M3 benchmarks. We are absolutely thrilled, as some of our records were bested by newer versions of our hardware with a newer software stack. Congratulations to Lucera, to STAC Research for getting the results out, and to the good folks at McObject for building the underlying database technology.

This result continues and extends Scalable Informatics' domination of the STAC M3 results. I’ll check to be sure, but I believe we are now the hardware side of most of the published records.

What’s really cool about this is that you can get this power from Lucera if you don’t want or need to stand up your own kit, from us or our partners if you prefer to stand up your own private cloud, or from a combination if you would like to take advantage of all the additional capabilities and functionality Lucera brings to the table.


Starting to come around to the idea that swap, in any form, is evil

Here’s the basic theory behind swap space. Memory is expensive, disk is cheap. Only use the faster memory for active things, and aggressively swap out the less used things. This provides a virtual address space larger than physical/logical memory.

Great, right?

No. Here’s why.

1) swap makes the assumption that you can always write/read to persistent memory (disk/swap). It never assumes persistent memory could have a failure. Hence, if some amount of paged data on disk suddenly disappeared, well …

Put another way, it increases your likelihood of failure by bringing components with a higher probability of failure into a pathway which assumes no failure.

2) it uses 4k pages (on Linux). Just. Shoot. Me. Now. Ok, there are ways to tune this a bit, and we’ve done this, but … but … you really don’t want to do many, many 4k IOs to a storage device. Even an SSD.

NVMe/MCS may help here. But you still have issue number 1, unless you can guarantee atomic/replicated writes to the NVMe/MCS.

3) Performance. Sure, go ahead and allocate, and then touch every page of that 2TB memory allocation on your 128GB machine. Go ahead. I’ve got a decade or two to wait.

4) Interaction with the IO layer is sometimes buggy in surprising ways. If you use a file system, or a network attached block device (think cloud-ish), and you need to allocate an SKB or some additional memory to write the block out, be prepared for some exciting (and not in the good way) failures, and some spectacular kernel traces that you would swear are recursive allocation death spirals.

“Could not write block as we could not allocate memory to prepare swap block for write …”

Yeah. These are not fun.

5) OOM is evil. There is just no nice way to put this. OOM is evil. If it runs, think “wild west”. kill -9 bullets have been lobbed against, often, important things. Using ICL to trace what happened will often leave you agape with amazement at the bloodbath you see in front of you.

So towards this end, we’ve been shutting off paging whenever possible, and the systems have been generally faster and more stable. We’ve got some ideas on even better isolation of services to prevent self flagellation of machines. But the take home lesson we’ve been learning is … buy more ram … it will save you headache and heartache.
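To put rough numbers on point 3 above, here is a back-of-the-envelope sketch. It assumes a best case in which every page that cannot stay resident is written out exactly once and never read back, and the 200 IOPS figure is purely an illustrative guess for a single spinning disk doing random 4k writes. The function name and parameters are made up for this example.

```javascript
// Rough cost of touching every page of an allocation larger than RAM.
// The device IOPS passed in is an illustrative assumption, not a measurement.
const KiB = 1024, GiB = KiB ** 3, TiB = KiB ** 4;

function swapPassEstimate(allocBytes, ramBytes, pageBytes, iops) {
  const totalPages = Math.ceil(allocBytes / pageBytes);
  const residentPages = Math.floor(ramBytes / pageBytes);
  // Best case: each page that cannot stay resident is written out once.
  const pagedOut = Math.max(0, totalPages - residentPages);
  return { totalPages, pagedOut, seconds: pagedOut / iops };
}

// 2TB allocation, 128GB of RAM, 4k pages, assumed 200 random IOPS.
const est = swapPassEstimate(2 * TiB, 128 * GiB, 4 * KiB, 200);
console.log(est.pagedOut, "page writes,", (est.seconds / 86400).toFixed(1), "days");
```

Under those assumptions a single pass works out to roughly half a billion 4k writes and about a month of wall clock time. Not quite a decade, but real workloads touch pages more than once and have to read them back.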


#sc14 T-minus 2 days and counting #HPCmatters

On the plane down to NOLA. Going to do booth setup, and then network/machine/demo setup. We’ll have a demo visualfx reel from a customer who uses Scalable Informatics JackRabbit, DeltaV, and (as the result of an upgrade yesterday) Unison.

Looking forward to getting everything going, and it will be good to see everyone at the show!


GUI updates … oh my …


30TB flash disk, Parallel File System, massive network connectivity

This will be fun to watch run …

Scalable Informatics FastPath Unison for the win!


SC14 T minus 6 and counting

Scalable’s booth is #3053. We’ll have some good stuff, demos, talks, and people there. And coffee. Gotta have the coffee.

More soon, come by and visit us!


Mixing programming languages for fun and profit

I’ve been looking for a simple HTML5-ish way to represent our disk drives in our Unison units. I’ve been looking for some simple drawing libraries in javascript to make this higher level, so I don’t have to handle all the low level HTML5 bits.

I played with Raphael and a few others (including paper.js). I wound up implementing something in Raphael.

The code that generated this was a little unwieldy … as JavaScript doesn’t quite have all the constructs one might expect from a modern language. And thanks to its object orientation, it’s … er … somewhat more verbose than it really needs to be.

<div id="chassis" style="width:<% $win{x} %>px; height:<% $win{y} %>px"></div>

// Drawing surface sized from the template's window dimensions
var paper = Raphael({
	container: "chassis",
	width: <% $win{x} %>,
	height: <% $win{y} %>
});

var disks = [], labels = [], hoverArea = [], lx = [], ly = [];
var x0, y0, x1, y1, label, count = 0;

// One rectangle plus a row/column label per disk slot
for (var row = 1; row <= <% $dim{rows} %>; row++) {
	for (var col = 1; col <= <% $dim{cols} %>; col++) {
		x0 = Math.round(<% $xpar{margin} %> + (col-1) * (<% $xpar{width} + $xpar{spacing} %>) + 0.5);
		y0 = Math.round(<% $ypar{margin} %> + (<% $dim{rows} %> - (row-1)) * (<% $ypar{height} + $ypar{spacing} %>) + 0.5);
		x1 = Math.round(<% $xpar{width} %> + 0.5);
		y1 = Math.round(<% $ypar{height} %> + 0.5);
		lx[count] = Math.round(x0 + x1/3 + 0.5);
		ly[count] = Math.round(y0 + y1/3 + 0.5);
		label = "R".concat(row.toString()).concat("\nC").concat(col.toString());
		// Green at rest, blue while the pointer hovers over the slot
		disks[count] = paper.rect(x0, y0, x1, y1)
			.attr({fill: "green", opacity: 0.2, stroke: 1})
			.hover(
				function () { this.attr({fill: "blue", opacity: 0.2, stroke: 1}); },
				function () { this.attr({fill: "green", opacity: 0.2, stroke: 1}); }
			);
		labels[count] = paper.text(lx[count], ly[count], label);
		count++;
	}
}


Note that there are some constructs there that are not JavaScript. They are the <% $variable %> bits. This is what we use to pass data from the Perl template code (Mojolicious and HTML::Mason) into the HTML (and JavaScript).

I guess I am finding it humorous that I am having Perl rewrite the JavaScript as it emits it. This is minor, just variable substitution, but I’ve done more major bits of editing of which JavaScript goes out over the wire.
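For comparison, the same grid arithmetic can live in plain JavaScript if the template hands over the layout parameters as one object rather than substituting them term by term. This is a minimal sketch, not the code we shipped: the function name diskSlots, the shape of the cfg object, and the example values are all assumptions made for illustration, and it stops short of calling Raphael so it can run anywhere.

```javascript
// Hypothetical helper: same grid arithmetic as the templated loop, but the
// layout parameters arrive as a plain object instead of template tags.
function diskSlots(cfg) {
  const slots = [];
  for (let row = 1; row <= cfg.rows; row++) {
    for (let col = 1; col <= cfg.cols; col++) {
      const x = Math.round(cfg.xpar.margin + (col - 1) * (cfg.xpar.width + cfg.xpar.spacing) + 0.5);
      const y = Math.round(cfg.ypar.margin + (cfg.rows - (row - 1)) * (cfg.ypar.height + cfg.ypar.spacing) + 0.5);
      slots.push({
        row: row,
        col: col,
        x: x,
        y: y,
        width: Math.round(cfg.xpar.width + 0.5),
        height: Math.round(cfg.ypar.height + 0.5),
        label: "R" + row + "\nC" + col
      });
    }
  }
  return slots;
}

// Example values are made up; a template would emit the real ones as JSON.
const slots = diskSlots({
  rows: 2,
  cols: 3,
  xpar: { margin: 10, width: 40, spacing: 5 },
  ypar: { margin: 10, height: 20, spacing: 5 }
});
console.log(slots.length, JSON.stringify(slots[0]));
```

Each entry could then be fed to paper.rect and paper.text in a short loop, which keeps the template down to a single substitution.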


turnkey, low cost and high density 1PB usable at 20+ GB/s sustained

Fully turnkey, we’d ship a rack with everything pre-installed/configured. Some de-palletizing required, but it’s plug and play (power, disks) after that. More details, and a sign up to get a formal quote here.

This would be in 24U of rack space for less than $0.18/raw GB or $0.26/usable GB. Single file system name space, a single mount point. It leverages BeeGFS, and we have VMs to provide CIFS/SMB access, as well as NFS access, in addition to the BeeGFS native client.

Oh, and we’ve got S3 object storage native on it as well.

How hard is it to make work:

De-palletize it, slide disks in, plug it in, turn on the manager nodes, run the external config script (via web page, console, serial port) to set up network, password, and other bits. And off you go.

What sort of usable capacity:

960TB in base configuration, scales to many 10s of PB.

What sort of performance:

~20GB/s streaming write performance, similar streaming read performance. Excellent metadata performance. Integrated flash cache for heavy IOP workloads.

What sort of network connectivity:

1GbE, 10GbE, 40GbE, FDR Infiniband. We provide an integrated 40GbE/Infiniband fabric. You may connect the fabric to your 10GbE/40GbE/Infiniband network.

But I already have storage:

No problem, you can add this on, and we have a trade in program. Contact us for more details.

But I need block based storage:

No problem, we provide high performance iSCSI/iSER/FCoE/SRP targets integrated into the unit.

But I need object based storage:

No problem, we include an S3 compatible object storage system.


Velocity matters

For the last decade plus, the day job has been preaching that performance is an advantage, a feature you need, a technological barrier for those with both inefficient infrastructures and built in resistance to address these issues. You find the latter usually at organizations with purchasing groups that dominate the users and the business owners.

The advent of big data (ok, this is, what, the second or third time around now?), with data sets that have been pushing the performance capabilities of infrastructure, has been putting the exclamation point on this for the past few years. Big data is all about turning massive data sets into actionable intelligence, and enabling people to make use of the data that they’ve gathered.

Our argument has been, and remains, that the faster you can make your analytics, the larger the data sets you can analyze, the more frequently you can analyze them (more models per unit time), and the more accuracy you can obtain (more sophisticated and coupled models representing a more accurate and testable set of hypotheses).

Our voice has been a lonely one in the wilderness for years until the Data Mining/Business Intelligence/Big data revolution was (re)born. Now, all of a sudden, we see article after article saying what we’ve been saying all along.

Here’s one at TechCrunch (yes, I know, but still it’s an ok high level read). Written by a KPCB person, so they’ve got a built in bias as to what is important. And who is important.

But points 8 and 9 matter.

8. Data is the new competitive advantage: The promise of Big Data is its power to bring new insights to light. Improved analytics have triggered new, non-obvious ideas about how businesses operate. For instance, InsideSales.com discovered that a sales rep who calls a lead back within 5 minutes of a request for information boosts the likelihood of closing a sale by 10X. By harnessing big data sets, companies will discover patterns like this for the first time.

Well, it used to be called Data Mining. Then Business Intelligence. Now Big Data. The idea is to analyze your data, build testable hypotheses, and iterate until you have enough of a sense of how things might play out to build a set of tactics to further a bigger picture strategy. This is nothing new in one sense, but what’s new is you have the ability to look at data at scale. And that starts opening up whole new vistas of inquiry.

9. Speed kills - your competitors: Faster application performance will distinguish the winners in enterprise. Studies by Walmart and Compuware show that one additional second of latency in application delivery can decrease revenue by 10 percent. An emerging set of companies is focused on speed: Instart Logic accelerates application delivery, AppDynamics monitors and helps address application response time, and PernixData and Fusion-io leverage flash-based storage to make data delivery many times faster.

Put a different way, you need performance to be able to respond to rapidly changing conditions. Obviously KPCB is biased about which of these matter, but the point is that there are products that can help make things go faster.

And at the end of the day, the most important aspect of being able to work with and gain insight from data, is your ability to interact with the data, ask questions of the data, construct testable hypotheses, and run many tests.

Which is why Scalable Informatics matters, as the FastPath software defined appliances combine the highest performance hardware with the best and fastest software stacks for analytics.

Just an observation on this: it’s nice that the market as a whole has come around to our viewpoint.


And the 0.8.3 InfluxDB no longer works with the InfluxDB perl module

I ran into this a few weeks ago, and am just getting around to debugging it now. Traced the code, set up a debugger and followed the path of execution, and … and …

Yup, it’s borked.

So, I can submit a patch or 3 against the InfluxDB code, or roll a simpler more general Time Series Data Base interface that will talk to InfluxDB. And eventually kdb+. Since I wanted to code for that as well, I am looking more seriously at the second option.

This means, in the near term, that influxdb-cli.pl is broken. I’ll work a simple interface to it via LWP or Mojo user agents, and will see what we can do from there. But the idea in the longer term is to leverage the db and analytics layer behind a common interface for us, one that won’t break (often).

Of course, the best of all possible worlds would be to figure out how to interface kdb+ as an engine for InfluxDB. It already lets you use a number of them. Had I time I might look at that. Unfortunately, we just ditched the Go language from our toolchain (very long story, the language may be fine, but the environment and its inability to be put anywhere in a tree is so completely borked it’s not funny … even Java is easier/less dumb about things), so building it could be a problem for us.
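The common interface idea is easy enough to sketch. What follows is a hypothetical outline, not the eventual Perl code: the class names are invented for illustration, the InfluxDB backend only builds the request rather than sending it, and the URL path and JSON body shape are assumptions based on the 0.8-era HTTP API that should be checked against the InfluxDB documentation before relying on them. A kdb+ backend would be one more class with the same write method.

```javascript
// Hypothetical common front end: callers use write(), backends adapt it.
class TimeSeriesClient {
  constructor(backend) { this.backend = backend; }
  write(series, point) { return this.backend.write(series, point); }
}

// In-memory backend, handy for tests and for running without a server.
class MemoryBackend {
  constructor() { this.rows = []; }
  write(series, point) {
    this.rows.push({ series: series, point: point });
    return true;
  }
}

// Backend that only builds the HTTP request. ASSUMPTION: the /db/<name>/series
// path and the name/columns/points body follow the InfluxDB 0.8 write API.
class Influx08Backend {
  constructor(baseUrl, db) {
    this.baseUrl = baseUrl;
    this.db = db;
    this.outbox = [];
  }
  write(series, point) {
    const columns = Object.keys(point);
    const body = JSON.stringify([
      { name: series, columns: columns, points: [columns.map((c) => point[c])] }
    ]);
    const req = {
      method: "POST",
      url: this.baseUrl + "/db/" + encodeURIComponent(this.db) + "/series",
      body: body
    };
    this.outbox.push(req); // a real backend would hand this to an HTTP client
    return req;
  }
}

// Same calling code, two different backends.
const mem = new MemoryBackend();
new TimeSeriesClient(mem).write("load", { time: 1415750400, value: 0.42 });

const influx = new Influx08Backend("http://localhost:8086", "metrics");
const req = new TimeSeriesClient(influx).write("load", { time: 1415750400, value: 0.42 });
console.log(mem.rows.length, req.url, req.body);
```

The point of the split is that when a server release changes the wire format again, only one backend class has to change, and the tools built on top keep working.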

