Learning to respect my gut feelings again

A “gut feeling” is, at a deep level, a fundamental sense of something that you can’t necessarily ascribe metrics to, can’t quantify exactly. It’s not always right. It’s a subconscious set of facts, ideas, and concepts that seems to suggest something below the analytical portion of your mind, and it can bias you toward a particular set of directions. Or you can take it as an aberration and go with the “facts”.

As an entrepreneur, I’ve had many gut feelings about what to do, when to do it. They aren’t always right. But when they are … they are usually whoppers.

In 2002, when Vipin and I were in his lab looking at 40-core DSP chips, and speaking aloud about building accelerators out of them … that was a gut feeling that the high performance computing market had no other choice than to go that route to continue to advance in performance. We built business plans, architected designs for the platform, worked out pricing models, went to investors, and told them things that later turned out to be remarkably prescient. No one saw fit to invest, unfortunately.

This has been a feature of my career: many very good ideas that later turn out to be huge markets, developing almost exactly as we speculated, and we can’t get investment. It’s fundamentally, profoundly frustrating. Not to mention discouraging. But we soldier on.

In 2005, when we had sold the concept of remote computer cycles (what was later to be called “cloud”) to a large company in Michigan, again, we tried to get investment going. We had a large committed customer, a good business model, a good operational model, even some investors lined up if we could get the state of Michigan to commit as part of its tri-corridor process. They need only have put in a token amount, and that’s all we needed. I need not tell you this didn’t happen, and the reasons given were, sadly, laughably wrong at best.

Our gut feelings on both of these markets were that they were going to be huge.

To put it mildly, we were right. Very, very right.

The next epiphany was on the cluster and storage side. We’d been designing and building clusters up until then with embedded storage. Dell decided it wanted to own clusters, and it worked on depriving the small folks of oxygen with pricing gymnastics. It was easy for them to write off coming in under cost on clusters, much harder for a small outfit to justify paying a customer to take hardware. My gut feeling at the time was that clusters would become an impossibly hard market to work in, so we focused upon where we could add our unique value. Storage and storage based systems it was.

Along the way, we’ve seen many opportunities, some looking very good but bugging me something fierce, and some looking bad on the surface, but having the qualities that we needed. So I went with my gut on whether or not to pursue those.

And we’ve grown, by quite a bit, during that time. There is much to be said for subconscious analytics.

As we’ve grown, we’ve brought on more people to work on opportunities. Recently we’ve had, and serviced, some opportunities that ran strongly against my gut feeling. As we’ve watched these evolve, my gut was right: they’ve turned into (in some cases) bad deals for us.

Also, as we look at our capitalization efforts, I get similar feelings about particular directions and potential investors. I don’t mind lots of legalese. We have lawyers to deal with that. I mind games. If people play them now, it will be worse later on. If they won’t act in reasonable time frames and with reasonable terms, we need to move on.

It’s this gut feeling that has served us very well, that I temporarily overrode in the past … that I am bringing back in a big way.

We met with a would-be customer at the SC14 show. They make great promises, set additional hurdles, and never do business with us. I’d like to work with them, but the cost of chasing this customer may simply be too high for us, for little return, at this stage of our life. If we were bigger, it would be less of an issue. If we had a large investor with a lot of committed capital in us, again, less of an issue to act on working with them. But we don’t as of yet. We have a particular hand of cards we can play, and some we need to discard in order to improve our hand. As much as I might like to play this hand with that card for that customer, my gut tells me to wait.

It’s tough, but I’m going back to my gut feelings on these. For customers, for investors, whatever. If I don’t get a good feeling, or if I see actions which on their own might be innocuous but collectively would be predatory, I’ll rethink working with them.

The gut feeling is all about the value prop and the ROI on effort. Sometimes it’s dead on; far more often than not, for us. It’s time to use it again, in the large.

Viewed 477 times by 261 viewers

#SC14 day 2: @LuceraHQ tops @scalableinfo hardware … with Scalable Info hardware …

Report XTR141111 was just released by STAC Research for the M3 benchmarks. We are absolutely thrilled, as some of our records were bested by newer versions of our hardware running a newer software stack. Congratulations to Lucera, to STAC Research for getting the results out, and to the good folks at McObject for building the underlying database technology.

This result continues and extends Scalable Informatics’ domination of the STAC M3 results. I’ll check to be sure, but I believe we are now the hardware side of most of the published records.

What’s really cool about this is that you can get this power from Lucera if you don’t want or need to stand up your own kit, from us or our partners if you prefer to stand up your own private cloud, or combinations if you would like to take advantage of all the additional capabilities and functionality Lucera brings to the table.


Starting to come around to the idea that swap, in any form, is evil

Here’s the basic theory behind swap space. Memory is expensive, disk is cheap. Only use the faster memory for active things, and aggressively swap out the less used things. This provides a virtual address space larger than physical/logical memory.

Great, right?

No. Here’s why.

1) Swap assumes that you can always write to and read from persistent storage (disk/swap). It never assumes that persistent storage could fail. Hence, if some amount of paged data on disk suddenly disappeared, well …

Put another way, it increases your failure likelihood by involving components with a higher probability of failure in a pathway which assumes no failure.

2) It uses 4k pages (on Linux). Just. Shoot. Me. Now. Ok, there are ways to tune this a bit, and we’ve done so, but … but … you really don’t want to do many, many 4k IOs to a storage device. Even an SSD.

NVMe/MCS may help here. But you still have issue number 1, unless you can guarantee atomic/replicated writes to the NVMe/MCS.

3) Performance. Sure, go ahead and allocate, and then touch every page of that 2TB memory allocation on your 128GB machine. Go ahead. I’ve got a decade or two to wait.

4) Interaction with the IO layer is sometimes buggy in surprising ways. If you use a file system, or a network-attached block device (think cloud-ish), and you need to allocate an SKB or some additional memory to write the block out, be prepared for some exciting (and not in the good way) failures: some spectacular kernel traces that you would swear are recursive allocation death spirals.

“Could not write block as we could not allocate memory to prepare swap block for write …”

Yeah. These are not fun.

5) OOM is evil. There is just no nice way to put this. OOM is evil. If it runs, think “wild west”. kill -9 bullets have been lobbed at, often, important things. Using ICL to trace what happened will often leave you agape with amazement at the bloodbath in front of you.
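The overcommit pain behind point 3 can be sketched in a few lines (a minimal Python illustration, purely for demonstration, not anything from our stack): reserving address space is nearly free, and the hurt only arrives when pages are actually touched.

```python
import mmap

GiB = 1 << 30

# Reserving anonymous address space is cheap: the kernel commits no
# physical pages until each 4 KiB page is first written.
buf = mmap.mmap(-1, 1 * GiB)

# Writing forces pages to be backed by RAM (or, eventually, swap).
# We touch one byte every 1024 pages to keep this demo quick; touching
# *every* page of an allocation larger than physical memory is what
# turns into the decade-long wait described in point 3.
for off in range(0, len(buf), mmap.PAGESIZE * 1024):
    buf[off] = 1

buf.close()
```

The asymmetry between the cheap reservation and the expensive touch is exactly why a 2TB allocation on a 128GB machine “succeeds” right up until you use it.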

So toward this end, we’ve been shutting off paging whenever possible, and the systems have been generally faster and more stable. We’ve got some ideas on even better isolation of services to prevent self-flagellation of machines. But the take-home lesson we’ve been learning is … buy more RAM … it will save you headache and heartache.
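To verify that paging really is off on a given box, a quick look at /proc/swaps suffices (Linux-specific; this little parser is an illustrative sketch, not part of our tooling):

```python
def active_swap_devices(procfile="/proc/swaps"):
    """Parse a /proc/swaps-style table and return the device names.

    The first line is a header row; an empty result means no swap is
    enabled, i.e. 'swapoff -a' (and removing the fstab entry) has
    done its job.
    """
    with open(procfile) as fh:
        lines = fh.read().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]
```

On a machine where paging has been shut off, this returns an empty list.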


#sc14 T-minus 2 days and counting #HPCmatters

On the plane down to NOLA. Going to do booth setup, and then network/machine/demo setup. We’ll have a demo visual-fx reel from a customer who uses Scalable Informatics JackRabbit, DeltaV, and (as the result of an upgrade yesterday) Unison.

Looking forward to getting everything going, and it will be good to see everyone at the show!


Gui updates … oh my …


30TB flash disk, Parallel File System, massive network connectivity

This will be fun to watch run …

Scalable Informatics FastPath Unison for the win!


SC14 T minus 6 and counting

Scalable’s booth is #3053. We’ll have some good stuff, demos, talks, and people there. And coffee. Gotta have the coffee.

More soon, come by and visit us!


Mixing programming languages for fun and profit

I’ve been looking for a simple HTML5-ish way to represent the disk drives in our Unison units, and for some simple drawing libraries in JavaScript to keep this higher level, so I don’t have to handle all the low-level HTML5 bits.

I played with Raphael and a few others (including paper.js). I wound up implementing something in Raphael.

The code that generated this was a little unwieldy … as JavaScript doesn’t quite have all the constructs one might expect from a modern language. And thanks to its object orientation, it’s … er … somewhat more verbose than it really needs to be.

<div id="chassis" style="width:<% $win{x} %>px; height:<% $win{y} %>px" ></div>


var paper = Raphael({
	container: "chassis",
	width:  <% $win{x} %>,
	height: <% $win{y} %>
});

var disks = [], labels = [], lx = [], ly = [];
var x0, y0, x1, y1, label;
var count = 0;

for (var row = 1; row <= <% $dim{rows} %>; row++) {
	for (var col = 1; col <= <% $dim{cols} %>; col++) {
		x0 = Math.round(<% $xpar{margin} %> + (col-1) * (<% $xpar{width} + $xpar{spacing} %>) + 0.5);
		y0 = Math.round(<% $ypar{margin} %> + (<% $dim{rows} %> - (row-1)) * (<% $ypar{height} + $ypar{spacing} %>) + 0.5);
		x1 = Math.round(<% $xpar{width} %> + 0.5);
		y1 = Math.round(<% $ypar{height} %> + 0.5);
		lx[count] = Math.round(x0 + x1/3 + 0.5);
		ly[count] = Math.round(y0 + y1/3 + 0.5);
		label = "R".concat(row.toString()).concat("\nC").concat(col.toString());
		disks[count] = paper.rect(x0, y0, x1, y1)
			.attr({fill: "green", opacity: 0.2, stroke: 1})
			.hover(function () {
				this.attr({fill: "blue", opacity: 0.2, stroke: 1});
			}, function () {
				this.attr({fill: "green", opacity: 0.2, stroke: 1});
			});
		labels[count] = paper.text(lx[count], ly[count], label);
		count++;
	}
}

Note that there are some constructs there not in JavaScript. They are the <% $variable %> bits. This is what we use to pass data from the Perl template code (Mojolicious and HTML::Mason) into the HTML (and JavaScript).

I guess I am finding it humorous that I am having Perl rewrite the JavaScript as it emits it. This is minor variable substitution, but I’ve done more major bits of editing of the JavaScript that goes out over the wire.
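The same trick in miniature (sketched here in Python rather than Mason/Mojolicious, purely for illustration): server-side values get substituted into the JavaScript source before it ever leaves the server.

```python
from string import Template

# Server-side values flow into the JavaScript source before it is sent
# over the wire -- the same variable-substitution trick the Perl
# templates above perform with the <% ... %> markers.
js_template = Template(
    'var paper = Raphael({container: "chassis", '
    "width: $width, height: $height});"
)

emitted_js = js_template.substitute(width=800, height=220)
print(emitted_js)
```

The browser only ever sees plain JavaScript with literal numbers in it; the templating layer is invisible on the client side.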


turnkey, low cost and high density 1PB usable at 20+ GB/s sustained

Fully turnkey: we’d ship a rack with everything pre-installed/configured. Some de-palletizing required, but it’s plug and play (power, disks) after that. More details, and a sign-up to get a formal quote, here.

This would be in 24U of rack space for less than $0.18/raw GB or $0.26/usable GB. Single file system name space, a single mount point. Leverages BeeGFS, and we have VMs to provide CIFS/SMB access, as well as NFS access, in addition to BeeGFS native client.

Oh, and we’ve got S3 object storage native on it as well.

How hard is it to make work:

De-palletize it, slide disks in, plug it in, turn on the manager nodes, run the external config script (via web page, console, serial port) to set up network, password, and other bits. And off you go.

What sort of usable capacity:

960TB in base configuration, scales to many 10s of PB.

What sort of performance:

~20GB/s streaming write performance, similar streaming read performance. Excellent metadata performance. Integrated flash cache for heavy IOP workloads.

What sort of network connectivity:

1GbE, 10GbE, 40GbE, FDR Infiniband. We provide an integrated 40GbE/Infiniband fabric. You may connect the fabric to your 10GbE/40GbE/Infiniband network.

But I already have storage:

No problem, you can add this on, and we have a trade in program. Contact us for more details.

But I need block based storage:

No problem, we provide high performance iSCSI/iSER/FCoE/SRP targets integrated into the unit.

But I need object based storage:

No problem, we include an S3 compatible object storage system.


Velocity matters

For the last decade-plus, the day job has been preaching that performance is an advantage, a feature you need, and a technological barrier for those with both inefficient infrastructures and built-in resistance to addressing these issues. You find the latter usually at organizations with purchasing groups that dominate the users and the business owners.

The advent of big data (ok, what is this, the second or third time around now?), with data sets that have been pushing the performance capabilities of infrastructure, has been putting the exclamation point on this for the past few years. Big data is all about turning massive data sets into actionable intelligence, and enabling people to make use of the data that they’ve gathered.

Our argument has been, and remains, that the faster you can make your analytics, the larger the data sets you can analyze, the more frequently you can analyze them (more models per unit time), and the more accuracy you can obtain (more sophisticated and coupled models representing a more accurate and testable set of hypotheses).

Our voice has been a lonely one in the wilderness for years until the Data Mining/Business Intelligence/Big data revolution was (re)born. Now, all of a sudden, we see article after article saying what we’ve been saying all along.

Here’s one at TechCrunch (yes, I know, but still, it’s an ok high-level read). Written by a KPCB person, so they’ve got a built-in bias as to what is important. And who is important.

But points 8 and 9 matter.

8. Data is the new competitive advantage: The promise of Big Data is its power to bring new insights to light. Improved analytics have triggered new, non-obvious ideas about how businesses operate. For instance, InsideSales.com discovered that a sales rep who calls a lead back within 5 minutes of a request for information boosts the likelihood of closing a sale by 10X. By harnessing big data sets, companies will discover patterns like this for the first time.

Well, it used to be called Data Mining. Then Business Intelligence. Now Big Data. The idea is to analyze your data, build testable hypotheses, and iterate until you have enough of a sense of how things might play out to build a set of tactics to further a bigger-picture strategy. This is nothing new in one sense, but what’s new is that you have the ability to look at data at scale. And that starts opening up whole new vistas of inquiry.

9. Speed kills … your competitors: Faster application performance will distinguish the winners in enterprise. Studies by Walmart and Compuware show that one additional second of latency in application delivery can decrease revenue by 10 percent. An emerging set of companies is focused on speed: Instart Logic accelerates application delivery, AppDynamics monitors and helps address application response time, and PernixData and Fusion-io leverage flash-based storage to make data delivery many times faster.

Put a different way, you need performance to be able to respond to rapidly changing conditions. Obviously KPCB is biased about which of these matter, but the point is that there are products that can help make things go faster.

And at the end of the day, the most important aspect of being able to work with and gain insight from data, is your ability to interact with the data, ask questions of the data, construct testable hypotheses, and run many tests.

Which is why Scalable Informatics matters, as the FastPath software defined appliances combine the highest performance hardware with the best and fastest software stacks for analytics.

Just an observation on this: it’s nice that the market as a whole has come around to our viewpoint.

