The SSDs that failed

The OEM went silent. We reported the issues, opened RMAs. To say I am not pleased … well …
These are Corsair CMFSSD-32D1 units. According to their site:

100+ Year Life Expectancy (MTBF)

Ummm … no. Not even close.
We are experiencing about a 70% failure rate, within 3 months of acquisition. In many different chassis, in many different parts of the world, with many different power supplies, many different motherboards.
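For a rough sense of how far apart the quoted figure and the observed one are, here is a back-of-the-envelope sketch. It assumes a constant failure rate (the exponential lifetime model usually implied by an MTBF number); that model is an assumption for illustration, not something the vendor states, and the function names are made up for this example.

```python
import math

def failure_probability(mtbf_years, window_years):
    """Chance a unit fails within the window, assuming a constant failure rate."""
    return 1.0 - math.exp(-window_years / mtbf_years)

def implied_mtbf_years(failure_fraction, window_years):
    """MTBF that would produce the observed failure fraction under the same model."""
    return -window_years / math.log(1.0 - failure_fraction)

window = 0.25  # 3 months, expressed in years

claimed = failure_probability(100.0, window)   # what a 100 year MTBF predicts
implied = implied_mtbf_years(0.70, window)     # what a 70% failure rate implies

print(f"Predicted failures in 3 months at 100 year MTBF: {claimed:.2%}")
print(f"MTBF implied by 70% failures in 3 months: {implied * 12:.1f} months")
```

Under that assumption, a 100 year MTBF predicts roughly a quarter of a percent of units failing in 3 months, while a 70% failure rate corresponds to an MTBF of about two and a half months.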
This is a time correlated failure. I have never … ever … in 25+ years doing this stuff … ever … seen anything like this.
It's either a really … really bad silicon error in a controller chip or a firmware bug … or some other crappy part.
We firmly unrecommend this unit. Do not purchase it. Our customers have experienced grief and lost data from it.


9/11 memoriam

Not an HPC topic, but one that all Americans can reflect upon … their thoughts, their experiences … and to resolve not to let this happen ever again, in any form.
Never again.
I had left SGI, and was working for a smaller engineering software company. I had lined up a bunch of interviews for an open position we had. My buddy Al was flying out from NY with his team to visit someone I know in Ann Arbor, and I was going to try to grab them for lunch or dinner.
I was getting dressed for work, and my (at the time) 21-month-old daughter was playing around the house. My mother-in-law had called to tell us that something had happened to the WTC. I turned on the coverage and watched. This was at 8:59am or so. Speculation was that a plane had hit the tower.
I debug stuff. I look around for contributing factors, and I eliminate things. Foggy day? No, very clear. Failure in control system? Possible.
Intentional? Hell no. Who could be so insane as to do such an incredibly stupid thing?
That was my world view at 9:03am.
Then I saw the second plane hit. On live TV.


Ceph updates

rbd is in testing. Have a look at the link, but here are some of the highlights:

The basic feature set:

  • network block device backed by objects in the Ceph distributed object store (rados)
  • thinly provisioned
  • image resizing
  • image export/import/copy/rename
  • read-only snapshots
  • revert to snapshot
  • Linux and qemu/kvm clients

We are doing something like this now, to a degree, with a mashup of tools in our target.pl creator. Though not likely as nice/clean as this.
Ceph builds upon BTRFS, which is an excellent underlying file system, also maturing alongside Ceph. BTRFS has been called Linux’s answer to ZFS, but if you go through a detailed design comparison, you will see that BTRFS gets a number of things right that ZFS doesn’t. From the article at the always wonderful LWN.net:


We've come a long way in 13 years …

Have a look at today’s Google home page, and you see the 25th anniversary of buckyballs, aka fullerene, which are particular structures made out of carbon. These fullerenes are very much related to graphite (pencil “lead”), and have some very interesting physics and chemistry of their own.
They were discovered when I was in my sophomore/junior years as an undergraduate. Not feeling old. Nosiree.
This isn’t what the post is about, and yes, there is a huge connection to HPC. Notice that on the Google home page, you can interact with the buckyball. Well, rotate it anyway. No special tools required.
In the late ’90s I worked for SGI. During that time I got to play with lots of chemistry codes, and I wanted a simple web-based tool to enable a 3D viewer for the molecules. So I used VRML and associated viewers. I wrote a code that took an arbitrary molecule format and converted it into a VRML representation. Even added an axes set in the middle. Then I added in bonds using a nearest neighbor calculation and some heuristics. For large molecules, the nearest neighbor calculation, done in Perl at the time, was too slow, so I rewrote that part in C to speed it up.
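To make the idea concrete, here is a minimal sketch of that kind of converter in Python. It is not the original Perl/C code: the bond heuristic (two atoms are bonded when their distance is within 1.2 times the sum of their covalent radii), the approximate radii table, and the names `find_bonds` and `to_vrml` are all assumptions chosen for illustration. It omits the axes set.

```python
import math

# Approximate covalent radii in angstroms (illustrative values, small subset).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def find_bonds(atoms, tolerance=1.2):
    """Return index pairs (i, j) of atoms close enough to count as bonded.

    atoms is a list of (element, x, y, z). This is the all-pairs O(n^2)
    search, the part that gets slow for large molecules.
    """
    bonds = []
    for i in range(len(atoms)):
        elem_i, *pos_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            elem_j, *pos_j = atoms[j]
            cutoff = (COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j]) * tolerance
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds

def to_vrml(atoms, bonds):
    """Emit a VRML 2.0 scene: one sphere per atom, one line segment per bond."""
    lines = ["#VRML V2.0 utf8"]
    for element, x, y, z in atoms:
        radius = 0.5 * COVALENT_RADIUS[element]  # shrink spheres so bonds stay visible
        lines.append(
            f"Transform {{ translation {x} {y} {z} "
            f"children [ Shape {{ geometry Sphere {{ radius {radius} }} }} ] }}"
        )
    points = ", ".join(f"{x} {y} {z}" for _, x, y, z in atoms)
    index = ", ".join(f"{i}, {j}, -1" for i, j in bonds)
    lines.append(
        f"Shape {{ geometry IndexedLineSet {{ "
        f"coord Coordinate {{ point [ {points} ] }} coordIndex [ {index} ] }} }}"
    )
    return "\n".join(lines)

# A water molecule as a small test case.
water = [("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0), ("H", -0.2397, 0.9267, 0.0)]
water_bonds = find_bonds(water)
scene = to_vrml(water, water_bonds)
print(scene)
```

The double loop in `find_bonds` compares every pair of atoms, which is why an interpreted implementation bogs down on large molecules; a compiled inner loop or a spatial grid that only checks nearby cells are the usual ways around that.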
