Data loss, thanks to buggy driver or hardware

So this happened on the 3rd, on one of my systems:

Feb 3 03:02:39 calculon kernel: [195271.041118] INFO: task kworker/20:2:757 blocked for more than 120 seconds.
Feb 3 03:02:39 calculon kernel: [195271.048116] Not tainted 4.20.6.nlytiq #1
Feb 3 03:02:39 calculon kernel: [195271.052678] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 3 03:02:39 calculon kernel: [195271.060626] …

Reflections on where we’ve been in HPC, and thoughts on where we are going

Looking back on past reviews from 2013 and a few other posts, and at what has changed between then and 2019 (it’s early, I know), I am struck by a particular thought I’ve expressed for decades now. In 2009 I wrote: HPC has been moving relentlessly downmarket. Each wave of its motion has a destructive …

Opening keynote @Supercomputing #SC18 : #HPC is an enabling technology …

… OK, the speaker said far more than that. But one of his central theses is that in this “second” machine revolution, we are enabling data-driven decision making and distributed decision and consensus, as well as expanding beyond the confines of specific expertise in a field. The latter I’ve heard described as cross-fertilization …

Looking forward to #SC18 next week and a discussion of all things #HPC

I’m attending SC18 next week. It’s been 3 years since I last attended (2015). Then we (@scalableinfo) had a large booth, lots of traffic, and showed off some of the first commercial NVMe high performance storage systems running BeeGFS over 100GbE. I am looking forward to talking with as many people as I can, to …

A bug in s3 buckets with no apparent way to request support to deal with it

This is a fun one that I’ve been playing with for the last 5 days or so. I’m helping someone out with backups, and they changed their mind on what they wanted backed up. So I started deleting the backups they didn’t want. One of the machines contained a set of directories for hashdeep which includes …