Looking forward to #SC18 next week and a discussion of all things #HPC

I’m attending SC18 next week. It’s been 3 years since I last attended (2015). Back then we (@scalableinfo) had a large booth, lots of traffic, and showed off some of the first commercial NVMe high-performance storage systems running BeeGFS over 100GbE.

I am looking forward to talking with as many people as I can, to get their perspectives on things: to see what they are thinking, hear what they are doing, and learn which direction they are going. To learn about new use cases I’ve not thought of before, or hear how thinking has evolved on old “solved” use cases.

Feel free to ping me if you want to meet and grab a coffee. I’d love to hear what’s on your minds.

What makes me happy about all of this: HPC is everywhere … big data, analytics, and ML/DL are all instances of specific HPC use case models. Companies are realizing how important HPC technologies and experience are in general. This is a good time to be an HPC person or consultant with deep experience.

For the industry, 100GbE is everywhere now. No one is arguing over FEC vs non-FEC cables anymore … this was a major problem at our booth, and actually killed a demo we paid thousands of dollars to run, because two 100GbE switch vendors couldn’t agree on FEC settings for the cabling.

NVMe is everywhere, and people are posting “great” numbers. Well, their numbers are about the same as the ones we posted in 2012 and 2013 using NAND SSDs with our architecture … but hey … it’s a good try. I’ve been telling people for a while that there is no such thing as a silver bullet. If you don’t have a massively powerful data motion engine and architecture, it doesn’t matter what you put your sooper-dooper processing engine into.
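The no-silver-bullet point is easy to see with back-of-envelope arithmetic: aggregate device bandwidth means little if the data path to the consumers can’t carry it. A minimal sketch (all figures here are hypothetical round numbers, not measurements from any real system):

```python
# Back-of-envelope: delivered bandwidth is capped by the slowest stage
# in the data path, not by what the media can do in aggregate.
# All figures below are hypothetical, illustrative round numbers.

def delivered_bandwidth_gbps(per_device_gbps, n_devices,
                             fabric_gbps, host_dma_gbps):
    """Delivered bandwidth = minimum across all stages of the path."""
    device_aggregate = per_device_gbps * n_devices
    return min(device_aggregate, fabric_gbps, host_dma_gbps)

# 24 NVMe drives at ~25 Gb/s each looks like ~600 Gb/s on paper ...
paper = 25 * 24
# ... but a single 100GbE link and a modest host DMA path cap what
# users actually see.
real = delivered_bandwidth_gbps(per_device_gbps=25, n_devices=24,
                                fabric_gbps=100, host_dma_gbps=200)

print(paper, real)  # 600 vs 100: the architecture sets the number
```

Swap in bigger drives and the picture doesn’t change: without a data motion engine sized to match, the shiny media number stays on the spec sheet.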

HPC is in the cloud … in a way. Not traditional capability-class HPC, but capacity/commodity HPC, where the most important metric is cost per unit time, and performance is less sensitive to whole-system architectural details. This is more about throughput performance, which is perfect for a number of use cases (bioinformatics pipelines; ML training, though there you still need fast local compute cores/GPUs/FPGAs; etc.).
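The cost-per-unit-time framing can be made concrete with a tiny sketch (prices and job rates below are made-up illustrative numbers, not quotes from any provider):

```python
# Throughput-computing economics: for capacity/commodity HPC the figure
# of merit is cost per completed job, not raw per-node speed.
# All prices and job rates are hypothetical, for illustration only.

def cost_per_job(price_per_node_hour, jobs_per_node_hour):
    """Dollars spent per unit of completed work."""
    return price_per_node_hour / jobs_per_node_hour

# A "fast" node: pricier, quicker per job.
fast = cost_per_job(price_per_node_hour=3.00, jobs_per_node_hour=10)  # $0.30/job
# A "slow" commodity node: cheaper per hour; for embarrassingly
# parallel pipelines you simply run more of them.
slow = cost_per_job(price_per_node_hour=0.50, jobs_per_node_hour=2)   # $0.25/job

# The slower node wins on cost per unit of work, which is the metric
# that matters for throughput-bound workloads.
print(fast, slow)
```

For capability-class workloads, where a single tightly coupled job spans the whole machine, this arithmetic doesn’t apply, which is exactly why that class has stayed out of the commodity cloud.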

Interestingly enough FPGAs are “advancing”, though I remain skeptical that they will ever be generally commercially viable outside of specific niches. However, I reserve the right to be proven wrong.

And of course, this is the year of ARM!

No, I’m kidding there. I don’t see ARM as being meaningful in any way to the market. This is not a slight at ARM; this is a deeply considered point related to the endless hype cycles around ARM. Remember when MIPS was everywhere (after taking over for Motorola), in every embedded system, and they were taking over the world? No? I wonder why. MIPS CPUs for HPC were great until about 1997, when Forest Baskett and team at SGI killed Alien and Beast. After that … well … we got R10k respins. For ARM, the question is: which ISA, which ABI, which … well … you get it. If you haven’t converged on a single toolchain, there is no way that toolchain will gain adherents. You’ll have an expensive-to-maintain compiler with few users. This doesn’t work well.

Interestingly, AMD’s Zen architecture is gaining ground. If for no other reason, it is not Intel (though it is x86), and it provides a good bit of pricing and performance pressure on Intel. Competition in markets is important. AMD’s chips are competitive with Intel’s, and in a number of cases dominate the competition on performance. The last time this happened, Intel worked overtime to kneecap AMD. I suspect they are trying to do so again, but I have the feeling that this will not be accepted by the market this time, given the number of Intel processor implementation flaws that keep coming to light.

In a similar manner, hardware hacking and side-channel attacks as active vulnerabilities are gaining in attention. The discovery that there are rings “below” ring 0 in x86 systems has caused some very serious angst, and rightly so. The impact of these vulnerabilities might mean we have to choose between remotely managing fleets of machines via tools like IPMI/Redfish, and falling back to large numbers of serial concentrators and serial cables. FWIW, I’ve been an IPMI user for a long time … and I am distinctly worried about this.

I’ve even got an idea for a product for a “startup” to help out with this.

Apple hardware continues to evolve, and they have a T2 security chip with a secure enclave. I wonder how soon this will be broken. Security is not a feature/product, it is an implementation/architecture/process. Anyone telling you differently is selling you swamp land.

With accelerators dominating (gee, who could have predicted that … oh … I did, in 2003-7 when I tried to raise money to build accelerators) infra attacks are now possible, far faster than before. Imagine, someone figures out how to get a PCI GPU card to attack other cards in PCI space, or the CPUs, or RAM, or … Yeah, with great power comes great responsibility for secure processes, architectures, access, etc.

It is a very exciting market we are in, growing rapidly, with many players. Offload processors and heterogeneous computing continue to grow apace. I don’t see that changing any time soon. Actually, I think this may be the beginning of an evolutionary divergence for non-von Neumann architectures.
