I read this announcement this morning: our friends at Facebook are releasing their reduced-precision server-side convolution and GEMM operations.
Many years ago, I tried to convince people that HPC moves both down market, into lower cost hardware, and more widely, into more software toolchains. Basically, the decades of experience building very high performance applications and systems will have value downstream for many users over time.
GEMM (general matrix multiply) is a generalized matrix-multiply operation that has been well optimized for HPC applications in various scientific libraries over time. It is a primitive upon which you can build other software that performs matrix operations.
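As a minimal sketch of what the GEMM primitive computes (C ← αAB + βC), here is a naive pure-Python version. Real BLAS implementations honor the same contract but get their speed from cache blocking, SIMD, and threading:

```python
def gemm(alpha, A, B, beta, C):
    """Naive GEMM: C <- alpha * (A @ B) + beta * C.

    A is m x k, B is k x n, C is m x n, all as lists of lists.
    Optimized BLAS libraries implement this same interface with
    cache blocking, vectorization, and multithreading.
    """
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

# With alpha=1, beta=0 this reduces to a plain matrix product:
C = gemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]])
# C is [[19.0, 22.0], [43.0, 50.0]]
```

The alpha/beta scaling is what makes GEMM a flexible building block: accumulation (beta=1) and scaled updates fall out of the same primitive, which is why so much higher-level software can be layered on top of it.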
Curiously, 25 years ago, I was convinced that vector machines had pretty much lost out to scalar computing. Big vector machines, with gigabytes of shared high performance memory and many vector processors operating in a SIMD mode (massively oversimplified, but basically the case), were simply too expensive to build and operate.
Quick side note: A few months ago, I was in the market for a new car. I was comparing, among other things, Jeep Cherokee versions versus some similar Kia units. When I explained this to the Kia salesperson (nice guy), he said “no, you can’t compare them.” That’s when I realized something about the market, the market definition of segments, and how much some people care or not about that segmentation.
I could compare them, as to me, the buyer, these cars had similar functionality, similar features. The only real difference to me was the cost. I don’t care about brand, as I know perfectly well that brand is an almost completely meaningless measure of anything. It represents intangible qualities that may or may not be relevant. You can have a bad experience with well known brands, and a great experience with smaller, less well known brands. And vice versa.
The learning for me was just because the market is segmented by a vendor in a particular way, it doesn’t mean that the customers will honor that segmentation, or necessarily buy into it.
This is directly relevant for HPC, and the “death” of big vector machines.
Vector systems didn’t die. The large supers fell out of favor; in their place came scalar machines of numerous flavors.
The RISC vs. CISC arguments of many years ago were, in part, also about market segmentation. Many of those arguments broke down when customers started comparing these “incomparable” systems, and started using the lower cost systems, thus pushing HPC down market, into a much wider potential set of users.
Put in more business speak, the total addressable market increased by orders of magnitude, while pricing dropped significantly. More toolkits made this processing power more easily available as primitives that people could easily leverage to solve higher level problems. This was a core impact of the HPC software development efforts over the years … create good tools that people can build upon, instead of having everyone write their own sparse solver or eigenvalue extractor, etc.
Fast forward to today. Accelerators, as heterogeneous GPU (vector) processors, with huge memory and memory bandwidths, execute these optimized kernels. CPUs have SIMD capability within themselves via SSE/AVX bits. So primitives are built upon them, and software development focuses on adding value further up the tool chain.
Sometimes, you revisit these primitives to see where you can make tradeoffs between precision, performance, and other metrics important to your higher level objectives. And you solve those problems.
FBGEMM is an example of that. It’s HPC, embedded within a business-focused software primitive that FB uses to perform its calculations with greater speed.
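To illustrate the reduced-precision idea (this is not FBGEMM’s actual code, just a toy sketch of the general technique): quantize the float operands down to int8, do the multiply in cheap integer arithmetic with wider accumulation, then rescale the result back to floating point. The scale factors here are chosen by hand purely for the example:

```python
def quantize(M, scale):
    """Map a float matrix to int8 values by dividing by a scale factor
    and rounding, clamped to the int8 range (symmetric, no zero point)."""
    return [[max(-128, min(127, round(x / scale))) for x in row] for row in M]

def int8_matmul(A_q, B_q, scale_a, scale_b):
    """Multiply two quantized matrices using integer arithmetic,
    then rescale the accumulated sums back to floats."""
    k, n = len(B_q), len(B_q[0])
    return [[sum(row[p] * B_q[p][j] for p in range(k)) * scale_a * scale_b
             for j in range(n)]
            for row in A_q]

# Toy example: values chosen so the quantization is exact.
A_q = quantize([[1.0, 2.0]], 0.5)   # [[2, 4]]
B_q = quantize([[1.0], [1.0]], 0.5) # [[2], [2]]
result = int8_matmul(A_q, B_q, 0.5, 0.5)
# result is [[3.0]], matching the full-precision product 1*1 + 2*1
```

In a real library the integer multiplies map onto wide SIMD instructions, so each instruction processes several times more elements than in fp32, which is where the speedup comes from; the engineering work is in keeping the quantization error acceptable for the application.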
This said, I find this focus on performance satisfying, as it echoes some of my previous company’s key theses.
- Performance is a competitive advantage. If you can do something to increase your performance without expending a great deal of precious capital, time, or energy, this can give you a significant boost in your own offerings.
- What would you be able to do if you had far more throughput than you have today? Would it be a game changer or a wash? Would it allow you to tackle more problems per unit time, larger problems per unit time, try things you simply were unable to conceive of before?
- Performance is an architecture, not a product. There are no silver performance bullets (e.g. products that will magically make your system orders of magnitude faster), and poor designs often lead to sub-optimal performance and sub-optimal scaling.
Scalable Informatics had designed a massive data motion engine into its systems. I learned earlier this year that a set of systems with many new NVMe units, and processors and memory four generations newer, still … still … 5 years later … has not bested one of our STAC M3 metrics. It’s all about the architecture, whether it’s the code or the hardware.
HPC continues its inexorable march downstream, into wider and wider use cases, products, X-as-a-Service offerings. It’s a great time to be in HPC!