Summary: Performing a complex computational science and engineering calculation today is more than about just buying a big supercomputer. Although HPC traditionally stands for “high-performance computing,” we believe that the real end-to-end solution should be about “high-productivity computing.” What we mean by “high-productivity computing” is the whole computational and data-handling infrastructure, as well as the tools, technologies, and platforms required to coordinate, execute, and monitor such a calculation end-to-end.
Many challenges are associated with delivering a general high-productivity computing (HPC) solution for engineering and scientific domain problems. In this article, we discuss these challenges based on the typical requirements of such problems, propose various solutions, and demonstrate how they have been deployed to users in a specific end-to-end environmental-science exemplar. Our general technical solution will potentially translate to any solution requiring controlling and interface layers for a distributed service-oriented HPC service.
Sounds great: they are going to teach us a set of best practices for HPC.
Cool. I like learning new things, so this should be helpful.
They continue a little later on …
Requirements of High-Productivity Computing Solutions
In the domains of engineering and science, HPC solutions can be used to crunch complex mathematical problems in a variety of areas, such as statistical calculations for genetic epidemiology, fluid dynamics calculations for the aerospace industry, and global environmental modeling. Increasingly, the challenge is in integrating all of the components required to compose, execute, and analyze the results from large-scale computational and data-handling problems.
Even with such diverse differences in the problems, the requirements for the solutions have similar features, because of the domain context and the complexity of the problem at hand.
Hmmm… If they think HPC is all about solutions that “can be used to crunch complex mathematical problems in a variety of areas,” then we have a problem. HPC has grown well beyond the boundaries of old. No longer is everyone working on CG solvers, or writing multi-grid methods for self-consistent field equations (something from grad school that I enjoyed developing). Now we see massive string searches (BLAST), HMMs, massive data mining, and other techniques which are not often represented in terms of large matrix equations … which is what Top500, HPCC, and friends are about. HPC has grown. It encompasses more. It is a brave new world, with more users and workers than before. What it is missing is any capitalization of note, as VCs are busy chasing Web 3.0 and the next big social networking/movie upload/picture sharing sites.
But I digress.
Designed for a Solution to a Specific Problem
Because the calculations and industry involvement are diverse, there are no particular solution providers for any given problem, resulting in highly individualized solutions emerging in any given research department or corporation requiring these calculations. This individuality is compounded by the small number of teams actually seeking to solve such problems and perhaps the need to maintain the intellectual property of algorithms or other aspects of specific processes. Individuality is not in itself an issue; it might be a very good thing. But, given that the technical solutions are a means to an end, it is likely that these individual solutions are not “productized” and, thus, are probably difficult to interact with or obscure in other ways.
Hmmm. They must not have noticed all those companies selling productized clusters (such as my day job). There are a number of them out there, quite a few, to service the market. Some have design and implementation expertise which is quite valuable; some simply rack and stack boxes at a very low price. The point is that, if you are not purposefully ignoring the market dynamics, you already know that there are many productized cluster solutions out there.
Sure, you can craft your own. It's worth doing once or twice. This way you have an appreciation for what it is that needs to be done.
The flip side of this is that unless your job is to be building clusters, it is probably a better idea to buy a cluster along with the expertise to make it work correctly. It will lower your costs, reduce the time to solution, and enable you to achieve what you need to achieve, without locking you into a particular vendor's platforms. That lock-in is in part what raises your costs. The de-commoditization of a commodity-based tool only increases your costs and reduces your choices. More about this later on.
But we finally get to the meat of this.
High-Performance Computing with Microsoft Windows Compute Cluster Server Edition
The core service required for the solution is the actual High Performance Computing cluster capability. Microsoft Windows Compute Cluster Server Edition (CCS) provides clustering capabilities to satisfy the Compute step of the problem scenarios
Ah… got it. Explains why they ignored the predominant solution.
The rest of the paper goes on to talk about how the Microsoft platforms provide something that is roughly equivalent to what a rich web application layer and REST can and do provide atop a cluster today. Using Microsoft tools, of course.
Oddly enough, all these problems have been pretty well solved already. I see a wheel being re-invented. The Not-Invented-Here view reigns supreme from what I can see. If they didn’t invent it, it doesn’t exist. Clusters? Hey, sounds like a good idea, let's get into this, and create mainstream HPC. The existing solution? Why would anyone want to use it (ignoring IDC numbers and real world data showing massive sustained uptake of the dominant competing platform)?
But that isn’t the real reason for this. Microsoft wants to help you save money. To wit:
Architecting for high-productivity computing is not just a case of ensuring the “best” performance in order to compute results as quickly as possible; that is more of an expectation than a design feature. In the context of the overall value stream, the architecture must drive value from other areas, such as ease of access and decreasing cost of specialist skills to operate the system.
Yup. Let's take this cluster, add $200k of Microsoft stuff atop it, and get rid of the $100k you spend on the specialist running the other cluster. See, you have saved money.
The business model … it's so clear now.
A successful architecture for high-productivity computing solutions involves consideration of the overall process alongside the computationally intensive activities, and, therefore, might use several integrated components to perform the individual aspects of the process.
I couldn’t agree more. The architectures have been pretty much solidified over the last several years of use in HPC. Where flexibility is needed, it is designed in. Where single-purpose is required, it is designed in. People have to be able to access and use the units transparently, from any platform.
Given the widespread availability of web tools, a web page is a great paradigm for cluster access and usage. Given the increasing number of non-Windows machines in the corporate environment, a standards-based web interface is indicated. There is a need to present a single workflow to end users regardless of platform (Windows, Linux, Mac OS X), so workflows have to be portable, which indicates one of the standard scripting languages: I like Perl myself, though Python or Ruby ought to work fine. The workflow is the glue, and that is one of many things Perl excels at.
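To make the "workflow is the glue" point a little more concrete, here is a minimal sketch in Python (one of the portable scripting options, picked only for illustration). Everything in it is an assumption: the step names and the commands are hypothetical stand-ins that just launch the Python interpreter to print a message, where a real workflow would call whatever job submission command or web service your cluster exposes.

```python
import subprocess
import sys


def run_step(argv):
    """Run one workflow step as a child process and return its stdout.

    check=True raises CalledProcessError if the step exits non-zero,
    so a failed step stops the workflow instead of being ignored.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_workflow(steps):
    """Run (name, argv) steps in order and collect each step's output."""
    outputs = {}
    for name, argv in steps:
        outputs[name] = run_step(argv)
    return outputs


# Hypothetical steps: each one just prints a message. In practice these
# would be the real compose / execute / analyze commands for your site.
STEPS = [
    ("compose", [sys.executable, "-c", "print('input deck written')"]),
    ("execute", [sys.executable, "-c", "print('job finished')"]),
    ("analyze", [sys.executable, "-c", "print('report ready')"]),
]

if __name__ == "__main__":
    for name, output in run_workflow(STEPS).items():
        print(f"{name}: {output}")
```

Because the sketch relies only on the standard library, the same script should run unchanged on Windows, Linux, or Mac OS X with Python 3.7 or later, which is the whole point of keeping the glue layer portable.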
Microsoft Cluster Compute Server Edition is easy to include as a service gateway inside a general n-tier application structure, and is simple to integrate via command-line or API hooks.
… as is the dominant clustering solution based upon the Linux OS. Moreover, there is a long history of doing this with Linux, and a great deal of built-up knowledge and expertise.
Other available technologies can provide the basis for an HPC solution. In particular, Windows Workflow Foundation is well-suited to provide an application interface to CCS, because the features and extensibility of WF, such as persistence and tracking, lend themselves to the requirements of HPC-based solutions. The use of WF also opens up the available choices of user-experience technologies to be applied in a given domain.
Yup. Microsoft tools do a great job working with Microsoft tools. OK, kinda-sorta. Had some interesting PowerPoint-PowerPoint misinteractions and Word version fiascos a few months ago. Microsoft tools sometimes don’t play well together. In that case, I had to use OpenOffice to resave the document to get Office to work with it. And you will increase your overall outlay for the Microsoft tools. And your TCO will be far higher, on acquisition, on maintenance, and on getting end users/support staff trained on “specialist” skills.
But at least you won’t have to pay those specialists.
Look at it this way: Soon Microsoft will come out with Microsoft CFD, which will allow management to set up and run hard CFD problems at a click of a mouse, without paying for all those pesky expensive CFD specialists. And soon we might see Microsoft BrainSurgery, which will allow hospital administrators to perform brain surgery at the click of a mouse, without paying for all those pesky expensive Brain Surgeons. You wouldn’t want those guys running around mucking things up now. Microsoft has got you covered. They will send out the MCSE to design and implement a solution for you, and train you on CFD or Brain Surgery. Or HPC.
The business rationale for Microsoft is a replacement paradigm. Replace all those expensive pesky specialists who know what they are doing with far cheaper generalists who can just about spell HPC, while getting you to take all that money you have “saved” and plow it, and lots more, into Microsoft products. CCS is not a solution; it is a sales vehicle for Microsoft. They won’t make money on CCS; it is pretty much impossible for them to do so. They will try to make money around it.
All this said, Microsoft is throwing (as in giving away) huge amounts of “marketing” money to larger cluster shops to get their product out, to bundle it, or sell it. Customers I have spoken with, and we have asked many now, are not interested in a replacement paradigm. They want an augmentation paradigm. They want Windows to work more closely with their Linux clusters. If they set up a Windows cluster they want it to be able to run Linux as well. We can do that quite nicely, as it turns out; there are no technological hurdles for this. There are business hurdles. Microsoft licensing makes this pretty close to impossible to do. Without CCS, of course.
Customers want much better interoperability, and Microsoft wants simple replacement. They will need to be dragged kicking and screaming into interoperability, and will be actively working and preaching against it the entire time. Look at the ODF fiasco. If OpenOffice can create an Office document importer/exporter, and they are an open source company, why can’t the world's largest software shop? Aren’t they better? Can’t they do this if they want to?
It's the Not-Invented-Here approach, and a corporate ego the size of their bank account. And it will cost anyone who buys into this approach more money, and reduce their flexibility. Sad.