FWIW

We have been asked to do some benchmarking of CCS systems using a number of codes. I wanted us to do better ports of the codes, so that they reach at least performance parity with Linux. There is lots of FUD emanating from various groups about superiority in one aspect or another, and we want to ignore that, fix the bottlenecks, and get good performance on Windows.


The last time we dealt with something like this was with Solaris 10 (and, to a lesser extent, OSX before that).

Sun claims/claimed that Solaris 10 just runs better/faster/yadda yadda yadda. So we took Scalable HMMer and built it on Solaris 10. It turned out to be about half the speed of the Linux build. OK, so we spoke to a bunch of Sun developers, read man pages, and googled. We found the "optimal" options according to the developers, and … the Solaris binary was still 25% slower than the Linux binary, on the same hardware. The unit was dual-booted between the two OSes. We tried other codes and found a range of performance deltas, building the Linux codes with GCC, PGI, and PathScale, and the Solaris codes with GCC and Studio 11. We worked hard to achieve performance parity between the OSes, but in the end it proved too hard; we would have had to go back and do Solaris-specific tuning to get there.
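The post doesn't say exactly how "half the speed" and "25% slower" were computed, and the two common conventions (throughput ratio vs. runtime ratio) give different numbers for the same measurement. Here is a small sketch of the arithmetic, assuming you have wall-clock times for the same job on both builds. The function name and the example timings are hypothetical, not our measured data:

```python
def relative_performance(baseline_s: float, candidate_s: float) -> dict:
    """Compare wall-clock times (seconds) of the same job on two builds/OSes.

    Hypothetical helper: 'baseline' is e.g. the Linux build, 'candidate' the other OS.
    """
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("times must be positive")
    # Candidate speed as a fraction of baseline speed (speed ~ 1 / runtime).
    speed_ratio = baseline_s / candidate_s
    return {
        "speed_ratio": speed_ratio,
        # "X% slower" measured as lost throughput.
        "percent_slower_speed": (1.0 - speed_ratio) * 100.0,
        # "X% slower" measured as extra runtime.
        "percent_longer_runtime": (candidate_s / baseline_s - 1.0) * 100.0,
    }


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(relative_performance(100.0, 200.0))
    print(relative_performance(100.0, 125.0))
```

With the hypothetical timings of 100 s and 125 s, the candidate runs at 80% of baseline speed. That reads as "20% slower" by throughput but "25% longer" by runtime, so it's worth stating which convention a comparison uses.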

Well, if we are going to do a reasonable check of what a clustered Windows system can do, it seems natural to at least get the best single-CPU performance we can in a limited time frame. This means working with and tuning the code. Anyone who has done this (right) knows that it is a labor- and time-intensive process.

My hope is that the folks who asked us about this will realize these issues. We could simply benchmark the codes as they exist now. Something tells me this would not be a happy comparison for CCS. The data I have seen thus far suggests that it has a ways to go. And we wanted to help move some of these codes along the curve.

I am personally (highly) concerned about the impact of virus/spam scanners on I/O operations, and about all the OS jitter that hurts DMP performance on other platforms. Windows isn't immune to this, and a large part of tuning is figuring out how to quiesce a machine. Turn off the stuff you don't need. Then tune your stack.
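One rough way to see how quiet a box is, in the spirit of fixed-work-quantum (FWQ) noise benchmarks, is to time the same small chunk of CPU work many times and look at the spread. On an idle machine the times cluster near the minimum, while scanners and background services show up as long-tail outliers. This is a minimal sketch with hypothetical function names and parameters. Interpreter overhead adds its own noise, so a compiled version would give cleaner numbers:

```python
import statistics
import time


def fixed_work(n: int) -> int:
    # A fixed, CPU-bound quantum of work (hypothetical workload).
    acc = 0
    for i in range(n):
        acc += i * i
    return acc


def measure_jitter(samples: int = 1000, work: int = 10_000) -> dict:
    """Time `samples` identical work quanta and summarize the spread (in ns)."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        fixed_work(work)
        times.append(time.perf_counter_ns() - t0)
    best = min(times)
    return {
        "min_ns": best,
        "median_ns": statistics.median(times),
        "max_ns": max(times),
        # Fraction of total measured time spent above the fastest
        # (least disturbed) quantum: a rough noise estimate.
        "overhead": sum(t - best for t in times) / sum(times),
    }


if __name__ == "__main__":
    for key, value in measure_jitter().items():
        print(f"{key}: {value}")
```

Running it before and after turning off services you don't need, and comparing `max_ns` and `overhead`, gives a crude before/after picture of how much background activity is stealing cycles.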
