I just saw this about taking a divide-and-conquer approach to massive-scale genomics calculations. While not specific to the code in question, it looked familiar. Yeah, I think I’ve seen something like this before … and wrote the code to do it.
It was called SGI GenomeCluster.
It was original and innovative at the time, hiding the massively parallel nature of the computation behind a comfortable interface that end users already knew. It divided the work up, queued up many runs, and reassembled the output, in as close to the original order as possible. One of my test matrices took the md5sum of the output of my code and of the original. If they differed, the test failed.
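Here is a minimal sketch of that split, run, reassemble pattern, together with the checksum test. It is not the GenomeCluster code: `analyze` is a hypothetical stand-in for the real per-sequence computation, a thread pool stands in for a cluster queue, and all the function names are invented for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def analyze(record: str) -> str:
    """Hypothetical stand-in for the real per-sequence computation."""
    return f"{record}\t{len(record)}\n"


def run_serial(records):
    """The 'original' tool: process every record in input order."""
    return "".join(analyze(r) for r in records)


def split(records, n_chunks):
    """Divide the input into contiguous chunks, preserving input order."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_divided(records, n_chunks=4):
    """Run each chunk as its own job, then stitch the outputs back together."""
    chunks = split(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # map() yields results in submission order, not completion order,
        # so reassembly is a plain join.
        parts = list(pool.map(run_serial, chunks))
    return "".join(parts)


def md5(text):
    """Hex digest of the output, as md5sum would report for the same bytes."""
    return hashlib.md5(text.encode()).hexdigest()
```

The acceptance test then reduces to a single comparison, `md5(run_divided(records)) == md5(run_serial(records))`. With a real batch queue, jobs finish in arbitrary order, so each chunk would need to carry its index and the outputs would be sorted by it before joining.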
There were many aspects of this that were (at the time, 1999-2000) quite novel. So we filed a patent on it. Which was granted. It is Patent number 7,249,357 if you care to look.
A next-gen version avoiding all of the patented elements was developed at my next employer, which subsequently had a financial meltdown due to a failed acquisition (or more correctly, failed due diligence during the acquisition, so they didn’t uncover the slightly well-done books in time). MSC.Life was lost to the ages.
I left there and started Scalable Informatics. 13 years ago this Saturday.
While the folks at Broad and Google seem to have done wonderful things, they may not have been the first to do this. I myself was inspired by the previous work on HT-BLAST by my colleagues at the time, some of whom insisted that there was no way a distributed version of this could ever scale … there were simply too many issues. I have great respect for them, but I set out to prove that it could scale. And scale it did.
Later on, a number of very smart folks at a number of places built mpiblast. I worked on helping to package it and automate builds of it.
Paraphrasing Newton, we’ve seen further because we stood on our predecessors’ shoulders, as they built the platforms that we could stand on.
This isn’t to minimize what was done. It is sort of like the history of the “discovery” of the FFT, which seems to have been “discovered” a number of times. I find that amusing to some degree, but the history of scientific advancement is often composed of half-forgotten and half-remembered things. Quaternions, anyone? Maxwell’s equations in quaternion representation are a single equation. Not to mention their applicability to the Lorentz transformations of special relativity …