Short article on the growth of accelerators in life science work

I am quoted in there quite a bit.

The article is in GenomeWeb, a magazine covering the many aspects of what is called Bio-IT.

One of the massive problems in Bio-IT is moving data (go figure), storing data (again …), and processing data. I’ve heard some people argue that accelerators won’t play a role there … and then I hear from people who have limited time to get their work done, facing an ever-growing mound of data.

Their only real hope is accelerators. I suspect we will see more of this.

BTW: what got me looking at that was seeing people search Google for GPU-HMMer after NVIDIA press releases … for our competitors. It’s amusing … at some level …

BTW, I should point out that, due to a trademark just granted to Sean Eddy at Janelia Farm, anyone (including us) who has something that “isn’t HMMer” will have to either a) rename it to avoid confusion with the mark, or b) simply stop distributing it. The rationale behind this is explained in this post. I take issue with the discussion on forking: as I understand it, the team tried to submit the changes back to the core team. As I remember, they were not accepted.

So, it is quite likely that our patches will be resubmitted. Whether or not they will be accepted is another story.

However, I should point out that in the GPL world, using a trademark on GPL code to control distribution, and what gets called the product, usually winds up with something not unlike what happened to Mozilla. Debian, Ubuntu, and a myriad of other distributions have applied styling changes, security patches, bug fixes, etc., and you are not legally allowed to call the result Mozilla *.

So IceCat was born. And Iceweasel. And …

You get the idea.

Basically, the effort that went into “protecting” the trademark simply pissed off the developers who were adding value, and they forked it. Now there are multiple, partially incompatible versions of Mozilla, which aren’t called Mozilla for trademark reasons, but which are, in truth, Mozilla. And the Mozilla Corporation has … well … not just egg on its face, but lost some of its … mojo … for lack of a better term, in the OSS community as a result.

I fear that may be the outcome here.

Others have written extensively on this. It is a far more common outcome than you might think.

I am wondering if there is a better approach here: define algorithmic correctness as the criterion for whether or not you may use the trademark. That is, if it doesn’t pass all the tests, you can’t call it the same thing, because it isn’t. And have the test suite evolve over time, so that implementations can’t simply be tuned to pass a fixed test suite.
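The correctness-based test for use of the name could look roughly like the sketch below. It is purely illustrative: the scorer functions, the tolerance, and the per-release sampling scheme are hypothetical assumptions, not anything taken from HMMer or from an actual trademark policy. A derivative earns the name only if it reproduces the reference output on every case, and the suite is redrawn from a larger pool each release so it can't be targeted.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model: each implementation is a function from an input
# (e.g. a sequence string) to a numeric score.
Scorer = Callable[[str], float]


def passes_conformance(candidate: Scorer, reference: Scorer,
                       cases: Sequence[str], tol: float = 1e-6) -> bool:
    """True only if `candidate` matches `reference` on every case, within an
    absolute tolerance `tol` (allowing small floating-point differences from
    optimized or accelerated code paths)."""
    return all(abs(candidate(c) - reference(c)) <= tol for c in cases)


def draw_test_suite(pool: Sequence[str], k: int, release: int) -> List[str]:
    """Draw `k` distinct cases from a larger `pool`, seeded by `release`, so the
    suite changes from release to release but stays reproducible."""
    rng = random.Random(release)
    return rng.sample(list(pool), k)


def reference(s: str) -> float:
    return float(len(s))  # stand-in for the reference implementation


def derivative(s: str) -> float:
    return float(len(s)) + 1e-9  # tiny numerical drift: still conforms


def broken(s: str) -> float:
    return float(len(s)) if len(s) < 10 else 0.0  # wrong on long inputs


if __name__ == "__main__":
    pool = ["ACDEFG", "KLMNPQ", "RSTVWY", "ACD", "GHIKLMNPQRST"]
    suite = draw_test_suite(pool, k=len(pool), release=3)
    print(passes_conformance(derivative, reference, suite))  # True
    print(passes_conformance(broken, reference, suite))      # False
```

The tolerance is the main design choice: an accelerated port (MPI, GPU) may legitimately differ in the last few bits of a score, so exact equality would be too strict, while a wrong answer on any case still fails.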

But it’s not my decision to make.

I am hoping for a good outcome on this, that lets everyone contribute to HMMer going forward.



4 thoughts on “Short article on the growth of accelerators in life science work”

  1. Well, I stand corrected on a few of the issues I brought up. But others deserve appropriate clarification.

    I take issue with a number of statements. My commentary wasn’t “negative” per se, but it did express concern that the approach may not sit well with all willing contributors.

    As a willing contributor (speaking for myself), I’d like a process to work within, so that we get a binary accept/reject on our contributions. I understand why this wasn’t the case (emailed patches can get lost, ignored, or annoy the receiver when they are quite busy).

    Specific disagreement on language is below.

    Now that there’s an HHMI trademark on “HMMER”, it is true that we will be asking them to rename their products. Joe’s not happy about that.

    Not true … I am not unhappy about this. I am concerned about the impact of the trademark, how it has been indicated it will be used, and what could happen as a result of its use.

    The owner of the code (HHMI) is welcome to do whatever they want with the code. That’s theirs. They can open it up wide, demand very fine-grained control while keeping it open, or even close it. It’s their code.

    Also worth noting, for the record: neither MPI-“HMMer” nor GPU-“HMMer” is commercial code. All of it is GPL, and all source is available. Implying that they are not OSS, or that they are commercial products, is not correct. Renaming is (relatively speaking) trivial. In fact, this was the point of much of the rest of the article: this particular stage is usually about where (permanent) forks happen.

    Neither MPI-“HMMer” nor GPU-“HMMer” is a product of Scalable Informatics, or …

    I do agree that a single up-to-date code base is best. I believe my colleagues did submit patches to that effect. I sent some myself several years ago (2005/6?) to improve the performance of the p7viterbi code. I never heard anything back, so I maintained my own tree. I didn’t call it HMMer; I called it Scalable-HMMer, specifically so that our users would NOT assume it was unmodified HMMer. I posted our patches with the source code on our download site. Sadly, most of that download site may be lost, but I can dig it out from other sources if people ask.

    As a result of these changes, and some of the other bits we and our partners worked on, we saw excellent performance deltas relative to the original code. Papers were submitted and published on this, as there is significant interest in how to increase performance of computationally bound code. Simply using “-O3” doesn’t do it.

    Similarly, for MPI-“HMMer” and later GPU-“HMMer”, we built from trees we maintained. In fact, you can see the trees in our repository and pull them down. This again is the usual scenario when the code owner doesn’t explicitly accept or reject the patches. I believed we had submitted them; I didn’t follow up on what happened after that. Sean states:

    I explained to Joe and his collaborators that I was sorry they hadn’t talked with me earlier, because HMMER2 was end-of-life, and that I was focused on the HMMER3 project, so practically speaking, it was unlikely that any merge with respect to HMMER2 was going to happen.

    This said, Sean has every right to tell us “no”, that he doesn’t want the patches. Which, above, he does.

    That said, I don’t particularly recall this “no” coming at our meeting. But I could have been caffeine-deprived at the time, or out of the room, so it’s possible that it was said and I simply don’t recall it.

    All forks of old HMMER2 code, including MPI/GPU-HMMER, are obsolete, and the potential for confusion is maximal. The trademark is a stick we’ll use to drive that confusion out of the “market” as H3 rolls out.

    The second sentence is the one I had thought was the rationale behind the trademark. I don’t think any confusion between the systems is possible, but I am not the code owner, so we will rename. Since we know that patches against the old code base will not be accepted, we won’t waste anyone’s time submitting them again.

    It is not our goal to confuse, nor to imply that MPI/GPU “HMMer” are in fact identical to HMMer. They aren’t. They are derivatives of HMMer code, which is allowed under the GPL. We will rename them to be consistent with trademark law so that the trademarks do not appear, minimizing any possible confusion due to the name.

    On his “for the record” section: I don’t recall him picking up my airfare or my car rental. I do remember him getting us set up in some very nice rooms at Janelia Farm, and for that I thank him. If I was supposed to submit an expense report (but didn’t), then that was my loss. I am still looking, but it appears that (apart from the room) my trip was at my own expense. I don’t know about Vipin’s and JP’s (I did see prices for their tickets, so I assumed Vipin paid).

    I do remember during the discussions we were asked for our level of commitment to H3, and what we could do to help.

    And we offered what we could. We pointed out to Sean that anything beyond what we offered would require some sort of support on the academic side. Contrary to the language used in Sean’s post, it was a very congenial meeting, and no one was trying to bilk “uncle Howard”.

    Vipin has to demonstrate collaborations, publish papers, and do all the things … that require grant money. If he commits a person to a project (which, from what I could see, was being asked for), he needs a source of grant money or a commercial grant to cover that. It is reasonable to ask what support an academic department has for committing resources to a project. Commercial entities have to ask and answer this all the time. We cannot provide services for free; we are not a charity. Every now and then we can do an equipment grant, but this is usually due to a customer trade-in (had a Linux cluster traded in once … )

    I noted to Sean that I ran a company and had very limited free time to work on things (Vipin and JP can attest to this), as I have to focus on revenue and products. As it happens, most of my work on this was done during a very profitable year, when there was far less scrambling on my part, so we had more time. I pointed out that we work on revenue projects, and those get scheduled first. Non-revenue projects are on an “as time allows” basis. HMMer and our derivatives are definitively non-revenue for us.

    From this he may have thought we simply had hands out, asking “uncle Howard” for money. Not even remotely the case. Rather, we pointed out that there are costs, and we (Joe/Vipin) couldn’t cover them for a dedicated person.

    Moreover, I did see interest from JP and Vipin in continuing to work with Sean and his team (we met a few very nice people while we were there). I didn’t follow up on this aspect.

    OK. All this said, it’s time to address an emerging meme I see in the response to my expressed concern about the use of a trademark to create more control over a GPL/OSS code base.

    Basically, it looks like Sean thinks I was attacking him or the decision to trademark the code. Let me be perfectly clear here: I was not. If Sean feels this way, then please accept this as a public apology. It was not intended as an attack.

    I did express concern that the trademark would be used as a stick (in the theoretical carrot-and-stick model). This was in fact confirmed.

    Further, I postulated that people contributing to the code base might take exception to this use of a decidedly non-OSS method (control via name ownership rights) over an OSS code base. Again, everything being done is within the letter, and the spirit, of the law.

    Mozilla was the example I used. I did point out that contributors were somewhat miffed by this. Mozilla is within its rights to do this, but sometimes, as I intended to point out, this exercise of control can (and in OSS circles often does) have unintended consequences.

    One shouldn’t read into what I wrote beyond that.

    Onto the other important bits.

    I have nothing but respect for Sean, his team, and his corpus of work; for the work going into H3; and for his help, suggestions, and guidance on H2 issues and H2 benchmarking.

    I do regret the direction I see this blog conversation going in, so let me try to derail that and steer it in a sensible direction.

    We have been, and IMO, as far as Scalable goes, remain committed to helping out. I don’t get to do much science these days, so I enjoy every little bit I can play with, in whatever field. My favorite airplane problem (for when my laptop battery dies, as I code furiously on planes) is Goldbach’s conjecture. Yeah, I am a closet math person.

    This said, I would like to play with H3 and contribute as I can. These would be sparse contributions, and they would take time, etc.

    It may be possible to forward port the MPI bits to H3. When we met last year, H3 was not yet (according to my notes) a stable enough code base for this port. Maybe it is now.

    It may be possible to forward port the GPU bits to H3. We have already had people ask about this. I suggested they try H3 first to see if they still need GPU given the performance delta.

    In both cases, I think it would be worthwhile: even if H3 is two orders of magnitude faster than H2, there are still problems that take days or weeks that could run on clusters in hours.

    That is, I think there is great value in a single code base. Since H2 patches won’t be accepted, we won’t submit them again. Since people are still using our H2-based versions, we will support them. Once H3 is released, we’ll point people to it.

    I have been working with Vipin and JP to get the code/site renamed so that the HMMer trademark is not “confused” (I disagree that GPU/MPI-“HMMer” is confusable with “HMMer”, but we will make that change regardless).

    So going forward, I hope there is a process to submit patches against the code for features (MPI/GPU) or for functional plugins. There may or may not be. If there is one immutable base going forward, with no patches accepted (or explicitly rejected), we may need to maintain our own tree, which we don’t want to do.

  2. I would be very interested in hearing more of your thoughts on Goldbach’s conjecture 😉 I think of it as “every integer greater than 1 is the midpoint of two primes”, i.e. for all n > 1 there is an m such that n+m and n-m are both prime (and hence 2n = (n+m) + (n-m) is the sum of two primes). Incidentally, if n+m and n-m are both prime (with m > 0), then n and m share no common factors.

    I’m partial to the study of the Riemann zeta function.

  3. I really should take all I’ve done on Goldbach’s conjecture (GC) and submit it somewhere. There are some pretty original things I’ve done with it (well, I think so :) ), and I think I was making progress.

    Given that it has been unsolved for … what … 200+ years … that may be a massive illusion on my part.

    I came up with some (in retrospect, easy) proofs of things I thought I needed (adding two odd numbers results in an even number, all primes 3 and greater are odd, and a bunch of other things) … call them utility proofs that you bring in to beat the problem into shape.

    The most recent work, from a few years ago, came when I was trying to understand a different theorem on prime number distribution and something stuck in my mind.

    Could have been that lack of coffee … or beer … I dunno.
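The “midpoint” form of Goldbach’s conjecture from the second comment is easy to check numerically for small n. Below is a minimal Python sketch (the function names are illustrative, not from the thread) that finds, for each n, the smallest m with n - m and n + m both prime, and also checks the coprimality remark whenever m > 0. A finite check like this is, of course, not a proof.

```python
from math import gcd
from typing import Optional


def is_prime(k: int) -> bool:
    """Trial-division primality test; adequate for the small k used here."""
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    d = 3
    while d * d <= k:
        if k % d == 0:
            return False
        d += 2
    return True


def midpoint_offset(n: int) -> Optional[int]:
    """Smallest m >= 0 such that n - m and n + m are both prime
    (keeping n - m >= 2), or None if no such m exists."""
    for m in range(n - 1):
        if is_prime(n - m) and is_prime(n + m):
            return m
    return None


if __name__ == "__main__":
    for n in range(2, 2001):
        m = midpoint_offset(n)
        assert m is not None, n            # so 2n = (n - m) + (n + m)
        if m > 0:
            assert gcd(n, m) == 1, (n, m)  # coprimality remark, for m > 0
    print(midpoint_offset(10))             # 3, since 20 = 7 + 13
```

The `m > 0` guard matters: when n is itself prime, m = 0 already works (2n = n + n), and gcd(n, 0) = n. For m > 0, any common factor of n and m divides n + m but is at most m < n + m, so it must be 1 whenever n + m is prime.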
