The best API for parallel programming is …

Loaded question. OpenMP may be the simplest to work with. MPI is not. The basic difference is that OpenMP is integrated as a set of compiler hints and is restricted to shared memory machines, while MPI is a set of explicit calls to user-level communication routines that handle the data motion for you; you simply point at what to move.

I wish the differences were that simple, but there are other major ones. In OpenMP, you don’t have to change the way your program runs; the same flow can be maintained. In MPI, you need to alter your program to explicitly move data back and forth. In OpenMP, your program can run without threading if you simply omit the compile-time option that enables OpenMP. In MPI, getting back to a plain serial program would mean refactoring your code.
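To make the "compiler hints" point concrete, here is a minimal sketch in C. The helper names `sum_of_squares` and `max_threads` are made up for illustration. The assumption is a compiler that enables OpenMP through a switch (for example `-fopenmp` with GCC) and otherwise ignores the pragma, so the same source builds as a threaded or a serial program.

```c
#ifdef _OPENMP
#include <omp.h>   /* only needed when the compiler has OpenMP enabled */
#endif

/* Hypothetical helper: sum of i*i for i in [0, n). */
long long sum_of_squares(int n)
{
    long long total = 0;

    /* With OpenMP enabled, the iterations are split across threads and
       the per-thread partial sums are combined by the reduction clause.
       Without it, the pragma is ignored and this is a plain serial loop. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += (long long)i * i;
    }
    return total;
}

/* Hypothetical helper: threads the runtime may use; 1 in a serial build. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Calling `sum_of_squares(4)` returns 14 in either build. The loop body and the program flow are untouched; only the compile line changes. An MPI version of the same loop would need explicit initialization, a decision about which rank handles which range of `i`, and a call to combine the partial sums.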
Yes, there are others: Linda, TCGMSG, PVM, threads/pthreads/… . All of them have a variety of issues. OpenMP is IMO the simplest.
Then you have UPC and friends, which subtly alter the C language for parallel execution. There is HPF. And others.
Now we are about to get the HPCS languages. Chapel looks good. X10 looks like a Java clone, which means that it is overly verbose, and simple things are made harder than they need to be. Fortress … not sure what to think about it. My original read on it about a year ago was that it was solving a problem that did not need to be solved. Looking at it again, it appears I was looking at something else.
If parallelism were implicit and easy to use, regardless of distributed memory or shared memory, I think it would be better for all concerned. I would be open to seeing more of the Intel OpenMP for clusters. Alas, I don’t have the hefty chunk of disposable change to play with that. It would be fun though, as that is the model that makes the most sense to me without changing languages.