Ok, it hit me today. I know what I want in a language, or at least part of it. I do not want to write loops. I want to write something like this:

range: i=1 .. N;
a[i] = b[i]+c*d[i];

You don’t see any explicit “for” loops. No explicit control structures. The rationale I have for this is that without an explicit set of control structures, the compiler is freer to transform the code to match the underlying machine architecture. Let the compiler emit the MPI calls, the OpenMP calls, and so on.
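To make the idea a bit more concrete, here is a minimal Python sketch of what "state the rule, let something else pick the schedule" could look like. The helper `evaluate` and its `workers` parameter are hypothetical names invented for this illustration, and a thread pool merely stands in for whatever OpenMP or MPI code a real compiler might emit. Note that Python indexes from 0, whereas the range above runs from 1 to N.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(n, rule, workers=1):
    """Evaluate rule(i) for i = 0 .. n-1 and return the results as a list.

    Hypothetical helper: the caller supplies only the per-element rule,
    and this function (standing in for the compiler) picks the schedule.
    """
    if workers <= 1:
        # Serial schedule.
        return [rule(i) for i in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in index order, whatever the thread timing.
        return list(pool.map(rule, range(n)))


b = [1.0, 2.0, 3.0, 4.0]
d = [10.0, 20.0, 30.0, 40.0]
c = 0.5

# The "program" is just the rule a[i] = b[i] + c*d[i]; no loop is written here.
rule = lambda i: b[i] + c * d[i]

a_serial = evaluate(len(b), rule)
a_parallel = evaluate(len(b), rule, workers=2)
```

Both calls produce the same `a`, which is the point: because the rule has no side effects and says nothing about ordering, the schedule can change without the result changing.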
Look at it this way: when we write out a matrix multiplication in mathematical notation, precisely how many “for” loops do you see? None, yet you know implicitly that there are 3 of them. This is because the right-hand side carries 4 indices, two of which are the same repeated index, leaving 3 distinct indices: i, j, and k.

  A_i,j = B_i,k * C_k,j

This repeated index k represents a reduction (a sum) over that index: the index disappears from the result, so the 3 distinct indices on the right collapse to the 2 on the left. Einstein introduced this summation convention for his tensor equations in general relativity. It makes the expression of the computation more compact and preserves its structure, while relying on implicit rules.
That is, he developed an ASL, an application specific language.
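As a rough illustration of how the loops can be read off the notation, the sketch below does two things in plain Python. The function `implied_loops` (a hypothetical name, as is the `"ik,kj->ij"` spec format used here) splits the indices into free ones, which survive into the result, and summed ones, which are reduced away. The function `matmul` then spells out the three loops that the one-line formula implies.

```python
def implied_loops(spec):
    """Split a spec such as 'ik,kj->ij' into (free, summed) index lists.

    Free indices appear in the output; summed indices appear only on the
    input side and are therefore reduced over.
    """
    inputs, output = spec.split("->")
    seen = []
    for ch in inputs.replace(",", ""):
        if ch not in seen:
            seen.append(ch)
    free = [ch for ch in seen if ch in output]
    summed = [ch for ch in seen if ch not in output]
    return free, summed


def matmul(B, C):
    """A_i,j = B_i,k * C_k,j with all three implied loops written out."""
    n, m, p = len(B), len(C), len(C[0])
    A = [[0.0] * p for _ in range(n)]
    for i in range(n):          # free index i
        for j in range(p):      # free index j
            for k in range(m):  # repeated index k: the reduction
                A[i][j] += B[i][k] * C[k][j]
    return A


free, summed = implied_loops("ik,kj->ij")
A = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

For what it is worth, NumPy's `einsum` accepts the same kind of spec string, so `numpy.einsum("ik,kj->ij", B, C)` expresses this product without any visible loop.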
To a degree, this is sort of what Matlab/Octave do. You still have explicit control structures though, and there isn’t enough intelligence in the language to handle what I am talking about.
The aspect I find interesting for programming is that a compiler might be able to emit the requisite parallel code if we remove some of the explicit control structures that could have side effects. Remember, every program line can have side effects, so a compiler has to be careful in removing/rewriting code for optimization. If the code isn’t there in the first place, it can’t have side effects. The side effect of an explicit “for” loop over i is that the variable i is left holding its final value (N or N+1, depending on how the loop is written) at the end of the loop. Moreover, if N is small and we have a parallel region in OpenMP, then this code would operate inefficiently, because the cost of starting and synchronizing the threads outweighs the little work there is to share.
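The leftover loop variable is easy to demonstrate. The sketch below is only an illustration in Python, where a `for` loop over 1..N leaves `i` equal to N; a C loop written with `i <= N` would leave it at N+1 instead. The arrays are padded with an unused slot at index 0 so that `i` can run from 1 to N as in the range above, and `loop_free` is a hypothetical name for the loop-free variant.

```python
N = 4
c = 2.0
b = [0.0, 1.0, 2.0, 3.0, 4.0]  # index 0 is unused padding
d = [0.0, 1.0, 1.0, 1.0, 1.0]
a = [0.0] * (N + 1)

# Explicit loop: computes a[i], but also leaves i bound afterwards.
for i in range(1, N + 1):
    a[i] = b[i] + c * d[i]

leaked_i = i  # the side effect: i is still visible here


def loop_free(b, c, d):
    """Same computation with no index variable exposed to the caller."""
    return [bi + c * di for bi, di in zip(b, d)]


a2 = loop_free(b[1:], c, d[1:])
```

Any code that later reads `i` now depends on that loop having run in exactly this form, which is the kind of hidden dependency that forces a compiler to be conservative.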
Worth thinking more about it.