Completely eliminating shared-memory multiprocessing in userspace seems like a non-starter to me. Message passing is great, but there are plenty of cases where it's orders of magnitude slower. A good example is sparse matrix solvers: the access pattern is irregular and data-dependent, so each worker may need to read arbitrary parts of the matrix and vectors, and there's no way to get acceptable performance if you can't share the underlying data.
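
To make the sparse case concrete, here's a rough sketch of a sparse matrix-vector product, the kind of kernel an iterative solver runs over and over. The 4x4 matrix, the names (`VALUES`, `COL_INDEX`, `ROW_START`, `multiply_rows`, `sparse_matvec`) and the use of Python threads are all my own assumptions for illustration, not anything from a real solver. CPython's GIL means this won't actually run faster in parallel; the point is only the access pattern: every worker reads the same matrix and input vector in place, and which entries it touches depends on the data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 4x4 sparse matrix in CSR (compressed sparse row) form:
# [[4, 0, 0, 1],
#  [0, 3, 0, 0],
#  [2, 0, 5, 0],
#  [0, 0, 0, 6]]
VALUES = [4.0, 1.0, 3.0, 2.0, 5.0, 6.0]   # nonzero entries, row by row
COL_INDEX = [0, 3, 1, 0, 2, 3]            # column of each nonzero
ROW_START = [0, 2, 3, 5, 6]               # row i spans ROW_START[i]:ROW_START[i+1]


def multiply_rows(first_row, last_row, x, y):
    """Compute y[i] = (A @ x)[i] for first_row <= i < last_row."""
    for i in range(first_row, last_row):
        total = 0.0
        for k in range(ROW_START[i], ROW_START[i + 1]):
            # Data-dependent read: which entry of x this row needs is only
            # known by looking at the matrix itself.
            total += VALUES[k] * x[COL_INDEX[k]]
        y[i] = total  # each worker writes only its own rows


def sparse_matvec(x, workers=2):
    """Split the rows across threads that all share VALUES, COL_INDEX, x and y."""
    n = len(ROW_START) - 1
    y = [0.0] * n
    step = (n + workers - 1) // workers  # rows per worker, rounded up
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(multiply_rows, lo, min(lo + step, n), x, y)
            for lo in range(0, n, step)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return y
```

With message passing, each worker would need its own copy of whatever slices of the matrix and of `x` its rows happen to reference, re-sent whenever `x` changes between iterations. With shared memory nothing is copied at all.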