What do you mean it wastes a whole cycle? It may indeed perform worse because it blows the instruction cache, but I don't see why out-of-order execution would be slower on the hot path. Outside of specific benchmarks, I doubt there are many hot paths with no dependence on memory fetches, and those memory loads will take significantly longer even when they hit the cache.