I think in some of your examples the global optimum might also have been the correct behaviour, it's just that the program failed to find it. For example the robot learning to use a hammer. It's hard to believe that throwing the hammer was just as good as using it properly.