Even worse: if simulations are used, you now have two problems - formulating correct incentives and guarding against the agent exploiting flaws in the simulation itself.
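A toy sketch of both failure modes at once (everything here is hypothetical and deliberately simplified): a proxy reward that pays for movement rather than progress, plus a simulator bug (an unclamped action) that an optimizer can drive to extremes.

```python
def flawed_simulator_step(position: float, action: float) -> float:
    """Advance a 1-D 'robot'. Simulator flaw: the action is never clamped,
    so arbitrarily large jumps are allowed in a single step."""
    return position + action  # intended: position + max(-1.0, min(1.0, action))

def proxy_reward(old_pos: float, new_pos: float) -> float:
    """Mis-specified incentive: reward any movement, not progress toward
    the intended goal at x = 10."""
    return abs(new_pos - old_pos)

def run(policy, steps: int = 20) -> float:
    """Roll out a policy in the flawed simulator and total its proxy reward."""
    pos, total = 0.0, 0.0
    for _ in range(steps):
        new_pos = flawed_simulator_step(pos, policy(pos))
        total += proxy_reward(pos, new_pos)
        pos = new_pos
    return total

intended = lambda pos: 1.0 if pos < 10 else 0.0    # walk to the goal, then stop
exploit  = lambda pos: 1e6                         # abuse the unclamped action

print("intended policy reward:", run(intended))    # ~10: bounded by real progress
print("exploit policy reward: ", run(exploit))     # astronomically larger
```

The "exploit" policy never goes anywhere useful, but because both the incentive and the simulator are flawed, it scores far higher than the intended behavior.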
Isn’t this true of all systems, not just “AI”? The definition of a software bug is an unintended behavior. In a large system, myriad intents overlap and combine in unexpected ways. In a sufficiently complex system, the confidence that a given modification introduces no unintended behavior approaches zero.
It’s easy to sort out in narrowly specified domains, but it becomes extremely hard as the tasks grow more general.