There are definitely many ways to improve an AI's output and give it extra hints. Also, some AIs are built for a specific use case. Maybe I should rephrase it and say that those benchmarks are more about the single-reply intelligence of a model, and more like an AGI test than a test for specific use cases.