There are definitely many ways to improve an AI's output and give it extra hints. Also, some AIs are made for a specific use case. Maybe I should rephrase it and say that those benchmarks are more about the single-reply intelligence of a model, and more like an AGI test than a test for specific use cases.