And all that is kind of irrelevant, because if LLMs were human-level general intelligence, they would solve all these questions correctly without blinking.
No human would score high on that puzzle if the images were given to them as a series of tokens. Even earlier LLMs scored far better than humans would if tested the same way.
And most humans would do poorly on maths problems if the input were given to them as binary. The reason that reversal isn't important is that tokens are an implementation detail of how an AI is meant to solve real-world problems that humans face, while no one cares about humans solving tokens.
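To make the "implementation detail" point concrete, here's a minimal sketch of greedy subword tokenization over a made-up vocabulary (not any real model's tokenizer): the model never sees characters, only opaque integer IDs covering multi-character chunks.

```python
# Hypothetical vocabulary of subword pieces mapped to integer IDs.
vocab = {"straw": 0, "berry": 1, "rasp": 2}

def toy_tokenize(text, vocab):
    """Greedy longest-match tokenization: at each position, consume the
    longest piece present in the vocabulary and emit its ID."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

print(toy_tokenize("strawberry", vocab))  # [0, 1]
```

So a question like "how many r's are in strawberry?" arrives as `[0, 1]` — the characters are hidden inside the IDs, which is exactly the kind of input-encoding quirk humans are never tested on.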
Humans communicate with each other to get things done. We have to think carefully about how we communicate, given the shortcomings of humans and of different communication mediums.
The fact that we might need to be mindful of how we communicate with a person/system/whatever doesn't mean much in the context of AI. Just as with humans, the details of how these systems work will need to be considered, and the standard trope of "that's an implementation detail" won't hold up.
But they don't. Not even the best ones.