While that's true, the tokenizer is only half the problem. The more important fault demonstrated here is that the model doesn't _know_ it can't see the letters, and won't say so unless it has been trained or instructed to. A sentence like "I can't see letters through the tokenizer" never appears in a corpus of human writing.
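For concreteness, here's a toy sketch of why the letters are invisible in the first place: a subword tokenizer turns a word into opaque token IDs, so character-level facts never reach the model directly. The three-entry vocabulary and greedy longest-match scheme below are illustrative assumptions, not any real tokenizer.

```python
# Toy subword tokenizer (hypothetical vocabulary, not a real one):
# the model receives these IDs, not characters, so letter-level
# questions like "how many r's in 'strawberry'?" aren't directly visible.
vocab = {"str": 0, "aw": 1, "berry": 2}  # assumed subword pieces

def encode(word: str) -> list[int]:
    """Greedy longest-match segmentation of a word into subword IDs."""
    ids, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids

print(encode("strawberry"))  # [0, 1, 2] — three IDs, zero letters
```

The point of the comment stands on top of this: nothing in `[0, 1, 2]` tells the model that information was lost, so it has no innate reason to report the blind spot.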