Can you expand on what the security concerns are with confusables in symbol names in C source code? Clearly there's a security concern with cut-and-paste, but that's true regardless of what the rules for C identifiers might be.
It's not like it's obvious that UTR#39 applies literally everywhere that there are "identifiers".
Also, can you speak to what the security concern is with form-insensitivity (as opposed to confusables) for symbols in input source files? I just don't see a concern there at all, but maybe I'm missing something.
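To make the distinction I'm drawing concrete, here is a minimal sketch using Python's standard `unicodedata` module (not anything a C compiler actually does). The identifier `café` and the helper names `same_bytes` and `same_after_nfc` are made up for illustration. It shows that two canonically equivalent spellings differ byte-wise but compare equal once normalized, whereas a confusable pair (Latin `a` vs. Cyrillic `а`) stays distinct under normalization:

```python
import unicodedata

# Two canonically equivalent spellings of the same visible name "café".
nfc_name = "caf\u00e9"    # precomposed U+00E9 (NFC form)
nfd_name = "cafe\u0301"   # 'e' followed by combining acute U+0301 (NFD form)

# A confusable pair: looks alike, but is NOT canonically equivalent.
latin_a = "a"             # U+0061 LATIN SMALL LETTER A
cyrillic_a = "\u0430"     # U+0430 CYRILLIC SMALL LETTER A


def same_bytes(a: str, b: str) -> bool:
    """Byte-wise comparison of the UTF-8 encodings (a 'just-use-8-bit' view)."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_after_nfc(a: str, b: str) -> bool:
    """Form-insensitive comparison: normalize both sides to NFC first."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_bytes(nfc_name, nfd_name))       # False
print(same_after_nfc(nfc_name, nfd_name))   # True
print(same_after_nfc(latin_a, cyrillic_a))  # False
```

So form-insensitivity only merges spellings that Unicode already defines as the same text; it does nothing about confusables, which is why I think the two questions need separate answers.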
Lastly, I think `#include` is the most important place to get this right, since that is where the compiler interfaces with the world outside itself (specifically, the filesystem). But as you note, filesystems are mostly just-use-8-bit: very few normalize on create (HFS+) or are form-insensitive (ZFS). The other place to get this right is the object file output side, where symbols definitely must be normalized.
Oh, one more thing: the platform might impose some rules regarding symbols in ELF and any other object file formats. Are they known to? I suppose C can't necessarily cater to all platform-imposed limitations on symbol naming, but it'd be useful to know about them.