While I agree with the general sentiment of your post, I remember that in the case of memcpy, at least the GCC implementation is a "most general case" implementation that tries to perform well across average usage.
I remember reading about this some time ago when I was doing some tinkering. For example, I read that John Carmack decided to implement his own memcpy tailored to the games they were programming. I also read about alternative memcpy implementations that outperformed the standard one in specific domains. I even experimented with using Microsoft Detours to replace the stock memcpy with a faster version.
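As a rough illustration of what "tailored" can mean (my own minimal sketch, not Carmack's code or any library's), suppose you know every copy in your program is a multiple of 8 bytes. A general memcpy has to handle arbitrary sizes, tails and alignment; a specialized version can assume all of that away. The name `copy_words` and the multiple-of-8 precondition are hypothetical assumptions for the example, and I'm not claiming it is actually faster on any given compiler or CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical domain-specific copy: ASSUMES n is a multiple of 8.
 * Each iteration uses a fixed-size 8-byte memcpy, which compilers
 * typically lower to a single load/store without violating alignment
 * or strict-aliasing rules, and there is no tail handling at all. */
static void copy_words(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n % 8 == 0); /* the precondition a general memcpy can't rely on */

    for (size_t i = 0; i < n; i += 8) {
        memcpy(d + i, s + i, 8);
    }
}
```

The point is only that knowing your sizes (or alignment) up front lets you skip the branching a general-purpose routine needs; whether that wins in practice is something you'd have to measure.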