There is no illusion of pitch; it is a common misconception that the fundamental frequency must be present in a tone. Pitch is the perceived periodicity of a tone, which is roughly the greatest common divisor of the harmonic frequencies.
If perceiving pitch without a fundamental is considered an auditory illusion, then common pitch detection techniques should fail when the fundamental is absent, yet they work quite well without it. So either there is no illusion of pitch, or algorithms have illusions too.
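A minimal sketch of that point, assuming a plain time-domain autocorrelation detector (one common technique, not any particular library's implementation; `autocorr_pitch` is a hypothetical helper). The test tone contains only harmonics 2 to 5 of 200 Hz, with no energy at 200 Hz itself, yet the strongest periodicity the detector finds is a 200 Hz period:

```python
import numpy as np

def autocorr_pitch(signal, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch as the lag of the strongest autocorrelation peak
    between 1/fmax and 1/fmin seconds (hypothetical helper)."""
    x = signal - np.mean(signal)
    # Full autocorrelation, keeping only the non-negative lags.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

fs = 16000
t = np.arange(int(0.1 * fs)) / fs  # 100 ms of samples
f0 = 200.0
# Harmonics 2..5 of 200 Hz (400, 600, 800, 1000 Hz); the 200 Hz fundamental is absent.
tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))
print(autocorr_pitch(tone, fs))  # about 200 Hz
```

All four components realign every 80 samples (1/200 s), so that lag wins even though nothing in the signal oscillates at 200 Hz.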
Consider 2 ideal harmonic notes with a frequency ratio of 3:2, say 3kHz and 2kHz... The brain (or algorithm) must choose between interpreting the collection of frequency peaks at m * 3kHz and n * 2kHz as (occasionally overlapping) harmonics of 2 notes at 2kHz and 3kHz, OR as harmonics of a single note at 1kHz (as you say, the GCD of the frequencies).
There is inherent ambiguity between interpreting the sound as 2 notes, each with its own timbre, vs interpreting it as 1 note with a different timbre...
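The ambiguity can be made concrete with idealized integer frequencies in Hz (real spectral peaks are never exact integers, so this is only a sketch; `harmonics` is a hypothetical helper). The union of the 2kHz and 3kHz harmonic series has a GCD of 1kHz, even though 1kHz is not one of the peaks:

```python
from functools import reduce
from math import gcd

def harmonics(f0_hz, fmax_hz):
    """Integer multiples of f0_hz up to fmax_hz inclusive (idealized, hypothetical helper)."""
    return set(range(f0_hz, fmax_hz + 1, f0_hz))

fmax = 12000
# Spectral peaks of two ideal harmonic notes at 2 kHz and 3 kHz.
peaks = harmonics(2000, fmax) | harmonics(3000, fmax)
print(sorted(peaks))

# The single-note interpretation: the GCD of all peak frequencies.
print(reduce(gcd, peaks))  # 1000
```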
One could physically construct 3 bowed strings, with mode-killing on the 1kHz string (suppressing its modes that are not multiples of 2kHz or 3kHz), such that they make perceptually identical sounds whether the 2kHz and 3kHz strings are played simultaneously or the 1kHz string is played alone.
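Which modes of the 1kHz string would have to be killed can be worked out directly: those at multiples of 1kHz that are multiples of neither 2kHz nor 3kHz. This sketch only matches peak frequencies; amplitudes and phases would also have to match (e.g. the shared partials at multiples of 6kHz get energy from both strings), and the 15kHz upper limit is an arbitrary assumption:

```python
def harmonics(f0_hz, fmax_hz):
    """Integer multiples of f0_hz up to fmax_hz inclusive (idealized, hypothetical helper)."""
    return set(range(f0_hz, fmax_hz + 1, f0_hz))

fmax = 15000  # arbitrary upper limit for the illustration
two_strings = harmonics(2000, fmax) | harmonics(3000, fmax)
one_string = harmonics(1000, fmax)

# Modes of the 1 kHz string that the 2 kHz + 3 kHz pair never produces.
to_kill = sorted(one_string - two_strings)
print(to_kill)  # [1000, 5000, 7000, 11000, 13000]

# With those modes suppressed, the peak positions of both setups coincide.
assert one_string - set(to_kill) == two_strings
```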
At that point, neither a human brain nor any algorithm can discern from the sound alone, in an ABX test, which is the case. The ambiguity forces a guess (deterministic or not).
The sound is a projection of properties occurring in reality, and it loses information.
Tones and harmonics get clustered into pitches; e.g. mistuned harmonics, as seen in bass guitar or piano, still get decoded into pitches via some sort of best match if the mistuning does not exceed a certain percentage. And it works even if some harmonics disappear and reappear.
This is correct, and the reason we are tolerant is dispersion: even though the different harmonics are present on the same string of the same length, the resonant frequencies need not be integer multiples of the fundamental, since waves of different frequencies have different propagation speeds on the string.
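A sketch of what that dispersion does to the partials, using the common textbook stiff-string model f_n = n * f0 * sqrt(1 + B * n^2), where B is an inharmonicity coefficient. The model is swapped in for illustration, and both f0 and B below are assumed values, not measurements of any real instrument. The "best match" is likewise an assumption: a simple least-squares fit of f_n ≈ n * f0:

```python
import math

def stiff_string_partials(f0, B, n_max):
    """Partials of an ideal stiff string: f_n = n*f0*sqrt(1 + B*n^2) (textbook model)."""
    return [n * f0 * math.sqrt(1.0 + B * n * n) for n in range(1, n_max + 1)]

def cents(f, ref):
    """Interval from ref to f in cents."""
    return 1200.0 * math.log2(f / ref)

f0 = 110.0   # assumed fundamental, for illustration only
B = 4e-4     # assumed inharmonicity coefficient, for illustration only
partials = stiff_string_partials(f0, B, 10)

for n, f in enumerate(partials, start=1):
    # How far each partial sits above the exact integer multiple n*f0.
    print(n, round(f, 2), round(cents(f, n * f0), 1))

# Least-squares "best match" f0 for the harmonic model f_n ≈ n*f0.
f0_fit = sum(n * f for n, f in enumerate(partials, start=1)) / sum(n * n for n in range(1, 11))
print(round(f0_fit, 2))
```

The upper partials drift progressively sharp, yet the fitted f0 stays close to the nominal one, which is one simple way a tolerant pitch decoder could behave.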
In the case of bowed strings, mode-locking ensures the phases of all the harmonics are reset each cycle (the bow sticks and slips), so bowed instruments can be played with harmonics locked to integer ratios to within parts per billion.
Since a lot of sounds are plucked, we must be tolerant of frequency-dependent propagation speeds in regular strings and other media.
Whether it is the 2 strings or the 1 string being bowed, there is an actual underlying reality that cannot be deduced from the limited information available in the sound, so any guess risks being an illusion (with probability 50%, if both cases are equally likely).
Assuming we agree that "illusion" merely means a mismatch between interpretation and reality.
Yes, that is a second type of ambiguity, and it occurs in audio as well:
a higher-frequency sinusoid, amplitude-modulated by a lower-frequency sinusoid, can be indistinguishable from 2 constant-amplitude sinusoids at the sum and difference of the two frequencies.
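This follows from the product-to-sum identity cos(a)cos(b) = ½cos(a − b) + ½cos(a + b). A numerical check, with arbitrarily chosen frequencies; note it assumes the envelope is a pure sinusoid swinging through zero (suppressed-carrier modulation), since adding a DC offset to the envelope, as in conventional AM, also leaves a component at the carrier frequency:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs   # 1 second of samples
fc, fm = 1000.0, 50.0    # carrier and modulator frequencies (arbitrary choices)

# A 1 kHz sinusoid whose amplitude follows a 50 Hz sinusoid.
am = np.cos(2 * np.pi * fm * t) * np.cos(2 * np.pi * fc * t)

# Two constant-amplitude sinusoids at the difference (950 Hz) and sum (1050 Hz).
pair = 0.5 * np.cos(2 * np.pi * (fc - fm) * t) + 0.5 * np.cos(2 * np.pi * (fc + fm) * t)

# The largest sample-wise difference is at the level of floating-point rounding.
print(np.max(np.abs(am - pair)))
```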
See the article by Plomp and Levelt on determining the bandwidth of the auditory frequency bins (i.e. of the auditory filter bank).
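Plomp and Levelt's data are not reproduced here. As a stand-in, a sketch of a widely used later approximation of auditory filter width, Glasberg and Moore's equivalent rectangular bandwidth ERB(f) = 24.7 * (4.37 * f / 1000 + 1) Hz. This is a swapped-in model, not the one from the cited article, and `erb_hz` is a hypothetical helper:

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

for f in (100, 500, 1000, 2000, 4000, 8000):
    print(f, round(erb_hz(f), 1))
```

For example, erb_hz(1000) is about 133 Hz, so components at 950 and 1050 Hz, 100 Hz apart, fall within roughly one auditory filter bandwidth of each other under this model.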
Here is a video I enjoyed that explains the basics of how our sense of sound works. It makes it easier to understand why some of the illusions happen. https://vimeo.com/147902575
Typo in the video (I think; I'm not an anatomist): it's the "basilar" membrane, not "vasilar." Awesome video, though. I wish my speech processing professor had used it instead of teaching hearing as a filterbank, even if that is how we needed to understand it.
Another weird thing about hearing: the hairs that vibrate aren't just tuned to particular frequencies; they actually vibrate over a range, and the response isn't symmetric (although IIRC, part of that comes from the hairs being mechanically coupled). That's why low-frequency noise masks high-frequency noise more than vice versa, which is exploited in lossy codecs (if there is low-frequency energy, you don't need to encode the higher-frequency energy it masks).
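One way perceptual coders model this asymmetry is a spreading function on the Bark scale. The sketch below uses the formula commonly attributed to Schroeder, Atal and Hall and used in some perceptual coding work; it is a swapped-in textbook model, not something the comment specifies, and `spreading_db` is a hypothetical helper. Here dz is the maskee's position minus the masker's, in Bark:

```python
import math

def spreading_db(dz_bark):
    """Masking spread in dB vs. dz = maskee_bark - masker_bark.
    Positive dz means the maskee lies above the masker in frequency."""
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

for dz in (-3, -2, -1, 0, 1, 2, 3):
    print(dz, round(spreading_db(dz), 1))
```

The curve peaks near 0 dB at dz = 0 and falls off more slowly for positive dz than for negative dz, i.e. a masker's influence extends further up in frequency than down.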