So you're saying that a given 4K source compressed to X Mbit/s will look better if it's not downscaled to 1080p before compression, but is instead left at 4K and downscaled by the display? (I don't believe there are any phones with a 2160-line display.)
What they're saying is that a 1080p source upscaled to 4K, uploaded to YouTube, and then downscaled by the client to fit a 1080p screen will look better than a straight 1080p upload, because the 4K stream gets more bits.
But doesn't YouTube negotiate a "channel-sized" rate? It seems like it switches between bitrates based on what your connection can tolerate. All of this is to say that it seems like a tough metric to nail down and evaluate.
Of course it is: a 1080p video at 10 Mbit/s will almost always look better than a 4K video at 1 Mbit/s, and a 4K video at 10 Mbit/s will almost always look better than a 1080p video at 1 Mbit/s.
The far more interesting question is: if you accept that you have X Mbit/s to play with, which is better on a given platform (screen size, resolution, viewing situation, how well the compression works, how much battery is used in decoding, etc.)?
And bitrate goes up for the 4K tiers. Even if the bitrate only doubles for 4x the pixels, that will probably look better with modern codecs (depending on where you are on the quantization curve), and most services use at least 3x the bitrate for 4K over 2K with the same codec.
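To put rough numbers on that (the bitrates below are illustrative assumptions, not measured YouTube figures), here's a quick back-of-the-envelope sketch:

    # Back-of-the-envelope bits-per-pixel comparison (Python).
    # The bitrates below are illustrative assumptions, not measured YouTube figures.

    def bits_per_pixel(bitrate_mbps, width, height, fps=30):
        """Average bits spent per pixel per frame at a given bitrate."""
        return (bitrate_mbps * 1_000_000) / (width * height * fps)

    # Hypothetical delivery bitrates for the same clip at two resolution tiers.
    tiers = {
        "1080p @ 4 Mbit/s":  (4.0, 1920, 1080),
        "2160p @ 12 Mbit/s": (12.0, 3840, 2160),  # ~3x the bits for 4x the pixels
    }

    for name, (mbps, w, h) in tiers.items():
        print(f"{name}: {bits_per_pixel(mbps, w, h):.3f} bits/pixel/frame")

With these assumed numbers the 4K tier spends fewer bits per source pixel (~0.048 vs ~0.064), but it carries roughly 3x the total bits, and once the client scales it down to a 1080p screen those extra bits are what can make it look better.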
Compressed by what? Are you just talking about videos you record on your phone?
I do not know what phone you have, but an iPhone 12 (as an example) is under 1300 pixels on the short dimension. So you are not getting much more than 1080 pixels no matter what. It seems any experiential difference would have more to do with compression quality than with resolution.
Speaking for myself, I do not think I could tell the difference between 4K and 1080p on a phone (on a decent AV1 clip).