As for the first difference pointed out in the article, one of the CS224D lectures on word2vec did address it:
https://youtu.be/aRqn8t1hLxs?t=2650
It was also mentioned later in the lecture that having two vectors representing each word is meant to make the optimisation easier (so it's kind of a trick); at the end, the two vectors learnt will have to be averaged over in order to reach a single vector for each word.
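The averaging step described above can be sketched as follows. This is a toy illustration, not the lecture's code: the matrix names `W_in` and `W_out` and the vocabulary/dimension sizes are assumptions, but the idea matches the lecture — the model learns an input (center-word) matrix and an output (context-word) matrix, and a single embedding per word is obtained by averaging the two:

```python
import numpy as np

# Assumed setup: word2vec learns two matrices, each of shape
# (vocab_size, dim) -- one row per word.
vocab_size, dim = 5, 4
rng = np.random.default_rng(0)
W_in = rng.normal(size=(vocab_size, dim))   # input (center-word) vectors
W_out = rng.normal(size=(vocab_size, dim))  # output (context-word) vectors

# Average the two learnt vectors to get one final vector per word.
W_final = (W_in + W_out) / 2.0
print(W_final.shape)  # (5, 4)
```

In practice some implementations simply discard the output vectors instead of averaging; both choices yield one vector per word.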
To be fair, the fact that each word is represented by two vectors was also mentioned in the original paper describing word2vec:
https://arxiv.org/pdf/1310.4546.pdf
On page 3, just beneath equation (2).
Why so surprised?