
I find the title of the article rather exaggerated...

As for the first difference pointed out in the article, one of the CS224D lectures on word2vec did address it:

https://youtu.be/aRqn8t1hLxs?t=2650

It was also mentioned later in the lecture that having two vectors representing each word is meant to make the optimisation easier (so it's kind of a trick); in the end, the two learned vectors are averaged to produce a single vector for each word.
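As a minimal sketch of that averaging step (the matrix names and shapes here are illustrative, not from any particular implementation): word2vec keeps an "input" (center-word) matrix and an "output" (context-word) matrix, and the final embedding for each word is the mean of its two rows.

```python
import numpy as np

# Hypothetical toy setup: two embedding matrices, one row per word.
# W_in holds the center-word ("input") vectors, W_out the
# context-word ("output") vectors, each of shape (vocab_size, dim).
vocab_size, dim = 5, 4
rng = np.random.default_rng(0)
W_in = rng.normal(size=(vocab_size, dim))
W_out = rng.normal(size=(vocab_size, dim))

# The trick mentioned in the lecture: after training, average the
# two representations to get a single vector per word.
W_final = (W_in + W_out) / 2.0

word_index = 2  # index of some word in the vocabulary
vector = W_final[word_index]
print(vector.shape)
```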

To be fair, the fact that each word is represented by two vectors was also mentioned in the original paper describing word2vec:

https://arxiv.org/pdf/1310.4546.pdf

On page 3, just beneath equation (2).
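For reference, that equation (2) is the skip-gram softmax, which makes the two-vector setup explicit: the probability of an output word $w_O$ given an input word $w_I$ is defined using separate "input" vectors $v$ and "output" vectors $v'$:

```latex
p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}
```

where $W$ is the vocabulary size; each word thus appears in the model both as $v_w$ and as $v'_w$.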

Why so surprised?


