Blogmark
The Illustrated Word2vec
via jbranchaud@gmail.com
Two central ideas behind word2vec are skipgram and negative sampling -- SGNS (Skipgram with Negative Sampling).
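Roughly, negative sampling turns each real (word, neighbor) pair into a positive example and mixes in a few randomly drawn words as negatives. A minimal sketch of that idea in Python (the toy vocabulary, the `with_negatives` helper, and the number of negatives are my own illustration, not the post's code):

```python
import random

# Toy vocabulary standing in for a real corpus.
vocab = ["thou", "shalt", "not", "make", "a", "machine", "aaron", "taco"]
rng = random.Random(0)

def with_negatives(word, true_context, k=2):
    """One positive example (label 1) plus k randomly sampled noise words (label 0)."""
    rows = [(word, true_context, 1)]
    rows += [(word, rng.choice(vocab), 0) for _ in range(k)]
    return rows

print(with_negatives("not", "thou"))
```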
We start with random vectors for the embeddings and then cycle through many training steps, computing an error value at each step and feeding it back to update the parameters of all the embeddings involved. This process nudges word vectors toward or away from one another based on their similarity, via those error values.
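The post frames this as two randomly initialized matrices, an embedding matrix and a context matrix, that get refined step by step. A tiny sketch of that starting state with numpy (the sizes here are made up for illustration):

```python
import numpy as np

vocab_size, embedding_dim = 10_000, 300   # made-up sizes for illustration
rng = np.random.default_rng(42)

# Two matrices full of random numbers to start, one row per vocabulary word.
# Training gradually refines both; the embedding matrix is what you keep at the end.
embedding = rng.uniform(-1, 1, size=(vocab_size, embedding_dim))
context = rng.uniform(-1, 1, size=(vocab_size, embedding_dim))
```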
Random notes from the post:
"Continuous Bag of Words" architecture can train against text by using a sliding window that both looks some number of words back and some number of words forward in order to predict the current word. It is described in this Word2Vec paper.
The skipgram architecture is a method of training where, instead of using the surrounding words as context to predict the current word, you take the current word and try to guess the words around it.
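Sketched the same way, skipgram emits one (input, target) pair per neighbor rather than one bundle of context words per position (again, the helper and sentence are my own illustration):

```python
def skipgram_pairs(words, window=2):
    """For each position, the current word is the input and each neighbor
    within the window becomes a separate (input, target) training pair."""
    pairs = []
    for i, word in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((word, words[j]))
    return pairs

text = "thou shalt not make a machine in the likeness of a human mind".split()
print(skipgram_pairs(text)[:4])
# [('thou', 'shalt'), ('thou', 'not'), ('shalt', 'thou'), ('shalt', 'not')]
```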
In training, you go step by step, looking at the most likely word predicted by your model and producing an error vector based on which words it should have ranked higher in its prediction. That error vector is then applied to the model to improve its subsequent predictions.
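A hedged sketch of what one such step could look like for SGNS, scoring a pair with a sigmoid over a dot product and nudging both vectors by the error; the tiny vocabulary, learning rate, and function names are assumptions for illustration, not the post's code:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50
embedding = {w: rng.uniform(-1, 1, dim) for w in ["not", "thou", "taco"]}
context = {w: rng.uniform(-1, 1, dim) for w in ["not", "thou", "taco"]}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(word, ctx_word, label, lr=0.025):
    """Score the pair, compare the score to the label (1 = real neighbor,
    0 = sampled noise), and nudge both vectors by the resulting error."""
    w, c = embedding[word], context[ctx_word]
    error = sigmoid(np.dot(w, c)) - label    # prediction minus target
    embedding[word] = w - lr * error * c     # move the word vector
    context[ctx_word] = c - lr * error * w   # move the context vector

# One positive pair and one sampled negative.
sgns_step("not", "thou", 1)
sgns_step("not", "taco", 0)
```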
Cosine similarity is a way to measure how similar two vectors are. With 2D vectors you could just plot them and see how closely they point in the same direction, but embedding vectors have many more dimensions. "The good thing is, though, that cosine_similarity still works. It works with any number of dimensions."
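For example, a minimal numpy version (my own sketch, not from the post) works unchanged for 2 dimensions or 300:

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of the two vectors divided by the product of their lengths."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity([1, 2], [2, 4]))        # 1.0, same direction
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # 0.0, orthogonal; any dimension works
```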