CBOW
- Continuous Bag of Words
- tries to predict the current target word (the center word) based on the source context words (surrounding words)
- Example: the sentence “the quick brown fox jumps over the lazy dog” can be split into (context_window, target_word) pairs. With a context window of size 2 (one word on each side of the target), we get examples like ([quick, fox], brown), ([the, brown], quick), ([the, dog], lazy), and so on
- context window: the surrounding words that are fed to the model as input for a given target word
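
A minimal sketch of how such (context_window, target_word) pairs could be generated from the example sentence. The function name `cbow_pairs` and the parameter `half_window` are hypothetical, chosen only for illustration; `half_window=1` corresponds to the context window of size 2 used above. Words at the sentence edges are assumed to get a shorter context, which the notes above do not specify.

```python
def cbow_pairs(tokens, half_window=1):
    """Return (context_words, target_word) pairs, one per token position."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `half_window` words on each side of the target word.
        left = tokens[max(0, i - half_window):i]
        right = tokens[i + 1:i + 1 + half_window]
        context = left + right
        if context:  # skip only if there is no surrounding word at all
            pairs.append((context, target))
    return pairs


sentence = "the quick brown fox jumps over the lazy dog".split()
pairs = cbow_pairs(sentence, half_window=1)

for context, target in pairs[:3]:
    print(context, "->", target)
```

Filtering out pairs whose context is shorter than `2 * half_window` would reproduce only full-window examples such as ([quick, fox], brown).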

- several times faster to train than skip-gram, with slightly better accuracy for frequent words.
- CBOW is prone to overfitting frequent words because they appear many times with the same contexts.
- estimates the probability of a word occurring given its context
- it generalizes over all the different contexts in which a word can be used
- also a 1-hidden-layer neural network
- The synthetic training task now uses the average of multiple input context words, rather than a single word as in skip-gram, to predict the center word.
- Again, the projection weights that turn one-hot words into averageable vectors, of the same width as the hidden layer, are interpreted as the word embeddings.
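
The last two points can be sketched as a single forward pass in NumPy. This is an illustrative sketch, not a reference implementation: the toy vocabulary, the embedding width `D = 4`, the random initialization, the plain softmax output, and the names `W_in`, `W_out`, and `cbow_forward` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4  # vocabulary size, hidden-layer width (assumed values)

# Projection weights: row i is the embedding of word i.
# Multiplying a one-hot vector by W_in simply selects that row.
W_in = rng.normal(scale=0.1, size=(V, D))
# Hidden-to-output weights used to score every vocabulary word.
W_out = rng.normal(scale=0.1, size=(D, V))


def cbow_forward(context_words):
    """Return a probability distribution over the vocabulary for the center word."""
    ids = [word_to_id[w] for w in context_words]
    hidden = W_in[ids].mean(axis=0)       # average of the context embeddings
    scores = hidden @ W_out               # one score per vocabulary word
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()


probs = cbow_forward(["quick", "fox"])
print(vocab[int(probs.argmax())], probs.round(3))
```

With untrained weights the prediction is essentially arbitrary; training would adjust `W_in` and `W_out` so that the probability of the true center word ("brown" for this context) increases, after which the rows of `W_in` serve as the word embeddings.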