RETRO
- Improving Language Models by Retrieving from Trillions of Tokens
- Retrieval-Enhanced Transformer (RETRO)
- enhances auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with the preceding tokens (see the retrieval sketch after this list)
- comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25x fewer parameters
- combines a frozen BERT retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than is typically consumed during training (see the chunked cross-attention sketch after this list)
- WikiText-103
- The Pile
- suggests that improving semi-parametric language models through explicit memory offers an orthogonal, more efficient path toward more powerful language models than raw parameter scaling
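
Retrieval sketch referenced above: a minimal toy version of chunk-level nearest-neighbour lookup, assuming fixed 64-token chunks and mean-pooled chunk embeddings. The `embed`, `build_index`, and `retrieve` helpers and the random embedding table are illustrative stand-ins, not the paper's frozen-BERT pipeline or its trillion-token SCaNN index.

```python
import numpy as np

DIM, VOCAB, CHUNK_LEN = 128, 50_000, 64

# Stand-in for the frozen retriever: a fixed random embedding table with
# mean pooling over a chunk's tokens (RETRO averages frozen BERT embeddings).
_rng = np.random.default_rng(0)
_table = _rng.standard_normal((VOCAB, DIM))

def embed(chunk_tokens: np.ndarray) -> np.ndarray:
    return _table[chunk_tokens].mean(axis=0)

def build_index(corpus_chunks: list) -> np.ndarray:
    """Pre-compute an embedding for every chunk in the retrieval corpus."""
    return np.stack([embed(c) for c in corpus_chunks])

def retrieve(prev_chunk: np.ndarray, index: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k corpus chunks nearest (L2) to the embedding
    of the preceding input chunk."""
    dists = np.linalg.norm(index - embed(prev_chunk), axis=1)
    return np.argsort(dists)[:k]

# Usage: chunk the corpus and the input into fixed-size pieces, index the
# corpus once, then look up neighbours for each new input chunk.
corpus_chunks = [np.random.randint(0, VOCAB, size=CHUNK_LEN) for _ in range(1_000)]
index = build_index(corpus_chunks)
prev_chunk = np.random.randint(0, VOCAB, size=CHUNK_LEN)
print(retrieve(prev_chunk, index, k=2))
```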
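
Chunked cross-attention sketch referenced above: a simplified numpy version in which decoder chunk i+1 attends to the encoded neighbours retrieved for chunk i, so no token conditions on retrieval from its own future. The paper's actual attending positions are shifted within chunks and the neighbour encoder is trained, so treat this only as an illustration of the data flow; all function names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def chunked_cross_attention(hidden, neighbours, chunk_len):
    """hidden: (seq, d) decoder states; neighbours: (n_chunks, retrieved_len, d)
    encoded neighbours, where entry i was retrieved using decoder chunk i as the
    query. Decoder chunk i+1 attends to chunk i's neighbours (causal)."""
    out = hidden.copy()
    n_chunks = hidden.shape[0] // chunk_len
    for i in range(1, n_chunks):  # chunk 0 has no completed chunk to retrieve for
        sl = slice(i * chunk_len, (i + 1) * chunk_len)
        out[sl] += attention(hidden[sl], neighbours[i - 1], neighbours[i - 1])
    return out

# Toy shapes: 4 chunks of 8 tokens, model width 16, 32 retrieved tokens per chunk.
hidden = np.random.randn(32, 16)
neighbours = np.random.randn(4, 32, 16)
print(chunked_cross_attention(hidden, neighbours, chunk_len=8).shape)
```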