Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
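The zero- and few-shot setting mentioned above can be sketched in a few lines. This is purely illustrative: the task, helper name, and example pairs are hypothetical, and a real decoder-only model would simply be asked to continue the assembled prompt, with no gradient updates.

```python
# Illustrative sketch of zero- vs. few-shot prompting for a decoder-only
# language model. All names and examples here are hypothetical.

def build_prompt(task_description, examples, query):
    """Assemble a prompt: instruction, optional solved examples, then the query."""
    parts = [task_description]
    for x, y in examples:  # zero-shot when `examples` is empty
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Zero-shot: the model sees only the instruction and the query.
zero_shot = build_prompt("Translate English to French.", [], "cheese")

# Few-shot: a handful of solved examples are supplied in-context;
# the model's weights are never updated.
few_shot = build_prompt(
    "Translate English to French.",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "hello",
)
```

In both cases the model is conditioned on the prompt text alone; "learning" happens entirely in-context at inference time.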
We present OPT, a collection of auto-regressive, decoder-only pre-trained transformer language models ranging in size from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers.
Our goal is to replicate the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data curation and training efficiency.
We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop.