Galactica
- a new large language model for automatically organizing science, developed by Meta AI and Papers with Code
- the curated corpus can be trained on for multiple epochs without overfitting; upstream and downstream performance improves with repeated tokens
- The dataset design is critical to the approach: all data is processed in a common markdown format to blend knowledge between sources.
- Citations are wrapped in special reference tokens ([START_REF] and [END_REF]), which lets the model predict a citation given any input context
- Citation prediction improves with scale, and the larger models better approximate the true distribution of citations
- the model can perform multi-modal tasks involving SMILES chemical formulas and protein sequences
- transformer architecture in a decoder-only setup with GeLU activation for all model sizes.
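The decoder-only setup with GeLU activations can be sketched as a single causally masked self-attention layer followed by a GeLU feed-forward layer. This is a minimal, single-head NumPy illustration of the general pattern, not Galactica's actual implementation; layer normalization, multiple heads, and embeddings are omitted, and all weight names are placeholders.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GeLU, common in transformer implementations
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def causal_self_attention(x, Wq, Wk, Wv, Wo):
    # x: (T, d) sequence of token representations
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # causal mask: each position may only attend to itself and earlier positions
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -1e9
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ Wo

def decoder_block(x, attn_weights, W_up, W_down):
    # residual connections kept; pre-norm omitted for brevity
    x = x + causal_self_attention(x, *attn_weights)
    x = x + gelu(x @ W_up) @ W_down  # GeLU feed-forward sub-layer
    return x
```

A decoder-only stack simply applies such blocks repeatedly over the token sequence, with next-token prediction as the training objective.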