Galactica

  • a new large language model for automatically organizing science, developed by Meta AI and Papers with Code
  • the curated corpus can be trained on for multiple epochs without overfitting; upstream and downstream performance improves with repeated tokens
  • dataset design is critical to the approach: all data is processed in a common markdown format to blend knowledge between sources
  • citations are wrapped in dedicated [START_REF] and [END_REF] tokens, which lets the model predict a citation given any input context (see the prompt sketch after this list)
  • citation prediction improves with scale, and larger models get better at modelling the underlying distribution of citations
  • the model can perform multi-modal tasks involving SMILES chemical formulas and protein sequences, each wrapped in their own special tokens (sketched below)
  • transformer architecture in a decoder-only setup with GeLU activation for all model sizes (a minimal block sketch follows the list)
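
A minimal sketch of citation prediction via the [START_REF] token. The facebook/galactica-1.3b checkpoint name and the transformers calls are assumptions based on the public Hugging Face release, not details from the summary above:

```python
from transformers import AutoTokenizer, OPTForCausalLM

# Assumed checkpoint name from the public Hugging Face release
checkpoint = "facebook/galactica-1.3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = OPTForCausalLM.from_pretrained(checkpoint)

# Ending the prompt with [START_REF] asks the model to complete a citation
prompt = "The Transformer architecture was introduced in [START_REF]"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```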
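
For the multi-modal tasks, sequences are wrapped in modality tags; the tag names below follow the paper's notation ([START_SMILES]/[END_SMILES] for molecules, [START_AMINO]/[END_AMINO] for protein sequences), while the concrete molecule, sequence, and question wording are made-up examples:

```python
# Illustrative prompts using the paper's modality tags; the molecule
# (aspirin), the amino-acid string, and the questions are invented.
smiles_prompt = (
    "[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES]\n\n"
    "Question: What is this molecule's name?\n\nAnswer:"
)

protein_prompt = (
    "[START_AMINO]MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ[END_AMINO]\n\n"
    "Question: What is a likely function of this protein?\n\nAnswer:"
)

# Either string can be fed to the same generate() call shown above.
```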
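
And a minimal sketch of the decoder-only setup with GeLU. This is a generic pre-norm block with illustrative sizes, not a faithful reproduction of the paper's full configuration (the paper, for instance, omits biases in dense layers, which this sketch keeps for simplicity):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One pre-norm decoder-only Transformer block with a GeLU MLP (sketch)."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),                      # GeLU activation, as in the paper
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True marks positions each token may NOT attend to
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                    # residual around attention
        x = x + self.mlp(self.ln2(x))       # residual around the MLP
        return x

# Example: a batch of 2 sequences of 16 token embeddings
block = DecoderBlock()
out = block(torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```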