Muse

  • Text-to-image transformer model
  • state-ofthe-art image generation while being more ecient than difusion or autoregressive models
  • it is trained on a masked modelling task in discrete token space
  • more ecient because of the use of discrete tokens and requiring fewer sampling iterations
  • parallel decoding
  • Muse is 10x faster at inference time than Imagen-3B or Parti-3B and 3x faster than Stable Difusion v 1.4
  • Muse is also faster than than Stable Difusion in spite of both models working in the latent space of a VQGAN