Interpreting Attention

  • Attention Interpretability Across NLP Tasks
  • empirically tests the hypothesis that Attention weights are interpretable and correlated with feature importance measures
  • In both single and pair sequence tasks, the Attention weights of samples with the original (learned) weights do make sense in general
  • However, in single sequence tasks, the Attention mechanism learns to give higher weights to tokens relevant to both kinds of sentiment
  • They show that Attention weights in single sequence tasks do not provide a reason for the prediction, whereas in pairwise tasks Attention does reflect the reasoning behind the model output
  • BertViz repo
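As a refresher on what the "Attention weights" being interpreted actually are, here is a minimal NumPy sketch of scaled dot-product attention weights (illustrative only; not code from the paper or from BertViz — the function and variable names are my own):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(Q, K):
    # Scaled dot-product attention weights: softmax(Q K^T / sqrt(d_k)).
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k))

# Toy example: 3 query tokens attending over 4 key/input tokens.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
W = attention_weights(Q, K)
# Each row of W is a distribution over the 4 input tokens (sums to 1);
# these per-token weights are what interpretability analyses inspect
# and what tools like BertViz visualize per head and layer.
```

The interpretability question in the paper is whether the rows of `W` (one distribution over input tokens per query position) actually point at the tokens that drive the prediction.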