empirically test the hypothesis that Attention weights are interpretable and correlated with feature importance measures.
In both single and pair sequence tasks, the Attention weights in samples with original weights generally make sense.
However, in the former case, the Attention mechanism learns to give higher weights to tokens relevant to both kinds of sentiment.
They show that Attention weights in single sequence tasks do not provide a reason for the prediction, whereas in pairwise tasks, Attention does reflect the reasoning behind the model's output.
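The correlation between Attention weights and a feature importance measure can be illustrated with a minimal sketch. This is a toy single-head attention over random token embeddings, with leave-one-out prediction change standing in for feature importance; the dimensions, the query vector, and the leave-one-out choice are all illustrative assumptions, not the setup of the works discussed above.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, T = 8, 5                        # assumed embedding dim and sequence length
tokens = rng.normal(size=(T, d))   # toy token embeddings
query = rng.normal(size=d)         # toy query vector (e.g., a [CLS]-style query)
w_out = rng.normal(size=d)         # toy output head

# Attention weights: softmax of scaled dot-product scores
scores = tokens @ query / np.sqrt(d)
attn = softmax(scores)
pred = (attn @ tokens) @ w_out     # attended representation -> scalar output

# Leave-one-out importance: change in output when one token is removed
loo = np.empty(T)
for i in range(T):
    mask = np.ones(T, dtype=bool)
    mask[i] = False
    s = tokens[mask] @ query / np.sqrt(d)
    loo[i] = abs(pred - (softmax(s) @ tokens[mask]) @ w_out)

# Correlation between attention weights and importance scores
r = np.corrcoef(attn, loo)[0, 1]
print(f"attention/importance correlation: {r:.3f}")
```

With a trained model, `attn` would come from the model's attention layer and `loo` from masking real input tokens; the correlation (or a rank correlation such as Kendall's tau) is the quantity the interpretability debate turns on.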