That is, knowledge about feature embeddings is transferred through attention map functions. In contrast to attention maps, Song et al. (2018) proposed a different attentive knowledge distillation method, in which an attention mechanism is used to assign different confidence rules.
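To make the attention-map transfer mentioned at the start of this passage concrete, the following is a minimal sketch rather than the formulation of any cited work. It assumes one common activation-based choice of attention map function: the channel-wise sum of absolute activations raised to a power `p`, flattened and L2-normalised. The names `attention_map` and `attention_transfer_loss`, the parameter `p`, and the squared-difference loss are illustrative assumptions not taken from the text.

```python
import math


def attention_map(features, p=2, eps=1e-12):
    """Collapse a C x H x W activation tensor (nested lists) into a
    flattened, L2-normalised spatial attention map of length H * W.

    Assumption: the map is the sum over channels of |activation| ** p.
    """
    height, width = len(features[0]), len(features[0][0])
    amap = [
        sum(abs(channel[i][j]) ** p for channel in features)
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in amap))
    return [v / (norm + eps) for v in amap]


def attention_transfer_loss(student_features, teacher_features):
    """Squared L2 distance between student and teacher attention maps.

    The channel counts may differ; only the spatial size must match.
    """
    student_map = attention_map(student_features)
    teacher_map = attention_map(teacher_features)
    return sum((s - t) ** 2 for s, t in zip(student_map, teacher_map))
```

Because the channel dimension is summed out, the student and teacher layers only need matching spatial resolution, and the normalisation makes the loss insensitive to the overall scale of the activations. In practice a term of this kind would typically be added, with a weighting factor, to the student's ordinary task loss.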