Large Kernel in Attention

  • self-attention can be viewed as a global depth-wise kernel, enabling each layer to have a global receptive field
  • Swin Transformer (Liu et al., 2021e) is a ViT variant that adopts local attention computed within shifted windows
  • this greatly improves memory and computation efficiency while retaining appealing performance
  • since the attention window size is at least 7×7, window attention can be seen as an alternative class of large kernel
  • a recent work (Guo et al., 2022b) proposes a novel large kernel attention module
  • it stacks a small depth-wise convolution, a depth-wise dilated convolution, and a point-wise convolution to capture both local and global structure
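The decomposition described in the last bullet can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' exact module: the specific sizes (a 5×5 depth-wise convolution, a 7×7 depth-wise convolution with dilation 3, and a 1×1 point-wise convolution) are one commonly used configuration and are assumptions here, as is the element-wise gating at the end.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Sketch of a large-kernel attention block: a large spatial kernel is
    decomposed into a small depth-wise conv (local structure), a depth-wise
    dilated conv (long-range structure), and a point-wise conv (channel mixing).
    Kernel sizes/dilation here are illustrative assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        # 5x5 depth-wise conv: local receptive field, one filter per channel
        self.dw = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        # 7x7 depth-wise conv with dilation 3: effective kernel 19x19,
        # so padding = 9 keeps the spatial size unchanged
        self.dw_dilated = nn.Conv2d(dim, dim, kernel_size=7, padding=9,
                                    dilation=3, groups=dim)
        # 1x1 point-wise conv: mixes information across channels
        self.pw = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # produce an attention map and use it as an element-wise gate
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return attn * x
```

Because every convolution is padded to preserve spatial size, the block maps an input of shape `(B, C, H, W)` to an output of the same shape, so it can drop into a network wherever a self-attention or large-kernel layer would go.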