toc: true
title: Swin Transformer
tags: ['temp']
Swin Transformer
- Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
- Vision Transformer
- general-purpose backbone for computer vision
- hierarchical feature representation
- linear computational complexity with respect to input image size
- shifted-window-based self-attention
- address the challenges in adapting Transformer from language to vision
- limits self-attention computation to non-overlapping local windows while also allowing cross-window connections
- flexibility to model at various scales
- linear computational complexity with respect to image size
- ImageNet (image classification)
- COCO (object detection)
- ADE20K (semantic segmentation)
- The hierarchical design and the shifted window approach also prove beneficial for all-MLP architectures.
- Stage depth ratio of 1:1:3:1 (Swin-T uses 2, 2, 6, 2 blocks across its four stages)
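
The window-based attention and linear-complexity points above can be made concrete with a small sketch. This is a minimal NumPy illustration, not the authors' implementation: the function names (`window_partition`, `cyclic_shift`, `msa_flops`, `wmsa_flops`) and the toy sizes are assumptions chosen for this example. The cost functions follow the paper's complexity estimates, Ω(MSA) = 4hwC² + 2(hw)²C for global attention and Ω(W-MSA) = 4hwC² + 2M²hwC for attention restricted to M×M windows.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    Self-attention would then be computed independently inside each window.
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, C)


def cyclic_shift(x, window_size):
    """Roll the feature map by half a window along both spatial axes.

    Partitioning the rolled map yields windows that straddle the borders of
    the previous layer's windows, which provides cross-window connections.
    """
    s = window_size // 2
    return np.roll(x, shift=(-s, -s), axis=(0, 1))


def msa_flops(h, w, C):
    """Global self-attention cost: quadratic in the number of patches h*w."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def wmsa_flops(h, w, C, M):
    """Window self-attention cost: linear in h*w for a fixed window size M."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


if __name__ == "__main__":
    # Toy 8x8 single-channel map with 4x4 windows (hypothetical sizes).
    feat = np.arange(8 * 8).reshape(8, 8, 1)
    regular = window_partition(feat, 4)
    shifted = window_partition(cyclic_shift(feat, 4), 4)
    print(regular.shape, shifted.shape)

    # Quadrupling the number of patches (56x56 -> 112x112):
    print(wmsa_flops(112, 112, 96, 7) / wmsa_flops(56, 56, 96, 7))
    print(msa_flops(112, 112, 96) / msa_flops(56, 56, 96))
```

With a fixed window size, quadrupling the number of patches quadruples the W-MSA cost, while the global MSA cost grows faster because of its (hw)² term. A full implementation would additionally mask attention between patches that become neighbours only because of the wrap-around in the cyclic shift; that step is omitted here.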