Subhaditya's KB

CvT

Sep 18, 2024


CvT: Introducing Convolutions to Vision Transformers
- improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions (Conv) into it
- first change: a hierarchy of Transformers containing a new convolutional token Embedding
- second change: a convolutional Transformer block leveraging a convolutional projection
- gains the CNN properties of shift, scale, and distortion invariance
- keeps the Transformer merits of dynamic Attention, global context, and better generalization
- evaluated on ImageNet
- Position Encoding, a crucial component in existing Vision Transformers, can be safely removed in CvT
- this simpler design is a potential advantage for adaptation to other vision tasks
- thanks to the built-in local context structure introduced by convolutions, CvT no longer requires a position Embedding
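
The hierarchy described in the notes above can be made concrete with a little arithmetic. This is a minimal sketch, not the authors' code: it only computes how many tokens each stage would see if the convolutional token Embedding and the convolutional projection for keys and values behave like ordinary strided convolutions. The per-stage `(kernel, stride, padding)` values and the stride-2 key/value projection are assumptions chosen for illustration, and the helper names `conv_out`, `token_grid_sizes`, and `kv_tokens` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a strided convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the convolutional token embedding per stage.
# Assumed values for illustration; they are not stated in this note.
STAGES = [(7, 4, 2), (3, 2, 1), (3, 2, 1)]


def token_grid_sizes(image_size, stages=STAGES):
    """Side length of the square token grid after each stage's embedding."""
    sizes = []
    size = image_size
    for kernel, stride, padding in stages:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def kv_tokens(grid, kernel=3, stride=2, padding=1):
    """Number of key/value tokens if the convolutional projection
    subsamples the token grid with the given (assumed) stride."""
    side = conv_out(grid, kernel, stride, padding)
    return side * side


if __name__ == "__main__":
    for resolution in (224, 384):
        print(f"input {resolution}x{resolution}")
        for stage, grid in enumerate(token_grid_sizes(resolution), start=1):
            print(
                f"  stage {stage}: {grid}x{grid} grid, "
                f"{grid * grid} query tokens, {kv_tokens(grid)} key/value tokens"
            )
```

Nothing in the sketch depends on a fixed-length table of positions: changing the input resolution only changes the grid sizes, which is one way to read the note that CvT can drop the position Embedding.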
