
ViT for Image Classification (Vision Transformer)

hjhjhj0028 2022. 6. 21. 09:20

https://github.com/FrancescoSaverioZuppichini/ViT

 

GitHub - FrancescoSaverioZuppichini/ViT: Implementing Vi(sion)T(ransformer)


https://arxiv.org/abs/2010.11929

 

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place.

https://viso.ai/deep-learning/vision-transformer-vit/

 

Vision Transformers (ViT) in Image Recognition - 2022 Guide - viso.ai

Vision Transformers (ViT) brought recent breakthroughs in Computer Vision achieving state-of-the-art accuracy with better efficiency.


 

Recently, Vision Transformers (ViT) have achieved highly competitive performance in benchmarks for several computer vision applications, such as image classification, object detection, and semantic image segmentation.

 

https://github.com/google-research/vision_transformer

 

GitHub - google-research/vision_transformer


 

The ViT models were pre-trained on the ImageNet and ImageNet-21k datasets. At a high level, ViT classifies an image in seven steps (a code sketch follows the list):

  1. Split an image into patches (fixed sizes)
  2. Flatten the image patches
  3. Create lower-dimensional linear embeddings from these flattened image patches
  4. Include positional embeddings
  5. Feed the sequence as an input to a state-of-the-art transformer encoder
  6. Pre-train the ViT model on a large, labeled dataset (fully supervised)
  7. Fine-tune on the downstream dataset for image classification
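
To make steps 1 through 5 concrete, here is a minimal PyTorch sketch. The class and variable names are illustrative (they come from neither repository above), the hyperparameters roughly follow ViT-Base/16, and torch.nn's stock TransformerEncoder stands in for the encoder described in the paper:

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_ch=3,
                 emb_dim=768, depth=12, heads=12, num_classes=1000):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2  # 14 * 14 = 196
        # Steps 1-3: a strided Conv2d cuts the image into non-overlapping
        # patches, flattens each one, and applies a shared linear projection.
        self.patch_embed = nn.Conv2d(in_ch, emb_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # Step 4: learnable [CLS] token plus learnable positional embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, emb_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, emb_dim))
        # Step 5: a stack of standard transformer encoder layers
        # (pre-norm and GELU, as in the paper).
        layer = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=heads, dim_feedforward=4 * emb_dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(emb_dim, num_classes)  # classification head

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.patch_embed(x)              # (B, emb_dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)     # (B, 196, emb_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend [CLS] -> (B, 197, emb_dim)
        x = x + self.pos_embed               # add positional information
        x = self.encoder(x)
        return self.head(x[:, 0])            # classify from the [CLS] token

logits = TinyViT()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 1000])
```

Note how steps 1-3 collapse into a single strided convolution: sliding a patch_size kernel with stride patch_size is equivalent to cutting the image into non-overlapping patches, flattening each one, and projecting it with a shared linear layer.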

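In practice, steps 6 and 7 usually mean skipping pre-training and starting from one of the published checkpoints above. Below is a hedged sketch of fine-tuning with the timm library; "vit_base_patch16_224" is one of timm's pre-trained ViT names, while the 10-class random-tensor dataset is a stand-in for a real downstream dataset:

```python
import timm
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random tensors stand in for a real downstream dataset (10 classes).
dummy = TensorDataset(torch.randn(8, 3, 224, 224),
                      torch.randint(0, 10, (8,)))
loader = DataLoader(dummy, batch_size=4)

# Load a pre-trained ViT; num_classes=10 replaces the original
# classification head with a fresh one for the downstream task.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:      # one epoch over the toy dataset
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

A small learning rate (here 1e-4 with AdamW) is the usual choice when fine-tuning, since the pre-trained weights only need small adjustments for the downstream task.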