This repository contains a PyTorch implementation of the Vision Transformer (ViT) model described in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". ViT applies the Transformer architecture, originally developed for NLP, to image classification by treating an image as a sequence of fixed-size patches. The paper reports results competitive with state-of-the-art convolutional networks when the model is pre-trained on large datasets.
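
The "16x16 words" in the paper's title refers to cutting an image into 16x16-pixel patches that serve as the Transformer's input tokens. The following is a minimal, dependency-free sketch of that patch-splitting step, not this repository's actual code: the names `num_patches` and `patchify` are hypothetical, and a single-channel image stored as a list of rows is assumed for simplicity.

```python
def num_patches(height, width, patch_size):
    """Count the non-overlapping square patches that tile an image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened patches, row-major order.

    Hypothetical helper for illustration only; a real implementation would
    operate on batched multi-channel tensors.
    """
    height, width = len(image), len(image[0])
    num_patches(height, width, patch_size)  # validates divisibility
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 image with pixel values 0..15, split into 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, 2)
```

With these definitions, `num_patches(224, 224, 16)` gives 196, so a 224x224 image becomes a sequence of 196 patch tokens. In the model, each flattened patch is then linearly projected to an embedding before entering the Transformer encoder.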
dusky04/vit-pytorch