Vision Transformer Basics
Summary: A lecture covering the basics of vision transformers.
Topics: Vision Transformers
Slides: link (pdf)
References/links
- J. A. Fodor and Z. W. Pylyshyn, "Connectionism and cognitive architecture: A critical analysis", Cognition (1988)
- D. E. Rumelhart, G. E. Hinton and J. L. McClelland, "A general framework for parallel distributed processing", PDP: Explorations in the Microstructure of Cognition (1986)
- A. Vaswani et al., "Attention is all you need", NeurIPS (2017)
- A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR (2021)
- K. He et al., "Deep residual learning for image recognition", CVPR (2016)
- S. Shen et al., "PowerNorm: Rethinking batch normalization in transformers", ICML (2020)
- J.-B. Alayrac et al., "Flamingo: a visual language model for few-shot learning", NeurIPS (2022)
- D. Amodei and D. Hernandez, "AI and Compute" (2018)
- D. Hernandez and T. Brown, "Measuring the Algorithmic Efficiency of Neural Networks", arXiv (2020)
- P. Anderson, "More is different", Science (1972)
- R. Hamming, "The Art of Doing Science and Engineering: Learning to Learn" (1997)
- J. Kaplan et al., "Scaling Laws for Neural Language Models", arXiv (2020)
- J. Hoffmann et al., "Training Compute-Optimal Large Language Models", arXiv (2022)
- M. Raghu et al., "Do vision transformers see like convolutional neural networks?", NeurIPS (2021)