Vision Transformer Basics
Summary: A lecture covering the basics of vision transformers.
Topics: Vision Transformers
Slides: link (pdf)
References/links
- J. A. Fodor and Z. W. Pylyshyn, "Connectionism and cognitive architecture: A critical analysis", Cognition (1988)
- D. E. Rumelhart, G. E. Hinton and J. L. McClelland, "A general framework for parallel distributed processing", PDP: Explorations in the Microstructure of Cognition (1986)
- A. Vaswani et al., "Attention is all you need", NeurIPS (2017)
- A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR (2021)
- K. He et al., "Deep residual learning for image recognition", CVPR (2016)
- S. Shen et al., "PowerNorm: Rethinking batch normalization in transformers", ICML (2020)
- J.-B. Alayrac et al., "Flamingo: a visual language model for few-shot learning", NeurIPS (2022)
- D. Amodei and D. Hernandez, "AI and Compute" (2018)
- D. Hernandez and T. Brown, "Measuring the Algorithmic Efficiency of Neural Networks", arXiv (2020)
- P. Anderson, "More is different", Science (1972)
- R. Hamming, "The Art of Doing Science and Engineering: Learning to Learn" (1997)
- J. Kaplan et al., "Scaling Laws for Neural Language Models", arXiv (2020)
- J. Hoffmann et al., "Training Compute-Optimal Large Language Models", arXiv (2022)
- M. Raghu et al., "Do vision transformers see like convolutional neural networks?", NeurIPS (2021)