Vision Transformer Basics

Summary: A lecture covering the basics of vision transformers.
Topics: Vision Transformers
Slides: link (pdf)

Readings:
  • J. A. Fodor and Z. W. Pylyshyn, "Connectionism and cognitive architecture: A critical analysis", Cognition (1988)
  • D. E. Rumelhart, G. E. Hinton and J. L. McClelland, "A general framework for parallel distributed processing", PDP: Explorations in the Microstructure of Cognition (1986)
  • A. Vaswani et al., "Attention is all you need", NeurIPS (2017)
  • A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR (2021)
  • K. He et al., "Deep residual learning for image recognition", CVPR (2016)
  • S. Shen et al., "PowerNorm: Rethinking batch normalization in transformers", ICML (2020)
  • J.-B. Alayrac et al., "Flamingo: a visual language model for few-shot learning", NeurIPS (2022)
  • D. Amodei and D. Hernandez, "AI and Compute", 2018
  • D. Hernandez and T. Brown, "Measuring the Algorithmic Efficiency of Neural Networks", arXiv (2020)
  • P. Anderson, "More is different", Science (1972)
  • R. Hamming, "The Art of Doing Science and Engineering: Learning to Learn" (1997)
  • J. Kaplan et al., "Scaling Laws for Neural Language Models", arXiv (2020)
  • J. Hoffmann et al., "Training Compute-Optimal Large Language Models", arXiv (2022)
  • M. Raghu et al., "Do vision transformers see like convolutional neural networks?", NeurIPS (2021)