Neural network architectures, scaling laws and transformers


Summary: A video digest of ideas related to neural network architectures, scaling laws and transformers. This material is based on part of a lecture series I gave for the 2021 4F12 engineering course at Cambridge University.
Topics: neural network architectures, scaling laws, transformers
Slides: link (pdf)

References
  • H. Helmholtz, “The Facts in Perception” (1878)
  • A. M. Turing, “Intelligent Machinery” (1948)
  • H. J. Scudder, “Probability of error of some adaptive pattern-recognition machines”, IEEE Trans. Inf. Theory (1965)
  • H. B. Barlow, “Unsupervised learning”, Neural Computation (1989)
  • V. de Sa, “Learning Classification with Unlabeled Data”, NeurIPS (1993)
  • D. Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”, ACL (1995)
  • J. Schmidhuber and S. Heil, “Sequential neural text compression”, IEEE Trans. on Neural Networks (1996)
  • Y. Bengio et al., “A Neural Probabilistic Language Model”, Journal of Machine Learning Research (2003)
  • L. B. Smith and M. Gasser, “The Development of Embodied Cognition: Six Lessons from Babies”, Artificial Life (2005)
  • T. Mikolov et al., “Efficient Estimation of Word Representations in Vector Space”, ICLR (2013)
  • C. Doersch et al., “Unsupervised Visual Representation Learning by Context Prediction”, ICCV (2015)
  • D. Pathak et al., “Context Encoders: Feature Learning by Inpainting”, CVPR (2016)
  • M. Noroozi and P. Favaro, “Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles”, ECCV (2016)
  • R. Zhang et al., “Colorful Image Colorization”, ECCV (2016)
  • B. Fernando et al., “Self-Supervised Video Representation Learning with Odd-One-Out Networks”, CVPR (2017)
  • M. Noroozi et al., “Representation Learning by Learning to Count”, ICCV (2017)
  • J. Thewlis et al., “Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings”, ICCV (2017)
  • J. Donahue et al., “Adversarial Feature Learning”, ICLR (2017)
  • A. Mahendran et al., “Cross Pixel Optical Flow Similarity for Self-Supervised Learning”, ACCV (2018)
  • L. B. Smith et al., “The Developing Infant Creates a Curriculum for Statistical Learning”, Trends in Cognitive Sciences (2018)
  • A. van den Oord et al., “Representation Learning with Contrastive Predictive Coding”, arXiv (2018)
  • C. Vondrick et al., “Tracking Emerges by Colorizing Videos”, ECCV (2018)
  • Z. Wu et al., “Unsupervised Feature Learning via Non-parametric Instance Discrimination”, CVPR (2018)
  • S. Gidaris et al., “Unsupervised Representation Learning by Predicting Image Rotations”, ICLR (2018)
  • M. Caron et al., “Deep Clustering for Unsupervised Learning of Visual Features”, ECCV (2018)
  • J. Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, NAACL (2019)
  • Q. Xie et al., “Self-Training With Noisy Student Improves ImageNet Classification”, CVPR (2020)
  • K. He et al., “Momentum Contrast for Unsupervised Visual Representation Learning”, CVPR (2020)