Neural network architectures, scaling laws and transformers

Summary: A video digest of ideas related to neural network architectures, scaling laws and transformers. This material is based on part of a lecture series I gave for the 2021 4F12 engineering course at Cambridge University.
Topics: neural network architectures, scaling laws, transformers
Slides: link (pdf)

References

H. Helmholtz, “The Facts in Perception” (1878)
A. M. Turing, “Intelligent Machinery” (1948)
H. J. Scudder, “Probability of error of some adaptive pattern-recognition machines”, IEEE Trans. Inf. Theory (1965)
H. B. Barlow, “Unsupervised Learning”, Neural Computation (1989)
V. de Sa, “Learning Classification with Unlabeled Data”, NeurIPS (1993)
D. Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”, ACL (1995)
J. Schmidhuber and S. Heil, “Sequential neural text compression”, IEEE Trans. on Neural Networks (1996)
Y. Bengio et al., “A Neural Probabilistic Language Model”, Journal of Machine Learning Research (2003)
L. B. Smith and M. Gasser, “The Development of Embodied Cognition: Six Lessons from Babies”, Artificial Life (2005)
T. Mikolov et al., “Efficient Estimation of Word Representations in Vector Space”, ICLR (2013)
C. Doersch et al., “Unsupervised Visual Representation Learning by Context Prediction”, ICCV (2015)
D. Pathak et al., “Context Encoders: Feature Learning by Inpainting”, CVPR (2016)
M. Noroozi and P. Favaro, “Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles”, ECCV (2016)
R. Zhang et al., “Colorful Image Colorization”, ECCV (2016)
B. Fernando et al., “Self-Supervised Video Representation Learning with Odd-One-Out Networks”, CVPR (2017)
M. Noroozi et al., “Representation Learning by Learning to Count”, ICCV (2017)
J. Thewlis et al., “Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings”, ICCV (2017)
J. Donahue et al., “Adversarial Feature Learning”, ICLR (2017)
A. Mahendran et al., “Cross Pixel Optical Flow Similarity for Self-Supervised Learning”, ACCV (2018)
L. B. Smith et al., “The Developing Infant Creates a Curriculum for Statistical Learning”, Trends in Cognitive Sciences (2018)
A. van den Oord et al., “Representation Learning with Contrastive Predictive Coding”, arXiv (2018)
C. Vondrick et al., “Tracking Emerges by Colorizing Videos”, ECCV (2018)
Z. Wu et al., “Unsupervised Feature Learning via Non-parametric Instance Discrimination”, CVPR (2018)
S. Gidaris et al., “Unsupervised Representation Learning by Predicting Image Rotations”, ICLR (2018)
M. Caron et al., “Deep Clustering for Unsupervised Learning of Visual Features”, ECCV (2018)
J. Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, NAACL (2019)
Q. Xie et al., “Self-Training With Noisy Student Improves ImageNet Classification”, CVPR (2020)
K. He et al., “Momentum Contrast for Unsupervised Visual Representation Learning”, CVPR (2020)

Samuel Albanie