Neural network architectures, scaling laws and transformers
Summary: A video digest of ideas related to neural network architectures, scaling laws and transformers. This material is based on part of a lecture series I gave for the 2021 4F12 engineering course at Cambridge University.
Topics: neural network architectures, scaling laws, transformers
Slides: link (pdf)
References
- H. Helmholtz, “The Facts in Perception” (1878)
- A. M. Turing, “Intelligent Machinery” (1948)
- H. J. Scudder, “Probability of error of some adaptive pattern-recognition machines”, IEEE Trans. Inf. Theory (1965)
- H. B. Barlow, “Unsupervised Learning”, Neural Computation (1989)
- V. de Sa, “Learning Classification with Unlabeled Data”, NeurIPS (1993)
- D. Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”, ACL (1995)
- J. Schmidhuber and S. Heil, “Sequential neural text compression”, IEEE Trans. on Neural Networks (1996)
- Y. Bengio et al., “A Neural Probabilistic Language Model”, Journal of Machine Learning Research (2003)
- L. B. Smith and M. Gasser, “The Development of Embodied Cognition: Six Lessons from Babies”, Artificial Life (2005)
- T. Mikolov et al., “Efficient Estimation of Word Representations in Vector Space”, ICLR (2013)
- C. Doersch et al., “Unsupervised Visual Representation Learning by Context Prediction”, ICCV (2015)
- D. Pathak et al., “Context Encoders: Feature Learning by Inpainting”, CVPR (2016)
- M. Noroozi and P. Favaro, “Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles”, ECCV (2016)
- R. Zhang et al., “Colorful Image Colorization”, ECCV (2016)
- B. Fernando et al., “Self-Supervised Video Representation Learning with Odd-One-Out Networks”, CVPR (2017)
- M. Noroozi et al., “Representation Learning by Learning to Count”, ICCV (2017)
- J. Thewlis et al., “Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings”, ICCV (2017)
- J. Donahue et al., “Adversarial Feature Learning”, ICLR (2017)
- A. Mahendran et al., “Cross Pixel Optical Flow Similarity for Self-Supervised Learning”, ACCV (2018)
- L. B. Smith et al., “The Developing Infant Creates a Curriculum for Statistical Learning”, Trends in Cognitive Sciences (2018)
- A. van den Oord et al., “Representation Learning with Contrastive Predictive Coding”, arXiv (2018)
- C. Vondrick et al., “Tracking Emerges by Colorizing Videos”, ECCV (2018)
- Z. Wu et al., “Unsupervised Feature Learning via Non-parametric Instance Discrimination”, CVPR (2018)
- S. Gidaris et al., “Unsupervised Representation Learning by Predicting Image Rotations”, ICLR (2018)
- M. Caron et al., “Deep Clustering for Unsupervised Learning of Visual Features”, ECCV (2018)
- J. Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, NAACL (2019)
- Q. Xie et al., “Self-Training With Noisy Student Improves ImageNet Classification”, CVPR (2020)
- K. He et al., “Momentum Contrast for Unsupervised Visual Representation Learning”, CVPR (2020)