Language Models are Few-Shot Learners (GPT-3)



Summary: A video description of the paper "Language Models are Few-Shot Learners" by T. Brown et al., published at NeurIPS in 2020.
Paper: The paper can be found on arXiv here.
Topics: language models, foundation models, GPT-3, scaling
Slides: link (pdf)

References
  • S. Carey et al., "Acquiring a single new word", ERIC (1978)
  • M. Marcus et al., "The Penn treebank: Annotating predicate argument structure", HLT Workshop (1994)
  • S. Hochreiter et al., "Learning to learn using gradient descent", ICANN (2001)
  • E. Loper et al., "NLTK: The natural language toolkit", arxiv (2002)
  • P. Turney et al., "Combining independent modules to solve multiple-choice synonym and analogy problems", arxiv (2003)
  • P. Turney et al., "Corpus-based learning of analogies and semantic relations", Machine Learning (2005)
  • P. Norvig, "Natural language corpus data", Beautiful data (2009)
  • S. Baccianella et al., "Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining", LREC (2010)
  • M. Roemmele et al., "Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning", AAAI symposium (2011)
  • R. Ross, "Guide for Conducting Risk Assessments", Special Publication NIST SP (2012)
  • H. Levesque et al., "The Winograd schema challenge", KR (2012)
  • T. Mikolov et al., "Efficient estimation of word representations in vector space", arxiv (2013)
  • J. Berant et al., "Semantic parsing on freebase from question-answer pairs", EMNLP (2013)
  • J. Pennington et al., "Glove: Global vectors for word representation", EMNLP (2014)
  • N. Durrani et al., "Edinburgh’s phrase-based machine translation systems for WMT-14", WMT (2014)
  • A. Dai et al., "Semi-supervised sequence learning", NeurIPS (2015)
  • D. P. Kingma et al., "Adam: A method for stochastic optimization", ICLR (2015)
  • R. Sennrich et al., "Improving neural machine translation models with monolingual data", arxiv (2015)
  • G. Hinton et al., "Distilling the knowledge in a neural network", arxiv (2015)
  • O. Vinyals et al., "Matching networks for one shot learning", NeurIPS (2016)
  • K. He et al., "Identity mappings in deep residual networks", ECCV (2016)
  • D. Paperno et al., "The LAMBADA dataset: Word prediction requiring a broad discourse context", ACL (2016)
  • J. Ba, "Layer Normalization", arxiv (2016)
  • N. Mostafazadeh et al., "A corpus and cloze evaluation for deeper understanding of commonsense stories", NAACL HLT (2016)
  • A. Vaswani et al., "Attention is all you need", NeurIPS (2017)
  • G. Lai et al., "RACE: Large-scale ReAding Comprehension Dataset From Examinations", EMNLP (2017)
  • M. Joshi et al., "TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension", ACL (2017)
  • I. Loshchilov et al., "Decoupled weight decay regularization", arxiv (2017)
  • K. Crawford, "The trouble with bias", NeurIPS (2017)
  • D. Amodei et al., "AI and Compute", https://openai.com/blog/ai-and-compute/ (2018)
  • S. Gururangan et al., "Annotation artifacts in natural language inference data", arxiv (2018)
  • A. Radford et al. "Improving language understanding by generative pre-training" (2018)
  • E. Choi et al., "QuAC: Question Answering in Context", EMNLP (2018)
  • S. McCandlish et al., "An empirical model of large-batch training", arxiv (2018)
  • P. Clark et al., "Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge", arxiv (2018)
  • T. Mihaylov et al., "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering", EMNLP (2018)
  • S. Edunov et al., "Understanding Back-Translation at Scale", EMNLP (2018)
  • T. Trinh et al., "A simple method for commonsense reasoning", arxiv (2018)
  • P. Rajpurkar et al., "Know What You Don’t Know: Unanswerable Questions for SQuAD", ACL (2018)
  • S. Zhang et al., "ReCoRD: Bridging the gap between human and machine commonsense reading comprehension", arxiv (2018)
  • A. Wang et al., "GLUE: A multi-task benchmark and analysis platform for natural language understanding", ICLR (2018)
  • D. Khashabi et al., "Looking beyond the surface: A challenge set for reading comprehension over multiple sentences", NAACL-HLT (2018)
  • R. Rudinger et al., "Gender bias in coreference resolution", arxiv (2018)
  • M. Mitchell et al., "Model cards for model reporting", FAccT (2018)
  • B. McCann et al., "The natural language decathlon: Multitask learning as question answering", arxiv (2018)
  • Y. Qian et al., "Reducing gender bias in word-level language models with a gender-equalizing loss function", arxiv (2019)
  • A. Radford et al., "Language models are unsupervised multitask learners", OpenAI (2019)
  • J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding", NAACL-HLT (2019)
  • T. Kwiatkowski et al., "Natural questions: a benchmark for question answering research", ACL (2019)
  • M. Shoeybi et al., "Megatron-LM: Training multi-billion parameter language models using model parallelism", arxiv (2019)
  • S. Reddy et al., "CoQA: A conversational question answering challenge", ACL (2019)
  • R. Child et al., "Generating long sequences with sparse transformers", arxiv (2019)
  • A. Wang et al., "SuperGLUE: A stickier benchmark for general-purpose language understanding systems", NeurIPS (2019)
  • R. Zellers et al., "HellaSwag: Can a Machine Really Finish Your Sentence?" ACL (2019)
  • Y. Liu et al., "RoBERTa: A robustly optimized bert pretraining approach", arxiv (2019)
  • Z. Li, "Story ending prediction by transferable BERT", arxiv (2019)
  • Z. Lan et al., "ALBERT: A lite BERT for self-supervised learning of language representations", arxiv (2019)
  • Y. Wang et al., "Multi-agent dual learning", ICLR (2019)
  • K. Song et al., "MASS: Masked sequence to sequence pre-training for language generation", ICML (2019)
  • A. Conneau et al., "Cross-lingual language model pretraining", NeurIPS (2019)
  • Y. Ju et al., "Technical report on conversational question answering", arxiv (2019)
  • D. Dua et al., "DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs", NAACL-HLT (2019)
  • M. Pilehvar et al., "WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations", NAACL-HLT (2019)
  • C. Clark et al., "BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions", NAACL-HLT (2019)
  • M-C. De Marneffe et al., "The commitmentbank: Investigating projection in naturally occurring discourse", Sinn und Bedeutung (2019)
  • R. Zellers et al., "Defending against neural fake news", NeurIPS (2019)
  • D. Ippolito et al., "Automatic detection of generated text is easiest when humans are fooled", ACL (2019)
  • S. Gehrmann et al., "GLTR: Statistical detection and visualization of generated text", ACL (2019)
  • A. Holtzman et al., "The Curious Case of Neural Text Degeneration", ICLR (2019)
  • X. Liu et al., "Improving multi-task deep neural networks via knowledge distillation for natural language understanding", arxiv (2019)
  • I. Solaiman et al., "Release strategies and the social impacts of language models", arxiv (2019)
  • P-S. Huang et al., "Reducing Sentiment Bias in Language Models via Counterfactual Evaluation", EMNLP (2020)
  • D. Hernandez et al., "Measuring the algorithmic efficiency of neural networks", arxiv (2020)
  • R. Schwartz et al., "Green AI", Communications of the ACM (2020)
  • C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer", JMLR (2020)
  • T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
  • J. Kaplan et al., "Scaling laws for neural language models", arxiv (2020)
  • Y. Nie et al., "Adversarial NLI: A new benchmark for natural language understanding", ACL (2020)
  • Y. Bisk et al., "PIQA: Reasoning about physical commonsense in natural language", AAAI (2020)
  • Y. Bisk et al., "Experience grounds language", arxiv (2020)
  • X. Liu et al., "Adversarial training for large neural language models", arxiv (2020)
  • A. Roberts et al., "How much knowledge can you pack into the parameters of a language model?", arxiv (2020)
  • P. Lewis et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks", NeurIPS (2020)
  • Y. Liu et al., "Multilingual denoising pre-training for neural machine translation", ACL (2020)
  • S-C. Lin et al., "TTTTTackling WinoGrande Schemas", arxiv (2020)
  • D. Khashabi et al., "UnifiedQA: Crossing Format Boundaries with a Single QA System", EMNLP (2020)
  • J. Zheng, "Numeric Transformer - ALBERT", AI2 leaderboard (2020)
  • K. Guu et al., "REALM: Retrieval-Augmented Language Model Pretraining", arxiv (2020)
  • K. Sakaguchi et al., "WinoGrande: An adversarial Winograd schema challenge at scale", Communications of the ACM (2021)
  • A. Radford et al., "Learning transferable visual models from natural language supervision", ICML (2021)
  • S. Kreps et al., "All the news that’s fit to fabricate: AI-generated text as a tool of media misinformation", JEPS (2022)