Language Models are Few-shot Learners (GPT-3)
Summary: A video description of the paper entitled "Language Models are Few-shot Learners" by T. Brown et al., published at NeurIPS in 2020.
Paper: The paper can be found on arxiv here.
Topics: language models, foundation models, GPT-3, scaling
Slides: link (pdf)
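The "few-shot" in the title refers to in-context learning: at inference time the model is conditioned on a natural-language task description plus a handful of worked examples placed directly in its prompt, and it completes the next example without any gradient updates or fine-tuning. Below is a minimal sketch of how such a prompt is assembled; the helper function and the translation pairs are illustrative, not taken from the paper or its code.

```python
# Illustrative sketch of few-shot prompting in the style described in the
# GPT-3 paper: a task description, K demonstrations, and a final query are
# concatenated into one prompt; the model completes the last line with no
# weight updates.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Concatenate a task description, K worked examples, and the query."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is asked to complete this line
    return "\n".join(lines)

# Hypothetical usage, mirroring the paper's English-to-French illustration.
prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```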
References
- S. Carey et al., "Acquiring a single new word", ERIC (1978)
- M. Marcus et al., "The Penn treebank: Annotating predicate argument structure", HLT Workshop (1994)
- S. Hochreiter et al., "Learning to learn using gradient descent", ICANN (2001)
- E. Loper et al., "NLTK: The natural language toolkit", arxiv (2002)
- P. Turney et al., "Combining independent modules to solve multiple-choice synonym and analogy problems", arxiv (2003)
- P. Turney et al., "Corpus-based learning of analogies and semantic relations", Machine Learning (2005)
- P. Norvig, "Natural language corpus data", Beautiful data (2009)
- S. Baccianella et al., "Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining", LREC (2010)
- M. Roemmele et al., "Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning", AAAI symposium (2011)
- R. Ross, "Guide for Conducting Risk Assessments", NIST Special Publication (2012)
- H. Levesque et al., "The Winograd schema challenge", KR (2012)
- T. Mikolov et al., "Efficient estimation of word representations in vector space", arxiv (2013)
- J. Berant et al., "Semantic parsing on freebase from question-answer pairs", EMNLP (2013)
- J. Pennington et al., "Glove: Global vectors for word representation", EMNLP (2014)
- N. Durrani et al., "Edinburgh’s phrase-based machine translation systems for WMT-14", WMT (2014)
- A. Dai et al., "Semi-supervised sequence learning", NeurIPS (2015)
- D. P. Kingma et al., "Adam: A method for stochastic optimization", ICLR (2015)
- R. Sennrich et al., "Improving neural machine translation models with monolingual data", arxiv (2015)
- G. Hinton et al., "Distilling the knowledge in a neural network", arxiv (2015)
- O. Vinyals et al., "Matching networks for one shot learning", NeurIPS (2016)
- K. He et al., "Identity mappings in deep residual networks", ECCV (2016)
- D. Paperno et al., "The LAMBADA dataset: Word prediction requiring a broad discourse context", ACL (2016)
- J. Ba et al., "Layer Normalization", arxiv (2016)
- N. Mostafazadeh et al., "A corpus and cloze evaluation for deeper understanding of commonsense stories", NAACL HLT (2016)
- A. Vaswani et al., "Attention is all you need", NeurIPS (2017)
- G. Lai et al., "RACE: Large-scale ReAding Comprehension Dataset From Examinations", EMNLP (2017)
- M. Joshi et al., "TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension", ACL (2017)
- I. Loshchilov et al., "Decoupled weight decay regularization", arxiv (2017)
- K. Crawford, "The trouble with bias", NeurIPS (2017)
- D. Amodei et al., "AI and Compute", https://openai.com/blog/ai-and-compute/ (2018)
- S. Gururangan et al., "Annotation artifacts in natural language inference data", arxiv (2018)
- A. Radford et al. "Improving language understanding by generative pre-training" (2018)
- E. Choi et al., "QuAC: Question Answering in Context", EMNLP (2018)
- S. McCandlish et al., "An empirical model of large-batch training", arxiv (2018)
- P. Clark et al., "Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge", arxiv (2018)
- T. Mihaylov et al., "Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering", EMNLP (2018)
- S. Edunov et al., "Understanding Back-Translation at Scale", EMNLP (2018)
- T. Trinh et al., "A simple method for commonsense reasoning", arxiv (2018)
- P. Rajpurkar et al., "Know What You Don’t Know: Unanswerable Questions for SQuAD", ACL (2018)
- S. Zhang et al., "ReCoRD: Bridging the gap between human and machine commonsense reading comprehension", arxiv (2018)
- A. Wang et al., "GLUE: A multi-task benchmark and analysis platform for natural language understanding", ICLR (2018)
- D. Khashabi et al., "Looking beyond the surface: A challenge set for reading comprehension over multiple sentences", NAACL-HLT (2018)
- R. Rudinger et al., "Gender bias in coreference resolution", arxiv (2018)
- M. Mitchell et al., "Model cards for model reporting", FAccT (2018)
- B. McCann et al., "The natural language decathlon: Multitask learning as question answering", arxiv (2018)
- Y. Qian et al., "Reducing gender bias in word-level language models with a gender-equalizing loss function", arxiv (2019)
- A. Radford et al., "Language models are unsupervised multitask learners", OpenAI (2019)
- J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding", NAACL-HLT (2019)
- T. Kwiatkowski et al., "Natural questions: a benchmark for question answering research", TACL (2019)
- M. Shoeybi et al., "Megatron-LM: Training multi-billion parameter language models using model parallelism", arxiv (2019)
- S. Reddy et al., "CoQA: A conversational question answering challenge", TACL (2019)
- R. Child et al., "Generating long sequences with sparse transformers", arxiv (2019)
- A. Wang et al., "SuperGLUE: A stickier benchmark for general-purpose language understanding systems", NeurIPS (2019)
- R. Zellers et al., "HellaSwag: Can a Machine Really Finish Your Sentence?" ACL (2019)
- Y. Liu et al., "RoBERTa: A robustly optimized bert pretraining approach", arxiv (2019)
- Z. Li, "Story ending prediction by transferable BERT", arxiv (2019)
- Z. Lan et al., "ALBERT: A lite BERT for self-supervised learning of language representations", arxiv (2019)
- Y. Wang et al., "Multi-agent dual learning", ICLR (2019)
- K. Song et al., "MASS: Masked sequence to sequence pre-training for language generation", ICML (2019)
- A. Conneau et al., "Cross-lingual language model pretraining", NeurIPS (2019)
- Y. Ju et al., "Technical report on conversational question answering", arxiv (2019)
- D. Dua et al., "DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs", NAACL-HLT (2019)
- M. Pilehvar et al., "WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations", NAACL-HLT (2019)
- C. Clark et al., "BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions", NAACL-HLT (2019)
- M-C. De Marneffe et al., "The CommitmentBank: Investigating projection in naturally occurring discourse", Sinn und Bedeutung (2019)
- R. Zellers et al., "Defending against neural fake news", NeurIPS (2019)
- D. Ippolito et al., "Automatic detection of generated text is easiest when humans are fooled", ACL (2019)
- S. Gehrmann et al., "GLTR: Statistical detection and visualization of generated text", ACL (2019)
- A. Holtzman et al., "The Curious Case of Neural Text Degeneration", ICLR (2019)
- X. Liu et al., "Improving multi-task deep neural networks via knowledge distillation for natural language understanding", arxiv (2019)
- I. Solaiman et al., "Release strategies and the social impacts of language models", arxiv (2019)
- P-S. Huang et al., "Reducing Sentiment Bias in Language Models via Counterfactual Evaluation", EMNLP (2020)
- D. Hernandez et al., "Measuring the algorithmic efficiency of neural networks", arxiv (2020)
- R. Schwartz et al., "Green AI", Communications of the ACM (2020)
- C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer", JMLR (2020)
- T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
- J. Kaplan et al., "Scaling laws for neural language models", arxiv (2020)
- Y. Nie et al., "Adversarial NLI: A new benchmark for natural language understanding", ACL (2020)
- Y. Bisk et al., "PIQA: Reasoning about physical commonsense in natural language", AAAI (2020)
- Y. Bisk et al., "Experience grounds language", arxiv (2020)
- X. Liu et al., "Adversarial training for large neural language models", arxiv (2020)
- A. Roberts et al., "How much knowledge can you pack into the parameters of a language model?", arxiv (2020)
- P. Lewis et al., "Retrieval-augmented generation for knowledge-intensive NLP tasks", NeurIPS (2020)
- Y. Liu et al., "Multilingual denoising pre-training for neural machine translation", ACL (2020)
- S-C. Lin et al., "TTTTTackling WinoGrande Schemas", arxiv (2020)
- D. Khashabi et al., "UnifiedQA: Crossing Format Boundaries with a Single QA System", EMNLP (2020)
- J. Zheng, "Numeric Transformer - ALBERT", AI2 leaderboard (2020)
- K. Guu et al., "REALM: Retrieval-Augmented Language Model Pretraining", arxiv (2020)
- K. Sakaguchi et al., "WinoGrande: An adversarial Winograd schema challenge at scale", Communications of the ACM (2021)
- A. Radford et al., "Learning transferable visual models from natural language supervision", ICML (2021)
- S. Kreps et al., "All the news that’s fit to fabricate: AI-generated text as a tool of media misinformation", JEPS (2022)