Scaling Instruction-Finetuned Language Models (Flan-PaLM)
Summary: A description of the work "Scaling Instruction-Finetuned Language Models" by Hyung Won Chung et al., published on arXiv in October 2022. This work introduced the Flan-PaLM 540B model.
Paper: arXiv link
Topics: instruction finetuning, foundation models, large language models
Slides: link (pdf)
References
- W. Ling et al., "Program induction by rationale generation: Learning to solve and explain algebraic word problems", ACL (2017)
- O.-M. Camburu et al., "e-SNLI: Natural language inference with natural language explanations", NeurIPS (2018)
- N. Shazeer and M. Stern, "Adafactor: Adaptive learning rates with sublinear memory cost", ICML (2018)
- T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
- D. Hendrycks et al., "Measuring Massive Multitask Language Understanding", ICLR (2021)
- J. Clark et al., "TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages", TACL (2020)
- J. Kaplan et al., "Scaling laws for neural language models", arXiv (2020)
- D. So et al., "Searching for Efficient Transformers for Language Modeling", NeurIPS (2021)
- M. Chen et al., "Evaluating large language models trained on code", arXiv (2021)
- A. Srivastava et al., "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models", arXiv (2022)
- A. Roberts et al., "Scaling Up Models and Data with t5x and seqio", arXiv (2022)
- L. Ouyang et al., "Training language models to follow instructions with human feedback", arXiv (2022)
- J. Wei et al., "Finetuned Language Models are Zero-Shot Learners", ICLR (2022)
- V. Sanh et al., "Multitask Prompted Training Enables Zero-Shot Task Generalization", ICLR (2022)
- H. W. Chung et al., "Scaling Instruction-Finetuned Language Models", arXiv (2022)
- X. Wang et al., "Self-consistency improves chain of thought reasoning in language models", arXiv (2022)
- A. Chowdhery et al., "PaLM: Scaling language modeling with pathways", arXiv (2022)
- C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer", JMLR (2020)
- J. Hoffmann et al., "Training Compute-Optimal Large Language Models", arXiv (2022)
- Y. Tay et al., "Unifying Language Learning Paradigms", arXiv (2022)
- Y. Tay et al., "Transcending Scaling Laws with 0.1% Extra Compute", arXiv (2022)
- Y. Wang et al., "Benchmarking generalization via in-context instructions on 1,600+ language tasks", arXiv (2022)
- M. Suzgun et al., "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them", arXiv (2022)
- F. Shi et al., "Language Models are Multilingual Chain-of-Thought Reasoners", arXiv (2022)
- L. Xue et al., "ByT5: Towards a token-free future with pre-trained byte-to-byte models", TACL (2022)
- T. Kojima et al., "Large Language Models are Zero-Shot Reasoners", arXiv (2022)
- V. Padmakumar et al., "Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning", arXiv (2022)
- N. Du et al., "GLaM: Efficient scaling of language models with mixture-of-experts", ICML (2022)
- J. Huang et al., "Large Language Models Can Self-Improve", arXiv (2022)