Scaling Instruction-Finetuned Language Models (Flan-PaLM)



Summary: A description of the work "Scaling Instruction-Finetuned Language Models" by Hyung Won Chung et al., published on arXiv in October 2022. This work introduced the Flan-PaLM 540B model, obtained by finetuning PaLM 540B on a large collection of tasks phrased as natural-language instructions (a minimal sketch of that data format is given below).
Paper: arxiv link
Topics: instruction finetuning, foundation models, large language models
Slides: link (pdf)
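
A brief note on what "instruction finetuning" means here: each training task is rendered as a natural-language instruction with a plain-text target, and the pretrained model is finetuned on the resulting (input, target) pairs. The snippet below is a minimal illustrative sketch of that format only; the template wording and the render_example helper are assumptions made for illustration, not the paper's actual Flan templates (the Flan collection uses many templates per task and also includes chain-of-thought examples).

# Minimal sketch of rendering a task as an instruction-style training example.
# The template text and helper name are illustrative assumptions, not taken
# from the Flan-PaLM paper or its released code.

TEMPLATE = (
    "Answer the following question.\n\n"
    "Question: {question}\n"
    "Answer:"
)

def render_example(question: str, answer: str) -> dict:
    """Render a raw QA pair as an instruction-style (input, target) pair."""
    return {"input": TEMPLATE.format(question=question), "target": " " + answer}

if __name__ == "__main__":
    example = render_example(
        question=("The cafeteria had 23 apples. They used 20 and bought 6 more. "
                  "How many apples do they have?"),
        answer="9",
    )
    print(example["input"])   # instruction-formatted model input
    print(example["target"])  # finetuning target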

References
  • W. Ling et al., "Program induction by rationale generation: Learning to solve and explain algebraic word problems", ACL (2017)
  • O-M. Camburu et al., "e-SNLI: Natural language inference with natural language explanations", NeurIPS (2018)
  • N. Shazeer et al., "Adafactor: Adaptive learning rates with sublinear memory cost", ICML (2018)
  • T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
  • D. Hendrycks et al., "Measuring Massive Multitask Language Understanding", ICLR (2021)
  • J. Clark et al., "TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages", TACL (2020)
  • J. Kaplan et al., "Scaling laws for neural language models", arXiv (2020)
  • D. So et al., "Searching for Efficient Transformers for Language Modeling", NeurIPS (2021)
  • M. Chen et al., "Evaluating large language models trained on code", arXiv (2021)
  • A. Srivastava et al., "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models", arXiv (2022)
  • A. Roberts et al., "Scaling Up Models and Data with t5x and seqio", arXiv (2022)
  • L. Ouyang et al., "Training language models to follow instructions with human feedback", arXiv (2022)
  • J. Wei et al., "Finetuned Language Models are Zero-Shot Learners", ICLR (2022)
  • V. Sanh et al., "Multitask Prompted Training Enables Zero-Shot Task Generalization", ICLR (2022)
  • H. Chung et al., "Scaling Instruction-Finetuned Language Models", arXiv (2022)
  • X. Wang et al., "Self-consistency improves chain of thought reasoning in language models", arXiv (2022)
  • A. Chowdhery et al., "PaLM: Scaling language modeling with pathways", arXiv (2022)
  • C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer", JMLR (2020)
  • J. Hoffmann et al., "Training Compute-Optimal Large Language Models", arXiv (2022)
  • Y. Tay et al., "Unifying Language Learning Paradigms", arXiv (2022)
  • Y. Tay et al., "Transcending Scaling Laws with 0.1% Extra Compute", arXiv (2022)
  • Y. Wang et al., "Benchmarking generalization via in-context instructions on 1,600+ language tasks", arXiv (2022)
  • M. Suzgun et al., "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them", arXiv (2022)
  • F. Shi et al., "Language Models are Multilingual Chain-of-Thought Reasoners", arXiv (2022)
  • L. Xue et al., "ByT5: Towards a token-free future with pre-trained byte-to-byte models", TACL (2022)
  • T. Kojima et al., "Large Language Models are Zero-Shot Reasoners", arXiv (2022)
  • V. Padmakumar et al., "Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning", arXiv (2022)
  • N. Du et al., "GLaM: Efficient scaling of language models with mixture-of-experts", ICML (2022)
  • J. Huang et al., "Large Language Models Can Self-Improve", arXiv (2022)