Scaling Instruction-Finetuned Language Models (Flan-PaLM)

Summary: A description of the work "Scaling Instruction-Finetuned Language Models" by Hyung Won Chung et al., published on arXiv in October 2022. This work introduced the Flan-PaLM 540B model.
Paper: arxiv link
Topics: instruction finetuning, foundation models, large language models
Slides: link (pdf)

References

L. Ling et al., "Program induction by rationale generation: Learning to solve and explain algebraic word problems", ACL (2017)
O-M. Camburu et al., "e-snli: Natural language inference with natural language explanations", NeurIPS (2018)
N. Shazeer et al., "Adafactor: Adaptive learning rates with sublinear memory cost", ICML (2018)
T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
D. Hendrycks et al., "Measuring Massive Multitask Language Understanding", ICLR (2020)
J. Clark et al., "TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages", ACL (2020)
J. Kaplan et al., "Scaling laws for neural language models", arXiv preprint arXiv:2001.08361 (2020)
D. So et al., "Searching for Efficient Transformers for Language Modeling", NeurIPS (2021)
M. Chen et al., "Evaluating large language models trained on code", arxiv (2021)
A. Srivastava et al., "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models", arxiv (2022)
A. Roberts et al., "Scaling Up Models and Data with t5x and seqio", arxiv (2022)
L. Ouyang et al., "Training language models to follow instructions with human feedback", arxiv (2022)
J. Wei et al., "Finetuned Language Models are Zero-Shot Learners", ICLR (2022)
V. Sanh et al., "Multitask Prompted Training Enables Zero-Shot Task Generalization", ICLR (2022)
H. Chung et al., "Scaling Instruction-Finetuned Language Models", arxiv (2022)
X. Wang et al., "Self-consistency improves chain of thought reasoning in language models", arxiv (2022)
A. Chowdhery et al., "Palm: Scaling language modeling with pathways", arxiv (2022)
C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer", JMLR (2020)
J. Hoffmann et al., "Training Compute-Optimal Large Language Models", arxiv (2022)
Y. Tay et al., "Unifying Language Learning Paradigms", arxiv (2022)
Y. Tay et al., "Transcending Scaling Laws with 0.1% Extra Compute", arxiv (2022)
Y. Wang et al., "Benchmarking generalization via in-context instructions on 1,600+ language tasks", arxiv (2022)
M. Suzgun et al., "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them", arxiv (2022)
F. Shi et al., "Language Models are Multilingual Chain-of-Thought Reasoners", arxiv (2022)
L. Xue et al., "Byt5: Towards a token-free future with pre-trained byte-to-byte models", TACL (2022)
T. Kojima et al., "Large Language Models are Zero-Shot Reasoners", arxiv (2022)
V. Padmakumar et al., "Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning", arxiv (2022)
N. Du et al., "Glam: Efficient scaling of language models with mixture-of-experts", ICML (2022)
J. Huang et al., "Large Language Models Can Self-Improve", arxiv (2022)

Samuel Albanie