Crosslingual Generalization through Multitask Finetuning (BLOOMZ & mT0)

Summary: A description of the work 'Crosslingual Generalization through Multitask Finetuning' by Niklas Muennighoff et al., published on arXiv in November 2022 as part of the BigScience Workshop. This work introduced the BLOOMZ and mT0 models.
Paper: arXiv link
Topics: multitask finetuning, foundation models, large language models, multilingual models
Slides: link (pdf)
Code and models: link (GitHub)

  • A. Conneau et al., "XNLI: Evaluating Cross-lingual Sentence Representations", EMNLP (2018)
  • Y. Zhang et al., "PAWS: Paraphrase Adversaries from Word Scrambling", NAACL-HLT (2019)
  • Y. Yang et al., "PAWS-X: A cross-lingual adversarial dataset for paraphrase identification", EMNLP (2019)
  • A. Radford et al., "Language models are unsupervised multitask learners", Technical Report (2019)
  • A. Conneau and G. Lample, "Cross-lingual language model pretraining", NeurIPS (2019)
  • T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
  • A. Conneau et al., "Unsupervised Cross-lingual Representation Learning at Scale", ACL (2020)
  • L. Xue et al., "mT5: A massively multilingual pre-trained text-to-text transformer", arXiv (2020)
  • E. Ponti et al., "XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning", EMNLP (2020)
  • Y. Liu et al., "Multilingual denoising pre-training for neural machine translation", ACL (2020)
  • S. Min et al., "MetaICL: Learning to learn in context", arXiv (2021)
  • N. Goyal et al., "Larger-scale transformers for multilingual masked language modeling", arXiv (2021)
  • V. Lin et al., "Few-shot learning with multilingual language models", arXiv (2021)
  • M. Chen et al., "Evaluating large language models trained on code", arXiv (2021)
  • L. Gao et al., "A framework for few-shot language model evaluation", version v0.0.1 (2021)
  • A. Tikhonov et al., "It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning", ACL/IJCNLP (2021)
  • M. Kosec et al., "Packing: Towards 2x NLP BERT Acceleration", arXiv (2021)
  • A. Fan et al., "Beyond English-Centric Multilingual Machine Translation", JMLR (2021)
  • J. Wei et al., "Finetuned Language Models are Zero-Shot Learners", ICLR (2022)
  • O. Shliazhko et al., "mGPT: Few-Shot Learners Go Multilingual", arXiv (2022)
  • V. Sanh et al., "Multitask Prompted Training Enables Zero-Shot Task Generalization", ICLR (2022)
  • S. Bach et al., "PromptSource: An IDE and Repository for Natural Language Prompts", ACL Demo (2022)
  • A. Patel et al., "Bidirectional Language Models Are Also Few-shot Learners", arXiv (2022)
  • S. Soltan et al., "AlexaTM 20B: Few-shot learning using a large-scale multilingual seq2seq model", arXiv (2022)
  • BigScience Workshop, "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model", arXiv (2022)
  • H. Laurençon et al., "The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset", NeurIPS Datasets Track (2022)
  • M. Costa-jussà et al., "No language left behind: Scaling human-centered machine translation", arXiv (2022)
  • Y. Wang et al., "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks", arXiv (2022)
  • Y. Tay et al., "Unifying Language Learning Paradigms", arXiv (2022)
  • J. Fries et al., "BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing", arXiv (2022)
  • T. Scialom et al., "Continual-T0: Progressively Instructing 50+ Tasks to Language Models Without Forgetting", arXiv (2022)
  • S. Mishra et al., "Natural instructions: Benchmarking generalization to new tasks from natural language instructions", ACL (2022)
  • H. Chung et al., "Scaling Instruction-Finetuned Language Models", arXiv (2022)
  • A. Roberts et al., "Scaling Up Models and Data with t5x and seqio", arXiv (2022)
  • T. Wang et al., "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?", arXiv (2022)
  • N. Muennighoff et al., "Crosslingual Generalization through Multitask Finetuning", arXiv (2022)