Evaluating Large Language Models Trained on Code (Codex)



Summary: A video description of the paper entitled "Evaluating Large Language Models Trained on Code" by M. Chen et al., published on arXiv in July 2021.
Paper: The paper can be found on arXiv here.
Topics: codex, language models, foundation models, coding
Slides: link (pdf)

Acknowledgements: Particular thanks are due to Almut Sophia Koepke for her help with decoding some of the code produced by Codex and other technical details.

References
  • H. Simon, "Experiments with a heuristic compiler", JACM (1963)
  • Z. Manna et al., "Toward automatic program synthesis", Communications of the ACM (1971)
  • T. McCabe, "A complexity measure", IEEE Trans. Softw. Eng. (1976)
  • S. Hochreiter et al., "Long short-term memory", Neural Computation (1997)
  • K. Papineni et al., "BLEU: a method for automatic evaluation of machine translation", ACL (2002)
  • A. Hindle et al., "On the naturalness of software", ICSE (2012)
  • A. Graves, "Generating sequences with recurrent neural networks", arXiv (2013)
  • T. Mikolov et al., "Efficient estimation of word representations in vector space", arXiv (2013)
  • W. Zaremba et al., "Learning to execute", arXiv (2014)
  • M. Clarkson et al., "Temporal logics for hyperproperties", POST (2014)
  • I. Sutskever et al., "Sequence to sequence learning with neural networks", NeurIPS (2014)
  • A. Dai et al., "Semi-supervised sequence learning", NeurIPS (2015)
  • D. P. Kingma et al., "Adam: A method for stochastic optimization", ICLR (2015)
  • T. Helmuth et al., "General program synthesis benchmark suite", GECCO (2015)
  • A. van den Oord et al., "Pixel recurrent neural networks", ICML (2016)
  • A. van den Oord et al., "WaveNet: A generative model for raw audio", arXiv (2016)
  • A. Gaunt et al., "TerpreT: A probabilistic programming language for program induction", arXiv (2016)
  • A. Vaswani et al., "Attention is all you need", NeurIPS (2017)
  • A. Das et al., "Visual dialog", CVPR (2017)
  • E. Pantridge et al., "On the difficulty of benchmarking inductive program synthesis methods", GECCO (2017)
  • K. Crawford, "The trouble with bias", NeurIPS (2017)
  • M. E. Peters, "Deep contextualized word representations", arXiv (2018)
  • A. Radford et al., "Improving language understanding by generative pre-training" (2018)
  • J. Menick et al., "Generating high fidelity images with subscale pixel networks and multidimensional upscaling", arXiv (2018)
  • P. Christiano, "Clarifying 'AI alignment'", https://ai-alignment.com/clarifying-ai-alignment-cec47cd69dd6 (2018)
  • A. van den Oord et al., "Representation learning with contrastive predictive coding", arXiv (2018)
  • D. Amodei et al., "AI and Compute", https://openai.com/blog/ai-and-compute/ (2018)
  • R. Child et al., "Generating long sequences with sparse transformers", arXiv (2019)
  • J. Devlin et al., "BERT: Pre-training of deep bidirectional transformers for language understanding", NAACL-HLT (2019)
  • E. Alley et al., "Unified rational protein engineering with sequence-based deep representation learning", Nature Methods (2019)
  • J. Lu et al., "ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks", NeurIPS (2019)
  • H. Husain et al., "CodeSearchNet challenge: Evaluating the state of semantic code search", arXiv (2019)
  • S. Kulal et al., "SPoC: Search-based pseudocode to code", NeurIPS (2019)
  • A. Holtzman et al., "The Curious Case of Neural Text Degeneration", ICLR (2019)
  • N. Keskar et al., "CTRL: A conditional transformer language model for controllable generation", arXiv (2019)
  • N. Leveson, "Improving the Standard Risk Matrix: Part 1" (2019)
  • M. Chen et al., "Generative pretraining from pixels", ICML (2020)
  • P. Dhariwal et al., "Jukebox: A generative model for music", arXiv (2020)
  • A. Baevski et al., "wav2vec 2.0: A framework for self-supervised learning of speech representations", NeurIPS (2020)
  • L. Gao et al., "The Pile: An 800GB dataset of diverse text for language modeling", arXiv (2020)
  • Z. Feng et al., "CodeBERT: A pre-trained model for programming and natural languages", EMNLP (2020)
  • C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer", JMLR (2020)
  • C. Clement et al., "PyMT5: multi-mode translation of natural language and Python code with transformers", arXiv (2020)
  • T. Brown et al., "Language models are few-shot learners", NeurIPS (2020)
  • S. Ren et al., "CodeBLEU: a method for automatic evaluation of code synthesis", arXiv (2020)
  • B. Roziere et al., "Unsupervised translation of programming languages", NeurIPS (2020)
  • J. Kaplan et al., "Scaling laws for neural language models", arXiv (2020)
  • M. O’Neill et al., "Automatic programming: The open issue?", GPEM (2020)
  • N. Stiennon et al., "Learning to summarize with human feedback", NeurIPS (2020)
  • S. L. Blodgett et al., "Language (Technology) is Power: A Critical Survey of “Bias” in NLP", ACL (2020)
  • D. Acemoglu, "The wrong kind of AI? Artificial intelligence and the future of labour demand", CJRES (2020)
  • R. Schwartz et al., "Green AI", Communications of the ACM (2020)
  • B. Smith, "Microsoft will be carbon negative by 2030", https://blogs.microsoft.com/blog/2020/01/16/microsoft-will-be-carbon-negative-by-2030/ (2020)
  • A. Ramesh et al., "Zero-shot text-to-image generation", ICML (2021)
  • M. Chen et al., "Evaluating large language models trained on code", arXiv (2021)
  • H. Bao et al., "BEiT: BERT Pre-Training of Image Transformers", ICLR (2021)
  • A. Rives et al., "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences", PNAS (2021)
  • R. Zellers et al., "MERLOT: Multimodal neural script knowledge models", NeurIPS (2021)
  • D. Hendrycks et al., "Measuring Coding Challenge Competence With APPS", NeurIPS (2021)
  • B. Wang et al., "GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model" (2021)
  • S. Black et al., "GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow" (2021)
  • Z. Kenton et al., "Alignment of Language Agents" (2021)
  • E. M. Bender et al., "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?", FAccT (2021)
  • A. Abid et al., "Persistent anti-Muslim bias in large language models", AIES (2021)
  • N. Carlini et al., "Extracting training data from large language models", USENIX Security (2021)
  • K. Crawford, "The atlas of AI: Power, politics, and the planetary costs of artificial intelligence", Yale University Press (2021)
  • A. Ziegler, "A first look at rote learning in GitHub Copilot suggestions" (2021)
  • F. Xu et al., "In-IDE code generation from natural language: Promise and challenges", TOSEM (2022)