What is the alignment problem?



Summary: A short description of "What is the alignment problem?" by Jan Leike.
Topics: AI alignment, capabilities, the hard problem of alignment
Slides: link (pdf)

References
  • N. Wiener, "Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers", Science (1960)
  • J. Leike, M. Martic, S. Legg, "Learning through human feedback", DeepMind blog, https://www.deepmind.com/blog/learning-through-human-feedback (2017)
  • L. Ouyang et al., "Training language models to follow instructions with human feedback", arXiv (2022)
  • J. Leike, "What is the alignment problem?", https://aligned.substack.com/p/what-is-alignment (2022)