Foundational challenges in assuring alignment and safety of large language models

Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, and others. Transactions on Machine Learning Research, 2024.

@article{Anwar2024foundational,
    author = "Anwar, Usman and Saparov, Abulhair and Rando, Javier and Paleka, Daniel and Turpin, Miles and Hase, Peter and Lubana, Ekdeep Singh and Jenner, Erik and Casper, Stephen and Sourbut, Oliver and others",
    title = "Foundational challenges in assuring alignment and safety of large language models",
    year = "2024",
    month = "September",
    journal = "Transactions on Machine Learning Research"
}