Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/37946
Full metadata record
DC Field | Value | Language
dc.contributor.authorVamplew, Peter-
dc.contributor.authorSmith, Benjamin J.-
dc.contributor.authorKallstrom, Johan-
dc.contributor.authorRamos, Gabriel-
dc.contributor.authorRadulescu, Roxana-
dc.contributor.authorRoijers, Diederik M.-
dc.contributor.authorHayes, Conor F.-
dc.contributor.authorHeintz, Fredrik-
dc.contributor.authorMannion, Patrick-
dc.contributor.authorLIBIN, Pieter-
dc.contributor.authorDazeley, Richard-
dc.contributor.authorFoale, Cameron-
dc.date.accessioned2022-09-01T09:13:19Z-
dc.date.available2022-09-01T09:13:19Z-
dc.date.issued2022-
dc.date.submitted2022-08-16T11:29:58Z-
dc.identifier.citationAUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 36 (2) (Art N° 41)-
dc.identifier.urihttp://hdl.handle.net/1942/37946-
dc.description.abstractThe recent paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.-
dc.description.sponsorshipOpen Access funding enabled and organized by CAUL and its Member Institutions. This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program, and by the National Cancer Institute of the U.S. National Institutes of Health under Award Number 1R01CA240452-01A1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or of other funders. Pieter J.K. Libin acknowledges support from the Research Foundation Flanders (FWO, fwo.be) (postdoctoral fellowship 1242021N). Johan Källström and Fredrik Heintz were partially supported by the Swedish Governmental Agency for Innovation Systems (Grant NFFP7/2017-04885), and the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. Conor F. Hayes is funded by the National University of Ireland Galway Hardiman Scholarship. Gabriel Ramos was partially supported by FAPERGS (Grant 19/2551-0001277-2) and FAPESP (Grant 2020/05165-1).-
dc.language.isoen-
dc.publisherSPRINGER-
dc.rightsOpen Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.-
dc.subject.otherScalar rewards-
dc.subject.otherVector rewards-
dc.subject.otherArtificial general intelligence-
dc.subject.otherReinforcement learning-
dc.subject.otherMulti-objective decision making-
dc.subject.otherMulti-objective reinforcement learning-
dc.subject.otherSafe and ethical AI-
dc.titleScalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)-
dc.typeJournal Contribution-
dc.identifier.issue2-
dc.identifier.volume36-
local.bibliographicCitation.jcatA1-
dc.description.notesVamplew, P (corresponding author), Federat Univ Australia, Ballarat, Vic, Australia.-
dc.description.notesp.vamplew@federation.edu.au; benjsmith@gmail.com;-
dc.description.notesjohan.kallstrom@liu.se; gdoramos@unisinos.br; roxana.radulescu@vub.be;-
dc.description.notesdiederik.roijers@vub.be; c.hayes13@nuigalway.ie; fredrik.heintz@liu.se;-
dc.description.notespatrickmannion@nuigalway.ie; pieter.libin@vub.be;-
dc.description.notesrichard.dazeley@deakin.edu.au; c.foale@federation.edu.au-
local.publisher.placeVAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS-
local.type.refereedRefereed-
local.type.specifiedArticle-
local.bibliographicCitation.artnr41-
dc.identifier.doi10.1007/s10458-022-09575-5-
dc.identifier.isi000826149200001-
dc.contributor.orcidVamplew, Peter/0000-0002-8687-4424-
local.provider.typewosris-
local.description.affiliation[Vamplew, Peter; Foale, Cameron] Federat Univ Australia, Ballarat, Vic, Australia.-
local.description.affiliation[Smith, Benjamin J.] Univ Oregon, Ctr Translat Neurosci, Eugene, OR 97403 USA.-
local.description.affiliation[Kallstrom, Johan; Heintz, Fredrik] Linkoping Univ, Linkoping, Sweden.-
local.description.affiliation[Ramos, Gabriel] Univ Vale Rio dos Sinos, Sao Leopoldo, RS, Brazil.-
local.description.affiliation[Radulescu, Roxana] Vrije Univ Brussel, AI Lab, Brussels, Belgium.-
local.description.affiliation[Roijers, Diederik M.; Libin, Pieter J. K.] Vrije Univ Brussel, Brussels, Belgium.-
local.description.affiliation[Roijers, Diederik M.] HU Univ Appl Sci Utrecht, Utrecht, Netherlands.-
local.description.affiliation[Hayes, Conor F.; Mannion, Patrick] Natl Univ Ireland Galway, Galway, Ireland.-
local.description.affiliation[Libin, Pieter J. K.] Univ Hasselt, Hasselt, Belgium.-
local.description.affiliation[Libin, Pieter J. K.] Katholieke Univ Leuven, Leuven, Belgium.-
local.description.affiliation[Dazeley, Richard] Deakin Univ, Geelong, Vic, Australia.-
local.uhasselt.internationalyes-
item.fullcitationVamplew, Peter; Smith, Benjamin J.; Kallstrom, Johan; Ramos, Gabriel; Radulescu, Roxana; Roijers, Diederik M.; Hayes, Conor F.; Heintz, Fredrik; Mannion, Patrick; LIBIN, Pieter; Dazeley, Richard & Foale, Cameron (2022) Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021). In: AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 36 (2) (Art N° 41).-
item.accessRightsOpen Access-
item.fulltextWith Fulltext-
item.contributorVamplew, Peter-
item.contributorSmith, Benjamin J.-
item.contributorKallstrom, Johan-
item.contributorRamos, Gabriel-
item.contributorRadulescu, Roxana-
item.contributorRoijers, Diederik M.-
item.contributorHayes, Conor F.-
item.contributorHeintz, Fredrik-
item.contributorMannion, Patrick-
item.contributorLIBIN, Pieter-
item.contributorDazeley, Richard-
item.contributorFoale, Cameron-
item.validationecoom 2023-
crisitem.journal.issn1387-2532-
crisitem.journal.eissn1573-7454-
Appears in Collections:Research publications
Files in This Item:
File | Description | Size | Format
Scalar reward is not enough_ a response to Silver, Singh, Precup and Sutton (2021).pdf | Published version | 645.59 kB | Adobe PDF
Web of Science™ citations: 8 (checked on Oct 4, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.