Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)

Vamplew, Peter; Smith, Benjamin J.; Kallstrom, Johan; Ramos, Gabriel; Radulescu, Roxana; Roijers, Diederik M.; Hayes, Conor F.; Heintz, Fredrik; Mannion, Patrick; LIBIN, Pieter; Dazeley, Richard; Foale, Cameron

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/37946

Title:	Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)
Authors:	Vamplew, Peter; Smith, Benjamin J.; Kallstrom, Johan; Ramos, Gabriel; Radulescu, Roxana; Roijers, Diederik M.; Hayes, Conor F.; Heintz, Fredrik; Mannion, Patrick; LIBIN, Pieter; Dazeley, Richard; Foale, Cameron
Issue Date:	2022
Publisher:	SPRINGER
Source:	Autonomous Agents and Multi-agent Systems, 36 (2) (Art N° 41)
Abstract:	The recent paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.
Notes:	Vamplew, P (corresponding author), Federat Univ Australia, Ballarat, Vic, Australia. p.vamplew@federation.edu.au; benjsmith@gmail.com; johan.kallstrom@liu.se; gdoramos@unisinos.br; roxana.radulescu@vub.be; diederik.roijers@vub.be; c.hayes13@nuigalway.ie; fredrik.heintz@liu.se; patrickmannion@nuigalway.ie; pieter.libin@vub.be; richard.dazeley@deakin.edu.au; c.foale@federation.edu.au
Keywords:	Scalar rewards;Vector rewards;Artificial general intelligence;Reinforcement learning;Multi-objective decision making;Multi-objective reinforcement learning;Safe and ethical AI
Document URI:	http://hdl.handle.net/1942/37946
ISSN:	1387-2532
e-ISSN:	1573-7454
DOI:	10.1007/s10458-022-09575-5
ISI #:	000826149200001
Rights:	Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Category:	A1
Type:	Journal Contribution
Validations:	ecoom 2023
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
Scalar reward is not enough_ a response to Silver, Singh, Precup and Sutton (2021).pdf	Published version	645.59 kB	Adobe PDF

SCOPUS™ Citations:	58 (checked on Apr 28, 2026)
WEB OF SCIENCE™ Citations:	35 (checked on Apr 23, 2026)
