Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/26396
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chakroun, Imen | - |
dc.contributor.author | HABER, Tom | - |
dc.contributor.author | Ashby, Thomas J. | - |
dc.date.accessioned | 2018-07-20T15:15:48Z | - |
dc.date.available | 2018-07-20T15:15:48Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Koumoutsakos, Pedro; Lees, Michael; Krzhizhanovskaya, Valeria; Dongarra, Jack; Sloot, Peter M. A. (Ed.). International conference on computational science (ICCS 2017), Elsevier Science BV, p. 2318-2322 | - |
dc.identifier.issn | 1877-0509 | - |
dc.identifier.uri | http://hdl.handle.net/1942/26396 | - |
dc.description.abstract | Stochastic Gradient Descent (SGD, or 1-SGD in our notation) is probably the most popular family of optimisation algorithms used in machine learning on large data sets, due to its ability to optimise efficiently with respect to the number of complete passes over the training set (epochs). Various authors have worked on data or model parallelism for SGD, but there is little work on how SGD fits with the memory hierarchies ubiquitous in HPC machines. Standard practice suggests randomising the order of training points and streaming the whole set through the learner, which results in extremely low temporal locality of access to the training set and thus, for large data sets, makes minimal use of the small, fast layers of memory in an HPC memory hierarchy. Mini-batch SGD with batch size n (n-SGD) is often used to control the noise on the gradient, making convergence smoother and easier to identify, but this can reduce learning efficiency with respect to epochs compared to 1-SGD, while having the same extremely low temporal locality. In this paper we introduce Sliding Window SGD (SW-SGD), which exploits temporal locality of training point access to combine the advantages of 1-SGD (epoch efficiency) with those of n-SGD (smoother, more easily identified convergence) by leveraging HPC memory hierarchies. We give initial results on part of the Pascal dataset that show that memory hierarchies can be used to improve SGD performance. (An illustrative sketch of the sliding-window idea follows the metadata table below.) (C) 2017 The Authors. Published by Elsevier B.V. | - |
dc.description.sponsorship | European project ExCAPE, funded by the European Union's Horizon 2020 Research and Innovation programme (grant agreement no. 671555) | - |
dc.language.iso | en | - |
dc.publisher | Elsevier Science BV | - |
dc.relation.ispartofseries | Procedia Computer Science | - |
dc.rights | © 2017 The Authors. Published by Elsevier B.V. | - |
dc.subject.other | SGD; sliding window; machine learning; SVM; logistic regression | - |
dc.title | SW-SGD: The Sliding Window Stochastic Gradient Descent Algorithm | - |
dc.type | Proceedings Paper | - |
local.bibliographicCitation.authors | Koumoutsakos, Pedro | - |
local.bibliographicCitation.authors | Lees, Michael | - |
local.bibliographicCitation.authors | Krzhizhanovskaya, Valeria | - |
local.bibliographicCitation.authors | Dongarra, Jack | - |
local.bibliographicCitation.authors | Sloot, Peter M. A. | - |
local.bibliographicCitation.conferencedate | 12-14/07/2017 | - |
local.bibliographicCitation.conferencename | International Conference on Computational Science (ICCS) | - |
local.bibliographicCitation.conferenceplace | Zurich, Switzerland | - |
dc.identifier.epage | 2322 | - |
dc.identifier.spage | 2318 | - |
dc.identifier.volume | 108 | - |
local.format.pages | 5 | - |
local.bibliographicCitation.jcat | C1 | - |
dc.description.notes | [Chakroun, Imen; Ashby, Thomas J.] IMEC, Kapeldreef 75, B-3001 Leuven, Belgium. [Haber, Tom] Expertise Ctr Digital Media, Wetenschapspk 2, B-3590 Diepenbeek, Belgium. [Chakroun, Imen; Haber, Tom; Ashby, Thomas J.] ExaSci Life Lab, Kapeldreef 75, B-3001 Leuven, Belgium. | - |
local.publisher.place | Amsterdam, The Netherlands | - |
local.type.refereed | Refereed | - |
local.type.specified | Proceedings Paper | - |
local.relation.ispartofseriesnr | 108 | - |
local.class | dsPublValOverrule/author_version_not_expected | - |
local.type.programme | H2020 | - |
local.relation.h2020 | 671555 | - |
dc.identifier.doi | 10.1016/j.procs.2017.05.082 | - |
dc.identifier.isi | 000404959000243 | - |
local.bibliographicCitation.btitle | International conference on computational science (ICCS 2017) | - |
item.accessRights | Open Access | - |
item.validation | ecoom 2018 | - |
item.fulltext | With Fulltext | - |
item.contributor | Chakroun, Imen | - |
item.contributor | HABER, Tom | - |
item.contributor | Ashby, Thomas J. | - |
item.fullcitation | Chakroun, Imen; HABER, Tom & Ashby, Thomas J. (2017) SW-SGD: The Sliding Window Stochastic Gradient Descent Algorithm. In: Koumoutsakos, Pedro; Lees, Michael; Krzhizhanovskaya, Valeria; Dongarra, Jack; Sloot, Peter M. A. (Ed.). International conference on computational science (ICCS 2017), Elsevier Science BV, p. 2318-2322. | - |
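
The abstract above describes SW-SGD only at a high level, so the following is a minimal sketch of the sliding-window idea, not the paper's exact algorithm. It assumes a binary logistic-regression objective (one of the record's keywords); the names `sw_sgd`, `window_size`, and `lr` are illustrative choices, not taken from the paper.

```python
# Minimal SW-SGD sketch for binary logistic regression (illustrative only).
from collections import deque

import numpy as np


def sw_sgd(X, y, window_size=32, lr=0.1, epochs=5, seed=0):
    """Sliding Window SGD, reconstructed from the abstract's description.

    One new point is streamed in per step (as in 1-SGD), but the gradient
    is averaged over a small window of the most recently seen points, so
    each point is reused ~window_size times while it would still sit in
    the fast levels of an HPC memory hierarchy (n-SGD-like smoothing).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    window = deque(maxlen=window_size)  # oldest index drops out automatically

    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # stream points in random order
            window.append(i)
            idx = list(window)
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))      # sigmoid predictions
            grad = X[idx].T @ (p - y[idx]) / len(idx)  # window-averaged gradient
            w -= lr * grad
    return w
```

With `window_size=1` this reduces to plain 1-SGD; larger windows mimic the gradient smoothing of n-SGD without the extra passes over the data, since only the cache-resident window is revisited.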
Appears in Collections: | Research publications |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Haber.pdf | Published version | 473.7 kB | Adobe PDF
Scopus™ citations: 8 (checked on Sep 3, 2020)
Web of Science™ citations: 12 (checked on Apr 30, 2024)
Page view(s): 132 (checked on Sep 5, 2022)
Download(s): 172 (checked on Sep 5, 2022)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.