Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/23351
Title: Reasoning on data partitioning for single-round multi-join evaluation in massively parallel systems
Authors: AMELOOT, Tom 
GECK, Gaetano 
KETSMAN, Bas 
NEVEN, Frank 
Schwentick, Thomas
Issue Date: 2017
Source: Communications of the ACM, 60(3), p. 93-100
Abstract: Evaluating queries over massive amounts of data is a major challenge in the big data era. Modern massively parallel systems, like e.g. Spark, organize query answering as a sequence of rounds each consisting of a distinct communication phase followed by a computation phase. The communication phase redistributes data over the available servers, while in the subsequent computation phase each server performs the actual computation on its local data. There is a growing interest in single-round algorithms for evaluating multiway joins where data is first reshuffled over the servers and then evaluated in a parallel but communication-free way. As the amount of communication induced by a reshuffling of the data is a dominating cost in such systems, we introduce a framework for reasoning about data partitioning to detect when we can avoid the data reshuffling step. Specifically, we formalize the decision problems parallel-correctness and transfer of parallel-correctness, provide semantical characterizations, and obtain tight complexity bounds.
Notes: Ameloot, TJ (reprint author), Hasselt Univ, Hasselt, Belgium. tom.ameloot@uhasselt.be; gaetano.geck@udo.edu; bas.ketsman@uhasselt.be; frank.neven@uhasselt.be; thomas.schwentick@udo.edu
Document URI: http://hdl.handle.net/1942/23351
ISSN: 0001-0782
e-ISSN: 1557-7317
DOI: 10.1145/3041063
ISI #: 000396058600024
Category: A1
Type: Journal Contribution
Validations: ecoom 2018
Appears in Collections:Research publications

Files in This Item:
File Description SizeFormat 
cacm.pdfPeer-reviewed author version333.7 kBAdobe PDFView/Open
p93-ameloot.pdf
  Restricted Access
Published version1.19 MBAdobe PDFView/Open    Request a copy
Show full item record

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.