Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/23344
Title: Parallel Evaluation of Multi-Semi-Joins
Authors: DAENEN, Jonny 
NEVEN, Frank 
TAN, Tony 
VANSUMMEREN, Stijn 
Issue Date: 2016
Source: Proceedings of the VLDB Endowment, 9(10), p. 732-743
Abstract: While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algorithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We use this operator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide additional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimizations are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our optimizations, all with a significant improvement over Pig and Hive.
Document URI: http://hdl.handle.net/1942/23344
Link to publication/dataset: http://www.vldb.org/pvldb/vol9/p732-daenen.pdf
ISSN: 2150-8097
e-ISSN: 2150-8097
DOI: 10.14778/2977797.2977800
Rights: Copyright 2016 VLDB Endowment 2150-8097/16/06.
Category: A1
Type: Journal Contribution
Appears in Collections: Research publications

Files in This Item:
File: p732-daenen.pdf
Description: Published version
Size: 324.84 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.