Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/40734
Title: Technical Perspective: Conjunctive Queries with Comparisons
Authors: VANSUMMEREN, Stijn 
Issue Date: 2023
Publisher: ASSOC COMPUTING MACHINERY
Source: SIGMOD RECORD, 52 (1) , p. 53
Abstract: Query processing, the art of efficiently executing a rela-tional query on a given database, is a foundational and core area in data management research. Established at the dawn of relational database systems in the 1970's, relational query processing remains a highly relevant and vibrant research topic today as recent work shows that, apart from its application in traditional database scenarios, it is also highly effective in optimizing machine learning workloads [1]. Join ordering is a crucial part of efficiently processing re-lational queries. In the join ordering problem, we are given a join query like Q1 = R(A, B) 1 S(B, C) 1 T (C, D), and we need to determine the order in which we will execute this join. It could be, for example that the subquery R 1 S yields a very large subresult compared to the full join output, while S 1 T is very selective. In that case, we should first execute S 1 T , and then R 1 (S 1 T) to avoid wastefully computing the unnecessary tuples in R 1 S that do not join with any tuple in T. Most practical systems use statistics of the input relations to estimate the sizes of subresults, and reorder joins based on these estimates. Since size estimates are difficult to always get correct, however, we are not guaranteed to always obtain an optimal join order. A seminal result by Yannakakis [2] states that instead of looking at data statistics we may obtain optimal join orders by looking at query structure. In particular, because the hypergraph of our query Q1 above is α-acyclic [2] it can be executed in instance-optimal time O(IN + OUT) on any database, where IN denotes the size of the input database; OUT denotes the size of the query output; and the notation O suppresses a log O(1) factor. The core insight required to obtain this result is that for α-acyclic join queries we may remove so-called "dangling tuples" from the input in O(IN) time, after which any reasonable join order allows to compute Q1 in O(OUT) time. Here, an input tuple is dangling if it does not appear as part of a tuple in the output. The paper "Conjunctive Queries with Comparisons" by Wang and Yi generalizes Yannakakis' result from equi-join queries to theta-joins involving comparisons. Join queries with comparisons naturally appear in OLAP and spatio-temporal queries, but also in machine learning over rela-Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Q2 = R(A, B) 1 S(B, C) 1 T (C, D) s.t. A ≤ C Q3 = R(A, B) 1 S(B, C) 1 T (C, D) s.t. A ≤ D The comparison A ≤ C in Q2 is called a short comparison because it can be applied after joining two relations, while A ≤ D in Q3 is a long comparison because more than two relations need to be joined before it can be applied. Long comparisons are particularly important in the context of machine learning over relational data. Note that processing queries like Q2 and Q3 by first computing the equi-join between R, S, T and then filtering afterwards yields a complexity of O(IN + EJ) where EJ is the size of the equi-join result, which may be significantly larger than OUT. Wang and Yi generalize Yannakakis' result to join queries containing multiple comparisons, both short and long. They require that the query is α-acyclic in terms of equi-joins (cf. Yannakakis) while the comparisons must be Berge-acyclic. Using this structure, they propose a novel way for removing dangling tuples in the presence of comparisons in O(IN) time. This non-trivial insight hinges on grouping the data of input relations on equi-join attributes; computing minimum or maximum values of the comparison attributes per group; propagating the minima/maxima to other relations; and filtering the dangling tuples using orthogonal range searching. Once dangling tuples are removed, the minima/maxima are again used to enumerate the output tuples with constant delay , which implies O(OUT) total time for output construction. The total time is hence O(IN + OUT). While we have restricted ourselves in our discussion so far to join queries, i.e., queries without projections, Wang and Yi also discuss join queries with projections (a.k.a. conjunc-tive queries) and they show how queries that are not acyclic according to their definition can be transformed, based on generalized hypertree decompositions, into acyclic queries. Furthermore, they empirically show that while the complexity analysis above focuses on asymptotic complexity, and may therefore hide large constant, the algorithm is very efficient in practice. In short, this is an exemplary paper that proposes founda-tional innovation for query processing on the highly relevant class of conjunctive queries with comparisons.
Notes: Vansummeren, S (corresponding author), UHasselt, Data Sci Inst, Hasselt, Belgium.
stijn.vansummeren@uhasselt.be
Document URI: http://hdl.handle.net/1942/40734
ISSN: 0163-5808
e-ISSN: 1943-5835
ISI #: 001005918900011
Category: A2
Type: Journal Contribution
Appears in Collections:Research publications

Files in This Item:
File Description SizeFormat 
Technical Perspective_ Conjunctive Queries with Comparisons.pdf
  Restricted Access
Published version313.35 kBAdobe PDFView/Open    Request a copy
Show full item record

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.