Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/48882
Full metadata record
DC FieldValueLanguage
dc.contributor.authorBEKKERS, Liese-
dc.contributor.authorNEVEN, Frank-
dc.contributor.authorPantelis, Lorrens-
dc.contributor.authorVANSUMMEREN, Stijn-
dc.date.accessioned2026-04-09T13:36:55Z-
dc.date.available2026-04-09T13:36:55Z-
dc.date.issued2026-
dc.date.submitted2026-03-17T07:21:32Z-
dc.identifier.urihttp://hdl.handle.net/1942/48882-
dc.description.abstractWe introduce the problem of Poisson sampling over joins: compute a sample of the result of a join query by conceptually performing a Bernoulli trial for each join tuple, using a non-uniform and tuple-specific probability. We propose an algorithm for Poisson sampling over acyclic joins that is nearly instance-optimal, running in time O(N + k \log N) where N is the size of the input database, and k is the size of the resulting sample. Our algorithm hinges on two building blocks: (1) The construction of a random-access index that allows, given a number i, to randomly access the i-th join tuple without fully materializing the (possibly large) join result; (2) The probing of this index to construct the result sample. We study the engineering trade-offs required to make both components practical, focusing on their implementation in column stores, and identify best-performing alternatives for both. Our experiments on real-world data demonstrate that this pair of alternatives significantly outperforms the repeated-Bernoulli-trial algorithm for Poisson sampling while also demonstrating that the random-access index by itself can be used to competively implement Yannakakis' acyclic join processing algorithm when no sampling is required. This shows that, as far a query engine design is concerned, it is possible to adopt a uniform basis for both classical acyclic join processing and Poisson sampling, both without regret compared to classical join and sampling algorithms.-
dc.language.isoen-
dc.publisherAssociation for Computing Machinery (ACM)-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. ©2026 Copyright held by the owner/author(s).-
dc.subject.otherComputer Science - Databases-
dc.titlePoisson Sampling over Acyclic Joins-
dc.typePreprint-
local.bibliographicCitation.conferencenameACM SIGMOD/PODS Conference-
local.bibliographicCitation.conferenceplaceBengaluru, India-
dc.identifier.issue3-
dc.identifier.volume4-
local.bibliographicCitation.jcatO-
local.type.refereedRefereed-
local.type.specifiedPreprint-
local.bibliographicCitation.artnr224-
dc.identifier.doi10.1145/3802101-
dc.identifier.arxivhttps://arxiv.org/abs/2603.10982-
dc.description.otherACMReference Format: Liese Bekkers, Frank Neven, Lorrens Pantelis, and Stijn Vansummeren. 2026. Poisson Sampling over Acyclic Joins. Proc. ACM Manag. Data 4, 3 (SIGMOD), Article 224 (June 2026), 34 pages. https://doi.org/10.1145/3802101-
local.provider.typePdf-
local.uhasselt.internationalno-
item.fullcitationBEKKERS, Liese; NEVEN, Frank; Pantelis, Lorrens & VANSUMMEREN, Stijn (2026) Poisson Sampling over Acyclic Joins.-
item.contributorBEKKERS, Liese-
item.contributorNEVEN, Frank-
item.contributorPantelis, Lorrens-
item.contributorVANSUMMEREN, Stijn-
item.accessRightsOpen Access-
item.fulltextWith Fulltext-
crisitem.journal.eissn2836-6573-
Appears in Collections:Research publications
Files in This Item:
File Description SizeFormat 
224-bekkers.pdfPublished version851.99 kBAdobe PDFView/Open
Show simple item record

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.