Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/48882
Title: Poisson Sampling over Acyclic Joins
Authors: BEKKERS, Liese 
NEVEN, Frank 
Pantelis, Lorrens
VANSUMMEREN, Stijn 
Issue Date: 2026
Publisher: Association for Computing Machinery (ACM)
Abstract: We introduce the problem of Poisson sampling over joins: compute a sample of the result of a join query by conceptually performing a Bernoulli trial for each join tuple, using a non-uniform and tuple-specific probability. We propose an algorithm for Poisson sampling over acyclic joins that is nearly instance-optimal, running in time O(N + k \log N) where N is the size of the input database, and k is the size of the resulting sample. Our algorithm hinges on two building blocks: (1) The construction of a random-access index that allows, given a number i, to randomly access the i-th join tuple without fully materializing the (possibly large) join result; (2) The probing of this index to construct the result sample. We study the engineering trade-offs required to make both components practical, focusing on their implementation in column stores, and identify best-performing alternatives for both. Our experiments on real-world data demonstrate that this pair of alternatives significantly outperforms the repeated-Bernoulli-trial algorithm for Poisson sampling while also demonstrating that the random-access index by itself can be used to competively implement Yannakakis' acyclic join processing algorithm when no sampling is required. This shows that, as far a query engine design is concerned, it is possible to adopt a uniform basis for both classical acyclic join processing and Poisson sampling, both without regret compared to classical join and sampling algorithms.
Other: ACMReference Format: Liese Bekkers, Frank Neven, Lorrens Pantelis, and Stijn Vansummeren. 2026. Poisson Sampling over Acyclic Joins. Proc. ACM Manag. Data 4, 3 (SIGMOD), Article 224 (June 2026), 34 pages. https://doi.org/10.1145/3802101
Keywords: Computer Science - Databases
Document URI: http://hdl.handle.net/1942/48882
e-ISSN: 2836-6573
DOI: 10.1145/3802101
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. ©2026 Copyright held by the owner/author(s).
Category: O
Type: Preprint
Appears in Collections:Research publications

Files in This Item:
File Description SizeFormat 
224-bekkers.pdfPublished version851.99 kBAdobe PDFView/Open
Show full item record

Google ScholarTM

Check

Altmetric


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.