Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/48882| Title: | Poisson Sampling over Acyclic Joins | Authors: | BEKKERS, Liese NEVEN, Frank Pantelis, Lorrens VANSUMMEREN, Stijn |
Issue Date: | 2026 | Publisher: | Association for Computing Machinery (ACM) | Abstract: | We introduce the problem of Poisson sampling over joins: compute a sample of the result of a join query by conceptually performing a Bernoulli trial for each join tuple, using a non-uniform and tuple-specific probability. We propose an algorithm for Poisson sampling over acyclic joins that is nearly instance-optimal, running in time O(N + k \log N) where N is the size of the input database, and k is the size of the resulting sample. Our algorithm hinges on two building blocks: (1) The construction of a random-access index that allows, given a number i, to randomly access the i-th join tuple without fully materializing the (possibly large) join result; (2) The probing of this index to construct the result sample. We study the engineering trade-offs required to make both components practical, focusing on their implementation in column stores, and identify best-performing alternatives for both. Our experiments on real-world data demonstrate that this pair of alternatives significantly outperforms the repeated-Bernoulli-trial algorithm for Poisson sampling while also demonstrating that the random-access index by itself can be used to competively implement Yannakakis' acyclic join processing algorithm when no sampling is required. This shows that, as far a query engine design is concerned, it is possible to adopt a uniform basis for both classical acyclic join processing and Poisson sampling, both without regret compared to classical join and sampling algorithms. | Other: | ACMReference Format: Liese Bekkers, Frank Neven, Lorrens Pantelis, and Stijn Vansummeren. 2026. Poisson Sampling over Acyclic Joins. Proc. ACM Manag. Data 4, 3 (SIGMOD), Article 224 (June 2026), 34 pages. https://doi.org/10.1145/3802101 | Keywords: | Computer Science - Databases | Document URI: | http://hdl.handle.net/1942/48882 | e-ISSN: | 2836-6573 | DOI: | 10.1145/3802101 | Rights: | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. ©2026 Copyright held by the owner/author(s). | Category: | O | Type: | Preprint |
| Appears in Collections: | Research publications |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 224-bekkers.pdf | Published version | 851.99 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.