Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/10651
Title: Efficient constraint evaluation in categorical sequential pattern mining for trajectory databases
Authors: Gomez, Letitia
VAISMAN, Alejandro 
Issue Date: 2009
Publisher: ACM International Conference Proceeding Series
Source: Kersten, Martin & Novikov, Boris & Teubner, Jens & Polutin, Vladimir & Manegold, Stefan (Ed.) Proceedings of the 12th International Conference on Extending Database Technology. p. 541-552.
Abstract: The classic Generalized Sequential Patterns (GSP) algorithm returns all frequent sequences present in a database. However, usually only a few of them are interesting from a user's point of view. Thus, post-processing tasks are required in order to discard uninteresting sequences. To avoid this drawback, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In all of these languages, REs are applied over items, which limits their applicability in complex real-world situations. We propose a much more powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints defined over the (temporal and non-temporal) attributes of the items to be mined. Expressions in this language may include attributes, functions over attributes, and variables. We specify the syntax and semantics of RE-SPaM, and present a comprehensive set of examples to illustrate its expressive power. We study in detail how the expressions can be used to prune the resulting sequences in the mining process. In addition, we introduce techniques that allow pruning sequences in the early stages of the process, reducing the need to access the database, making use of the categorization of the attributes that compose the items, and of the automaton that accepts the language generated by the RE. Finally, we present experimental results. Although in this paper we focus on trajectory databases, our approach is general enough to be applied to other settings.
Document URI: http://hdl.handle.net/1942/10651
Link to publication/dataset: http://doi.acm.org/10.1145/1516360.1516423
ISBN: 978-1-60558-422-5
Category: C1
Type: Proceedings Paper
Appears in Collections: Research publications

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.