Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/13632
Title: | Mining frequent itemsets in a stream |
Authors: | Calders, Toon; Dexters, Nele; GILLIS, Joris; GOETHALS, Bart |
Issue Date: | 2014 |
Source: | INFORMATION SYSTEMS, 39, p. 233-255 |
Abstract: | Mining frequent itemsets in a datastream proves to be a difficult problem, as itemsets arrive in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it has many useful applications, e.g., opinion and sentiment analysis from social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective. In this paper, we extend that work from items to itemsets. Firstly, an optimized incremental algorithm for mining frequent itemsets in a stream is presented. The algorithm maintains a very compact summary of the stream for selected itemsets. Secondly, we show that further compacting the summary is non-trivial. Thirdly, we establish a connection between the size of a summary and results from number theory. Fourthly, we report results of extensive experimentation on both synthetic and real-world datasets, showing the efficiency of the algorithm in terms of both time and space. |
Notes: | Gillis, JJM (reprint author), Hasselt Univ, Agoralaan Gebouw D, B-3590 Diepenbeek, Belgium, joris.gillis@uhasselt.be |
Keywords: | Frequent itemset mining; Datastream; Theory; Algorithm; Experiments |
Document URI: | http://hdl.handle.net/1942/13632 |
ISSN: | 0306-4379 |
e-ISSN: | 1873-6076 |
DOI: | 10.1016/j.is.2012.01.005 |
ISI #: | 000329531300012 |
Category: | A1 |
Type: | Journal Contribution |
Validations: | ecoom 2015 |
Appears in Collections: | Research publications |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
paper.pdf | Peer-reviewed author version | 420.41 kB | Adobe PDF |
calders 1.pdf | Published version | 1.61 MB | Adobe PDF |