Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/22025
Title: Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization
Authors: DAENEN, Jonny
Advisors: NEVEN, Frank; TAN, Tony; VAN DEN BUSSCHE, Jan
Issue Date: 2016
Abstract: In this work, we identify three challenging subtopics related to optimizing Big Data mining workflows. First, we focus on pattern mining and investigate the problem of enumerating string patterns described by a context-free language. We derive guarantees on the delay between generated items when using a naive algorithm. Our results contribute to the foundational aspects of computer science and provide a basis for obtaining similar guarantees in more complex enumeration problems. The second topic remains in the domain of pattern mining: we study the pattern mining problem applied to XML keys. We discuss the complexity of several important decision problems and devise an algorithm for discovering XML keys from a given set of XML data. The presented algorithm leverages previous results from search space exploration and relational key mining and is experimentally validated. For our final topic, we shift our attention to Big Data mining, where query engines answer questions about data that exceeds the capacity of traditional relational database systems. To construct answers within a reasonable amount of time, we focus on parallel evaluation. We present a two-tiered strategy for optimizing query plans for a collection of strictly guarded fragment queries. The nature of these queries allows for a low-cost MapReduce evaluation (in terms of total and net time) that takes up to two rounds per subquery. We provide an implementation in our system called Gumbo and extensively compare it to existing systems.
Keywords: Big Data; Data Mining; XML; Pattern Enumeration; Pattern Mining; MapReduce
Document URI: http://hdl.handle.net/1942/22025
Category: T1
Type: Theses and Dissertations
Appears in Collections: PhD theses; Research publications
Files in This Item:
File | Description | Size | Format
---|---|---|---
phd_daenen_final.pdf | | 1.77 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.