Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/22025
Title: Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization
Authors: DAENEN, Jonny 
Advisors: NEVEN, Frank
TAN, Tony
VAN DEN BUSSCHE, Jan
Issue Date: 2016
Abstract: In this work, we identify three challenging subtopics related to optimizing Big Data mining workflows. First, we focus on pattern mining and investigate the problem of enumerating string patterns described by a context-free language. We derive guarantees on the delay between generated items when using a naive algorithm. Our results contribute to the foundational aspects of computer science and provide a basis for obtaining similar guarantees in more complex enumeration problems. The second topic remains in the domain of pattern mining: we study the pattern mining problem applied to XML keys. We discuss the complexity of several important decision problems and devise an algorithm for discovering XML keys from a given set of XML data. The presented algorithm leverages previous results from search space exploration and relational key mining and is experimentally validated. For our final topic, we shift our attention to Big Data mining, where query engines answer questions about data that exceeds the capacity of traditional relational database systems. To construct answers within a reasonable amount of time, we focus on parallel evaluation. We present a two-tiered strategy for optimizing query plans for a collection of strictly guarded fragment queries. The nature of these queries allows for a low-cost MapReduce evaluation (in terms of total and net time) that takes up to two rounds per subquery. We provide an implementation in our system, called Gumbo, and extensively compare it to existing systems.
Keywords: Big Data; Data Mining; XML; Pattern Enumeration; Pattern Mining; MapReduce
Document URI: http://hdl.handle.net/1942/22025
Category: T1
Type: Theses and Dissertations
Appears in Collections: PhD theses
Research publications

Files in This Item:
phd_daenen_final.pdf (Size: 1.77 MB, Format: Adobe PDF)
