Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/22025
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | NEVEN, Frank | -
dc.contributor.advisor | TAN, Tony | -
dc.contributor.advisor | VAN DEN BUSSCHE, Jan | -
dc.contributor.author | DAENEN, Jonny | -
dc.date.accessioned | 2016-09-15T09:41:17Z | -
dc.date.available | 2016-09-15T09:41:17Z | -
dc.date.issued | 2016 | -
dc.identifier.uri | http://hdl.handle.net/1942/22025 | -
dc.description.abstract | In this work, we identify three challenging subtopics in regard to optimizing Big data mining workflows. First, we focus on pattern mining and investigate the problem of enumerating string patterns described by a context-free language. We derive guarantees on the delay between generated items when using a naive algorithm. Our results contribute to the foundational aspects of computer science and provide a basis for obtaining similar guarantees in more complex enumeration problems. The second topic remains in the domain of pattern mining: we study the pattern mining problem applied to XML keys. We discuss the complexity of several important decision problems and devise an algorithm for discovering XML keys from a given set of XML data. The presented algorithm leverages previous results from search space exploration and relational key mining and is experimentally validated. For our final topic, we shift our attention to Big data mining, where query engines answer questions about data that exceeds the capacity of traditional relational database systems. To construct answers within a reasonable amount of time, we focus on parallel evaluation. We present a two-tiered strategy for optimizing query plans for a collection of strictly guarded fragment queries. The nature of these queries allows for a low-cost MapReduce evaluation (in terms of total and net time) that takes up to two rounds per subquery. We provide an implementation in our system called Gumbo and extensively compare it to existing systems. | -
dc.language.iso | en | -
dc.subject.other | Big Data; Data Mining; XML; Pattern Enumeration; Pattern Mining; MapReduce; | -
dc.title | Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization | -
dc.type | Theses and Dissertations | -
local.format.pages | 208 | -
local.bibliographicCitation.jcat | T1 | -
local.type.refereed | Non-Refereed | -
local.type.specified | Phd thesis | -
item.accessRights | Open Access | -
item.fullcitation | DAENEN, Jonny (2016) Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization. | -
item.fulltext | With Fulltext | -
item.contributor | DAENEN, Jonny | -
Appears in Collections: PhD theses; Research publications
Files in This Item:
File | Description | Size | Format
phd_daenen_final.pdf | | 1.77 MB | Adobe PDF



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.