Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization

DAENEN, Jonny

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/22025

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	NEVEN, Frank	-
dc.contributor.advisor	TAN, Tony	-
dc.contributor.advisor	VAN DEN BUSSCHE, Jan	-
dc.contributor.author	DAENEN, Jonny	-
dc.date.accessioned	2016-09-15T09:41:17Z	-
dc.date.available	2016-09-15T09:41:17Z	-
dc.date.issued	2016	-
dc.identifier.uri	http://hdl.handle.net/1942/22025	-
dc.description.abstract	In this work, we identify three challenging subtopics in optimizing Big Data mining workflows. First, we focus on pattern mining and investigate the problem of enumerating string patterns described by a context-free language. We derive guarantees on the delay between generated items when using a naive algorithm. Our results contribute to the foundational aspects of computer science and provide a basis for obtaining similar guarantees in more complex enumeration problems. The second topic remains in the domain of pattern mining: we study the pattern mining problem applied to XML keys. We discuss the complexity of several important decision problems and devise an algorithm for discovering XML keys from a given set of XML data. The presented algorithm leverages previous results from search space exploration and relational key mining and is experimentally validated. For our final topic, we shift our attention to Big Data mining, where query engines answer questions about data that exceeds the capacity of traditional relational database systems. To construct answers within a reasonable amount of time, we focus on parallel evaluation. We present a two-tiered strategy for optimizing query plans for a collection of strictly guarded fragment queries. The nature of these queries allows for a low-cost MapReduce evaluation (in terms of total and net time) that takes at most two rounds per subquery. We provide an implementation in our system called Gumbo and extensively compare it to existing systems.	-
dc.language.iso	en	-
dc.subject.other	Big Data; Data Mining; XML; Pattern Enumeration; Pattern Mining; MapReduce	-
dc.title	Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization	-
dc.type	Theses and Dissertations	-
local.format.pages	208	-
local.bibliographicCitation.jcat	T1	-
local.type.refereed	Non-Refereed	-
local.type.specified	PhD thesis	-
item.fulltext	With Fulltext	-
item.accessRights	Open Access	-
item.fullcitation	DAENEN, Jonny (2016) Topics in Data Mining: Pattern Enumeration, XML Key Inference and Big Data Query Optimization.	-
item.contributor	DAENEN, Jonny	-
Appears in Collections:	PhD theses; Research publications

Files in This Item:

File	Description	Size	Format
phd_daenen_final.pdf		1.77 MB	Adobe PDF
