Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/18330
Title: Discovering XSD Keys from XML Data
Authors: Arenas, Marcelo
DAENEN, Jonny 
NEVEN, Frank 
UGARTE, Martin
VAN DEN BUSSCHE, Jan 
VANSUMMEREN, Stijn 
Issue Date: 2014
Source: ACM TRANSACTIONS ON DATABASE SYSTEMS, 39 (4), Art. 28
Abstract: A great deal of research into the learning of schemas from XML data has been conducted in recent years to enable the automatic discovery of XML schemas from XML documents when no schema or only a low-quality one is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle here is the unavailability of a theory on reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present article embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties concerning XML keys in the presence of an XSD, like, for instance, testing for consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained related to cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the properties mentioned before to assess and refine the quality of derived keys. An experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided.
Notes: Neven, F (reprint author), Hasselt Univ, B-3900 Diepenbeek, Belgium. frank.neven@uhasselt.be
Keywords: information systems; database management; database applications; data mining; algorithms; languages; experimentation; theory; XML key
Document URI: http://hdl.handle.net/1942/18330
ISSN: 0362-5915
e-ISSN: 1557-4644
DOI: 10.1145/2638547
ISI #: 000347799000003
Rights: © 2014 ACM 0362-5915/2014/12-ART28 $15.00
Category: A1
Type: Journal Contribution
Validations: ecoom 2016
Appears in Collections: Research publications

Files in This Item:
File: a28-arenas.pdf
Size: 1.14 MB
Format: Adobe PDF

Scopus citations: 4 (checked on Sep 3, 2020)
Web of Science citations: 5 (checked on Apr 14, 2024)
Page view(s): 86 (checked on Apr 26, 2023)
Download(s): 168 (checked on Apr 26, 2023)
