Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/18330
Full metadata record
DC Field | Value | Language
dc.contributor.author | Arenas, Marcelo | -
dc.contributor.author | DAENEN, Jonny | -
dc.contributor.author | NEVEN, Frank | -
dc.contributor.author | UGARTE, Martin | -
dc.contributor.author | VAN DEN BUSSCHE, Jan | -
dc.contributor.author | VANSUMMEREN, Stijn | -
dc.date.accessioned | 2015-02-12T12:34:59Z | -
dc.date.available | 2015-02-12T12:34:59Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | ACM TRANSACTIONS ON DATABASE SYSTEMS, 39 (4), p. 28-28 | -
dc.identifier.issn | 0362-5915 | -
dc.identifier.uri | http://hdl.handle.net/1942/18330 | -
dc.description.abstract | A great deal of research into the learning of schemas from XML data has been conducted in recent years to enable the automatic discovery of XML schemas from XML documents when no schema or only a low-quality one is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle here is the unavailability of a theory on reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present article embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties concerning XML keys in the presence of an XSD, like, for instance, testing for consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained related to cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the properties mentioned before to assess and refine the quality of derived keys. An experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided. | -
dc.description.sponsorship | The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government – department EWI. The authors acknowledge the financial support of the Fondecyt grant no. 1131049, FP7-ICT-233599, FWO G082109, and ERC grant agreement DIADEM, no. 246858. | -
dc.language.iso | en | -
dc.rights | © 2014 ACM 0362-5915/2014/12-ART28 $15.00 | -
dc.subject.other | information systems; database management; database applications; data mining; algorithms; languages; experimentation; theory; XML key | -
dc.title | Discovering XSD Keys from XML Data | -
dc.type | Journal Contribution | -
dc.identifier.epage | 28 | -
dc.identifier.issue | 4 | -
dc.identifier.spage | 28 | -
dc.identifier.volume | 39 | -
local.bibliographicCitation.jcat | A1 | -
dc.description.notes | Neven, F (reprint author), Hasselt Univ, B-3900 Diepenbeek, Belgium. frank.neven@uhasselt.be | -
dc.relation.references | Serge Abiteboul, Yael Amsterdamer, Daniel Deutch, Tova Milo, and Pierre Senellart. 2012. Finding optimal probabilistic generators for XML collections. In Proceedings of the 15th International Conference on Database Theory (ICDT'12). ACM Press, New York, 127--139. | -
dc.relation.references | Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley. | -
dc.relation.references | Marcelo Arenas, Jonny Daenen, Frank Neven, Martin Ugarte, Jan Van den Bussche, and Stijn Vansummeren. 2013. Discovering XSD keys from XML data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'13). 61--72. | -
dc.relation.references | Marcelo Arenas, Wenfei Fan, and Leonid Libkin. 2002. What's hard about XML schema constraints? In Proceedings of the 13th International Conference on Database and Expert Systems Applications (DEXA'02). Lecture Notes in Computer Science, vol. 2453, Springer, 269--278. | -
dc.relation.references | Denilson Barbosa and Alberto O. Mendelzon. 2003. Finding ID attributes in XML documents. In Proceedings of the 1st International XML Database Symposium (XSym'03). Lecture Notes in Computer Science, vol. 2824, Springer, 180--194. | -
dc.relation.references | Geert Jan Bex, Wouter Gelade, Wim Martens, and Frank Neven. 2009. Simplifying XML schema: Effortless handling of nondeterministic regular expressions. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'09). 731--744. | -
dc.relation.references | Geert Jan Bex, Wouter Gelade, Frank Neven, and Stijn Vansummeren. 2010a. Learning deterministic regular expressions for the inference of schemas from XML data. ACM Trans. Web 4, 4. | -
dc.relation.references | Geert Jan Bex, Frank Neven, Thomas Schwentick, and Stijn Vansummeren. 2010b. Inference of concise regular expressions and DTDs. ACM Trans. Database Syst. 35, 2. | -
dc.relation.references | Geert Jan Bex, Frank Neven, and Stijn Vansummeren. 2007. Inferring XML schema definitions from XML data. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07). 998--1009. | -
dc.relation.references | Geert Jan Bex, Frank Neven, and Stijn Vansummeren. 2008. SchemaScope: A system for inferring and cleaning XML schemas. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'08). 1259--1262. | -
dc.relation.references | Dina Bitton, Jeffrey Millman, and Solveig Torgersen. 1989. A feasibility and performance study of dependency inference. In Proceedings of the International Conference on Data Engineering (ICDE'89). 635--641. | -
dc.relation.references | Henrik Björklund, Wim Martens, and Thomas Schwentick. 2013. Validity of tree pattern queries with respect to schema information. In Proceedings of the 38th International Symposium on Mathematical Foundations of Computer Science (MFCS'13). Lecture Notes in Computer Science, vol. 8087, Springer, 171--182. | -
dc.relation.references | Mikołaj Bojańczyk. 2008. Tree-walking automata. In Proceedings of the 2nd International Conference on Language and Automata Theory and Applications (LATA'08). Lecture Notes in Computer Science, vol. 5196, Springer, 1--2. | -
dc.relation.references | Anne Brüggemann-Klein and Derick Wood. 1998. One-unambiguous regular languages. Inf. Comput. 140, 2, 229--253. | -
dc.relation.references | Peter Buneman, Susan B. Davidson, Wenfei Fan, Carmem S. Hara, and Wang Chiew Tan. 2002. Keys for XML. Comput. Netw. 39, 5, 473--487. | -
dc.relation.references | Peter Buneman, Susan B. Davidson, Wenfei Fan, Carmem S. Hara, and Wang Chiew Tan. 2003. Reasoning about keys for XML. Inf. Syst. 28, 8, 1037--1063. | -
dc.relation.references | Stanislav Fajt, Irena Mlýnková, and Martin Nečaský. 2011. On mining XML integrity constraints. In Proceedings of the 6th IEEE International Conference on Digital Information Management (ICDIM'11). 23--29. | -
dc.relation.references | Wenfei Fan and Leonid Libkin. 2002. On XML integrity constraints in the presence of DTDs. J. ACM 49, 3, 368--406. | -
dc.relation.references | Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, Sridhar Seshadri, and Kyuseok Shim. 2003. XTRACT: Learning document type descriptors from XML document collections. Data Min. Knowl. Discov. 7, 1, 23--56. | -
dc.relation.references | Gösta Grahne and Jianfei Zhu. 2002. Discovering approximate keys in XML data. In Proceedings of the International Conference on Information and Knowledge Management (CIKM'02). 453--460. | -
dc.relation.references | Steven Grijzenhout and Maarten Marx. 2010. University of Amsterdam XML web collection. http://data.politicalmashup.nl/sgrijzen/xmlweb/. | -
dc.relation.references | Sven Hartmann and Sebastian Link. 2009. Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst. 34, 2. | -
dc.relation.references | John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. 2003. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. | -
dc.relation.references | Donald E. Knuth, James H. Morris Jr., and Vaughan R. Pratt. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 2, 323--350. | -
dc.relation.references | Heikki Mannila and Kari-Jouko Räihä. 1989. Practical algorithms for finding prime attributes and testing normal forms. In Proceedings of the ACM Symposium on Principles of Database Systems (PODS'89). ACM Press, New York, 128--133. | -
dc.relation.references | Heikki Mannila and Kari-Jouko Räihä. 1991. The Design of Relational Databases. Addison-Wesley. | -
dc.relation.references | Heikki Mannila and Kari-Jouko Räihä. 1994. Algorithms for inferring functional dependencies from relations. Data Knowl. Engin. 12, 1, 83--99. | -
dc.relation.references | Heikki Mannila and Hannu Toivonen. 1997. Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Discov. 1, 3, 241--258. | -
dc.relation.references | Wim Martens, Frank Neven, and Thomas Schwentick. 2007. Simple off the shelf abstractions for XML schema. SIGMOD Rec. 36, 3, 15--22. | -
dc.relation.references | Wim Martens, Frank Neven, Thomas Schwentick, and Geert Jan Bex. 2006. Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31, 3, 770--813. | -
dc.relation.references | Makoto Murata, Dongwon Lee, Murali Mani, and Kohsuke Kawaguchi. 2005. Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Technol. 5, 4, 660--704. | -
dc.relation.references | Martin Nečaský and Irena Mlýnková. 2009. Discovering XML keys and foreign keys in queries. In Proceedings of the ACM Symposium on Applied Computing (SAC'09). ACM Press, New York, 632--638. | -
dc.relation.references | Raghu Ramakrishnan and Johannes Gehrke. 2003. Database Management Systems 3rd Ed. McGraw-Hill. | -
dc.relation.references | Helmut Seidl. 1990. Deciding equivalence of finite tree automata. SIAM J. Comput. 19, 3, 424--437. | -
dc.relation.references | Richard Edwin Stearns and Harry B. Hunt III. 1985. On the equivalence and containment problems for unambiguous regular expressions, regular grammars and finite automata. SIAM J. Comput. 14, 3, 598--611. | -
dc.relation.references | Larry J. Stockmeyer and Albert R. Meyer. 1973. Word problems requiring exponential time: Preliminary report. In Proceedings of the 5th Annual ACM Symposium on Theory of Computing (STOC'73). 1--9. | -
dc.relation.references | Peter Van Emde Boas. 1997. The convenience of tilings. In Complexity, Logic, and Recursion Theory, Marcel Dekker, 331--363. | -
dc.relation.references | W3C. 2004. XML Schema Part 1: Structures 2nd Ed. http://www.w3.org/TR/xmlschema-1/#cIdentity-constraint | -
dc.relation.references | Cong Yu and H. V. Jagadish. 2008. XML schema refinement through redundancy detection and normalization. VLDB J. 17, 2, 203--223. | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
local.type.programme | VSC | -
dc.identifier.doi | 10.1145/2638547 | -
dc.identifier.isi | 000347799000003 | -
item.validation | ecoom 2016 | -
item.fulltext | With Fulltext | -
item.accessRights | Open Access | -
item.fullcitation | Arenas, Marcelo; DAENEN, Jonny; NEVEN, Frank; UGARTE, Martin; VAN DEN BUSSCHE, Jan & VANSUMMEREN, Stijn (2014) Discovering XSD Keys from XML Data. In: ACM TRANSACTIONS ON DATABASE SYSTEMS, 39 (4), p. 28-28. | -
item.contributor | Arenas, Marcelo | -
item.contributor | DAENEN, Jonny | -
item.contributor | NEVEN, Frank | -
item.contributor | UGARTE, Martin | -
item.contributor | VAN DEN BUSSCHE, Jan | -
item.contributor | VANSUMMEREN, Stijn | -
crisitem.journal.issn | 0362-5915 | -
crisitem.journal.eissn | 1557-4644 | -
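The abstract describes a mining algorithm built on levelwise search that adapts relational functional-dependency discovery to XML keys. As a rough, non-authoritative illustration of only the relational core of that idea, the Python sketch below flattens the nodes selected by a path into tuples of field values and then tests candidate field sets level by level, pruning any set that contains a key already found. It is not the article's algorithm: it ignores the XSD and the consistency, boundedness, and related reasoning the abstract mentions, and it assumes the document root is the only context node. The helper names `extract_rows`, `is_key`, and `minimal_keys`, the `max_size` cutoff, and the sample `<library>` document are all hypothetical; the rule that a field must select exactly one node is an assumption modeled on XSD key semantics.

```python
# Simplified sketch with hypothetical helpers; NOT the article's algorithm.
import xml.etree.ElementTree as ET
from itertools import combinations


def extract_rows(root, selector, fields):
    """Turn the nodes matched by `selector` into tuples of field values.

    `selector` is an ElementTree path evaluated from `root` (assumed to be the
    only context node). A field is "@name" (an attribute) or a child path.
    Assumption modeled on XSD keys: a child-path field counts as present only
    if it selects exactly one node; otherwise its value is None.
    """
    rows = []
    for node in root.findall(selector):
        row = []
        for field in fields:
            if field.startswith("@"):
                row.append(node.get(field[1:]))
            else:
                hits = node.findall(field)
                row.append(hits[0].text if len(hits) == 1 else None)
        rows.append(tuple(row))
    return rows


def is_key(rows, cols):
    """True if every row has all `cols` present and no two rows agree on them."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in cols)
        if None in value or value in seen:
            return False
        seen.add(value)
    return True


def minimal_keys(rows, n_fields, max_size=3):
    """Levelwise search over field sets of size 1, 2, ... up to max_size.

    A set that contains a key found at a lower level cannot be a minimal key,
    so it is skipped without being tested.
    """
    found = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_fields), size):
            if any(set(key) <= set(cols) for key in found):
                continue  # a proper subset is already a key
            if is_key(rows, cols):
                found.append(cols)
    return found


# Hypothetical toy document: three <book> elements under a <library> root.
doc = """<library>
  <book isbn="1"><title>A</title><year>2001</year></book>
  <book isbn="2"><title>B</title><year>2001</year></book>
  <book isbn="3"><title>A</title><year>2002</year></book>
</library>"""
fields = ["@isbn", "title", "year"]
rows = extract_rows(ET.fromstring(doc), "book", fields)
keys = minimal_keys(rows, len(fields))
for key in keys:
    print([fields[c] for c in key])  # ['@isbn'], then ['title', 'year']
```

On the toy data, `@isbn` alone identifies each book, and neither `title` nor `year` does on its own, but together they do. The superset pruning is what keeps the search levelwise: once a set is known to be a key, none of its supersets needs to be tested.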
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
a28-arenas.pdf |  | 1.14 MB | Adobe PDF

SCOPUS™ citations: 4 (checked on Sep 3, 2020)
WEB OF SCIENCE™ citations: 5 (checked on Apr 30, 2024)
Page views: 86 (checked on Apr 26, 2023)
Downloads: 168 (checked on Apr 26, 2023)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.