Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/11703
Full metadata record
DC Field | Value | Language
dc.contributor.author | ANTONOPOULOS, Timos | -
dc.contributor.author | GEERTS, Floris | -
dc.contributor.author | MARTENS, Wim | -
dc.contributor.author | NEVEN, Frank | -
dc.date.accessioned | 2011-02-25T11:21:07Z | -
dc.date.available | NO_RESTRICTION | -
dc.date.available | 2011-02-25T11:21:07Z | -
dc.date.issued | 2011 | -
dc.identifier.citation | Milo, Tova (Ed.) Proceedings of 14th International Conference on Database Theory. p. 30-41. | -
dc.identifier.isbn | 9781450305297 | -
dc.identifier.uri | http://hdl.handle.net/1942/11703 | -
dc.description.abstract | To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms to generate uniformly at random a corpus of XSDs as well as a similarity measure to compare how closely the generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The latter class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e., taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of single-type EDTDs, i.e., the formal abstraction of XSDs. To this end, we provide an algorithm to generate k-occurrence automata (k-OAs) uniformly at random and show how this leads to uniform generation of single-type EDTDs. | -
dc.description.sponsorship | We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under the FET-Open grant agreement FOX, number FP7-ICT-233599. | -
dc.language.iso | en | -
dc.publisher | ACM | -
dc.rights | Copyright 2011 ACM | -
dc.subject.other | algorithms; design; theory | -
dc.title | Generating, sampling and counting subclasses of regular tree languages | -
dc.type | Proceedings Paper | -
local.bibliographicCitation.authors | Milo, Tova | -
local.bibliographicCitation.conferencedate | 21-24/03/2011 | -
local.bibliographicCitation.conferencename | Database Theory - ICDT 2011, 14th International Conference | -
dc.bibliographicCitation.conferencenr | 14 | -
local.bibliographicCitation.conferenceplace | Uppsala, Sweden | -
dc.identifier.epage | 41 | -
dc.identifier.spage | 30 | -
local.bibliographicCitation.jcat | C1 | -
local.type.refereed | Refereed | -
local.type.specified | Proceedings Paper | -
dc.bibliographicCitation.oldjcat | C2 | -
local.bibliographicCitation.btitle | Proceedings of 14th International Conference on Database Theory | -
item.contributor | ANTONOPOULOS, Timos | -
item.contributor | GEERTS, Floris | -
item.contributor | MARTENS, Wim | -
item.contributor | NEVEN, Frank | -
item.fullcitation | ANTONOPOULOS, Timos; GEERTS, Floris; MARTENS, Wim & NEVEN, Frank (2011) Generating, sampling and counting subclasses of regular tree languages. In: Milo, Tova (Ed.) Proceedings of 14th International Conference on Database Theory. p. 30-41. | -
item.accessRights | Open Access | -
item.fulltext | With Fulltext | -
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
p30-antonopoulos.pdf | Published version | 895.81 kB | Adobe PDF
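
Illustrative note on the abstract: the counting problem it mentions (the number of trees up to a certain size in an unambiguous EDTD) can be pictured with a small dynamic program. The sketch below is not the algorithm from the paper, which goes through a translation into combinatorial specifications; it is a plain memoized recursion swapped in for illustration. It assumes a toy schema in which every type's content model is given as a deterministic finite automaton over child types, and in which every tree has exactly one typing (as in a single-type EDTD), so that counting typed trees is the same as counting trees in the language. The names `SCHEMA`, `make_counter` and `count_up_to` are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy schema. Each type maps to a DFA over child types:
#   (start_state, accepting_states, {(state, child_type): next_state})
# Here the single type "a" has content model a*, i.e. all unranked
# ordered trees whose nodes are labeled "a".
SCHEMA = {
    "a": (0, {0}, {(0, "a"): 0}),
}


def make_counter(schema):
    """Return a function trees(t, n) counting trees of type t with n nodes."""

    @lru_cache(maxsize=None)
    def trees(t, n):
        # A tree of type t with n nodes is a root plus a sequence of
        # children (a hedge) with n - 1 nodes accepted by the DFA of t.
        if n < 1:
            return 0
        start, _, _ = schema[t]
        return hedges(t, start, n - 1)

    @lru_cache(maxsize=None)
    def hedges(t, q, m):
        # Number of child sequences with m nodes in total that take the
        # DFA of type t from state q to some accepting state.
        _, accepting, delta = schema[t]
        total = 1 if (m == 0 and q in accepting) else 0
        for (state, child), nxt in delta.items():
            if state != q:
                continue
            # The first child uses k nodes, the rest of the hedge m - k.
            for k in range(1, m + 1):
                total += trees(child, k) * hedges(t, nxt, m - k)
        return total

    return trees


def count_up_to(schema, root, max_size):
    """Count trees of type `root` with at most `max_size` nodes."""
    trees = make_counter(schema)
    return sum(trees(root, n) for n in range(1, max_size + 1))


if __name__ == "__main__":
    trees = make_counter(SCHEMA)
    # Sizes 1..5 give the Catalan numbers 1, 1, 2, 5, 14.
    print([trees("a", n) for n in range(1, 6)])
    print(count_up_to(SCHEMA, "a", 5))  # -> 23
```

Because the automata are deterministic and the typing is assumed unique, each tree is counted exactly once; for an ambiguous schema the same recursion would count typings rather than trees, which is in line with the abstract's remark that counting becomes #P-complete for the full class of EDTDs.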
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.