Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/33439
Full metadata record
DC Field | Value | Language
dc.contributor.author | Florenzano, Fernando | -
dc.contributor.author | Riveros, Cristian | -
dc.contributor.author | Ugarte, Martín | -
dc.contributor.author | VANSUMMEREN, Stijn | -
dc.contributor.author | Vrgoc, Domagoj | -
dc.date.accessioned | 2021-02-12T12:27:12Z | -
dc.date.available | 2021-02-12T12:27:12Z | -
dc.date.issued | 2020 | -
dc.date.submitted | 2021-02-11T19:06:58Z | -
dc.identifier.citation | ACM TRANSACTIONS ON DATABASE SYSTEMS, 45 (1), p. 1-42 (Art N° 3) | -
dc.identifier.uri | http://hdl.handle.net/1942/33439 | -
dc.description.abstract | Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to locate the data that a user wants to extract from a text document and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have efficient evaluation algorithms that can generate the extracted data in a quick succession, and with relatively little precomputation time. Toward this goal, we present a practical evaluation algorithm that allows output-linear delay enumeration of a spanner's result after a precomputation phase that is linear in the document. Although the algorithm assumes that the spanner is specified in a syntactic variant of variable-set automata, we also study how it can be applied when the spanner is specified by general variable-set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner and provide a fine-grained analysis of the classes of document spanners that support efficient enumeration of their results. | -
dc.language.iso | en | -
dc.publisher | Association for Computing Machinery | -
dc.subject.other | Information extraction | -
dc.subject.other | spanners | -
dc.subject.other | enumeration delay | -
dc.subject.other | automata | -
dc.subject.other | capture variables | -
dc.title | Efficient Enumeration Algorithms for Regular Document Spanners | -
dc.type | Journal Contribution | -
dc.identifier.epage | 42 | -
dc.identifier.issue | 1 | -
dc.identifier.spage | 1 | -
dc.identifier.volume | 45 | -
local.bibliographicCitation.jcat | A1 | -
local.publisher.place | 2 PENN PLAZA, STE 701, NEW YORK, NY 10121-0701 USA | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
local.bibliographicCitation.artnr | 3 | -
dc.identifier.doi | 10.1145/3351451 | -
dc.identifier.isi | WOS:000583687500004 | -
local.provider.type | Web of Science | -
local.uhasselt.uhpub | no | -
local.uhasselt.international | yes | -
item.fulltext | No Fulltext | -
item.contributor | Florenzano, Fernando | -
item.contributor | Riveros, Cristian | -
item.contributor | Ugarte, Martín | -
item.contributor | VANSUMMEREN, Stijn | -
item.contributor | Vrgoc, Domagoj | -
item.fullcitation | Florenzano, Fernando; Riveros, Cristian; Ugarte, Martín; VANSUMMEREN, Stijn & Vrgoc, Domagoj (2020) Efficient Enumeration Algorithms for Regular Document Spanners. In: ACM TRANSACTIONS ON DATABASE SYSTEMS, 45 (1), p. 1-42 (Art N° 3). | -
item.accessRights | Closed Access | -
crisitem.journal.issn | 0362-5915 | -
crisitem.journal.eissn | 1557-4644 | -
Appears in Collections: Research publications

Web of Science citations: 12 (checked on Oct 17, 2024)
Page view(s): 40 (checked on Jul 31, 2023)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.