Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/33398
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Florenzano, Fernando | - |
dc.contributor.author | Riveros, Cristian | - |
dc.contributor.author | Ugarte, Martín | - |
dc.contributor.author | VANSUMMEREN, Stijn | - |
dc.contributor.author | Vrgoc, Domagoj | - |
dc.date.accessioned | 2021-02-11T14:22:14Z | - |
dc.date.available | 2021-02-11T14:22:14Z | - |
dc.date.issued | 2018 | - |
dc.date.submitted | 2021-02-11T13:37:22Z | - |
dc.identifier.citation | Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Association for Computing Machinery, p. 165-177 | - |
dc.identifier.isbn | 9781450347068 | - |
dc.identifier.uri | http://hdl.handle.net/1942/33398 | - |
dc.description.abstract | Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages to locate the data that a user wants to extract from a text document, and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have good evaluation algorithms that can generate the extracted data in quick succession, and with relatively little precomputation time. Towards this goal, we present a practical evaluation algorithm that allows constant-delay enumeration of a spanner's output after a precomputation phase that is linear in the document. While the algorithm assumes that the spanner is specified in a syntactic variant of variable set automata, we also study how it can be applied when the spanner is specified by general variable set automata, regex formulas, or spanner algebras. Finally, we study the related problem of counting the number of outputs of a document spanner, providing a fine-grained analysis of the classes of document spanners that support efficient enumeration of their results. | - |
dc.language.iso | en | - |
dc.publisher | Association for Computing Machinery | - |
dc.subject.other | Information Extraction | - |
dc.subject.other | Spanners | - |
dc.subject.other | Constant-delay evaluation | - |
dc.subject.other | Automata | - |
dc.subject.other | Capture Variables | - |
dc.title | Constant Delay Algorithms for Regular Document Spanners | - |
dc.type | Proceedings Paper | - |
local.bibliographicCitation.conferencedate | June 10-15, 2018 | - |
local.bibliographicCitation.conferencename | PODS 2018: 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems | - |
local.bibliographicCitation.conferenceplace | Houston (Texas), USA | - |
dc.identifier.epage | 177 | - |
dc.identifier.spage | 165 | - |
local.bibliographicCitation.jcat | C1 | - |
local.publisher.place | 1515 BROADWAY, NEW YORK, NY 10036-9998 USA | - |
local.type.refereed | Refereed | - |
local.type.specified | Proceedings Paper | - |
dc.identifier.doi | 10.1145/3196959.3196987 | - |
dc.identifier.isi | WOS:000455483100013 | - |
local.provider.type | Web of Science | - |
local.bibliographicCitation.btitle | Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems | - |
local.uhasselt.uhpub | no | - |
local.uhasselt.international | yes | - |
item.fulltext | No Fulltext | - |
item.contributor | Florenzano, Fernando | - |
item.contributor | Riveros, Cristian | - |
item.contributor | Ugarte, Martín | - |
item.contributor | VANSUMMEREN, Stijn | - |
item.contributor | Vrgoc, Domagoj | - |
item.fullcitation | Florenzano, Fernando; Riveros, Cristian; Ugarte, Martín; VANSUMMEREN, Stijn & Vrgoc, Domagoj (2018) Constant Delay Algorithms for Regular Document Spanners. In: Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Association for Computing Machinery, p. 165-177. | - |
item.accessRights | Closed Access | - |
Appears in Collections: | Research publications |