Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45025
Title: SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow
Authors: Light, Dean
Aiashy, Ahmad
Diab, Mahmoud
Nachmias, Daniel
Vansummeren, Stijn
Kimelfeld, Benny
Issue Date: 2024
Publisher: ASSOC COMPUTING MACHINERY
Source: Proceedings of the VLDB Endowment, 17 (12), p. 4281-4284
Abstract: Document spanners have been proposed as a formal framework for declarative Information Extraction (IE) from text, following IE products from the industry and academia. Over the past decade, the framework has been studied thoroughly in terms of expressive power, complexity, and the ability to naturally combine text analysis with relational querying. This demonstration presents SpannerLib, a library for embedding document spanners in Python code. SpannerLib facilitates the development of IE programs by providing an implementation of Spannerlog (Datalog-based document spanners) that interacts with the Python code in two directions: rules can be embedded inside Python, and they can invoke custom Python code (e.g., calls to ML-based NLP models) via user-defined functions. The demonstration scenarios showcase IE programs, with increasing levels of complexity, within Jupyter Notebook.
Notes: Light, D (corresponding author), Technion, Haifa, Israel.
dean.light92@gmail.com; ahmad-ai@campus.technion.ac.il;
mahmoud.diab@campus.technion.ac.il; nach.daniel@gmail.com;
stijn.vansummeren@uhasselt.be; bennyk@cs.technion.ac.il
Document URI: http://hdl.handle.net/1942/45025
ISSN: 2150-8097
e-ISSN: 2150-8097
DOI: 10.14778/3685800.3685855
ISI #: 001378223700007
Rights: This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Category: A1
Type: Journal Contribution
Appears in Collections: Research publications

Files in This Item:
File: SpannerLib_ Embedding Declarative Information Extraction in an Imperative Workflow.pdf
Description: Published version
Size: 648.93 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.