Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45095
Full metadata record
DC Field | Value | Language
dc.contributor.author | PARCIAK, Marcel | -
dc.contributor.author | VANDEVOORT, Brecht | -
dc.contributor.author | NEVEN, Frank | -
dc.contributor.author | PEETERS, Liesbet | -
dc.contributor.author | VANSUMMEREN, Stijn | -
dc.date.accessioned | 2025-01-16T11:20:13Z | -
dc.date.available | 2025-01-16T11:20:13Z | -
dc.date.issued | 2024 | -
dc.date.submitted | 2025-01-07T10:21:30Z | -
dc.identifier.citation | Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024, VLDB Endowment | -
dc.identifier.issn | 2150-8097 | -
dc.identifier.uri | http://hdl.handle.net/1942/45095 | -
dc.description.abstract | Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes, we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances. | -
dc.description.sponsorship | S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation – Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government. | -
dc.language.iso | en | -
dc.publisher | VLDB Endowment | -
dc.rights | This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment. ISSN 2150-8097. | -
dc.subject | Computer Science - Databases | -
dc.subject | Computer Science - Artificial Intelligence | -
dc.subject.other | Computer Science - Databases | -
dc.subject.other | Computer Science - Artificial Intelligence | -
dc.title | Schema Matching with Large Language Models: an Experimental Study | -
dc.type | Proceedings Paper | -
local.bibliographicCitation.conferencedate | 2024, August 26-30 | -
local.bibliographicCitation.conferencename | International Conference on Very Large Data Bases, VLDB 2024 | -
local.bibliographicCitation.conferenceplace | Guangzhou, China | -
local.format.pages | 10 | -
local.bibliographicCitation.jcat | C1 | -
local.type.refereed | Refereed | -
local.type.specified | Proceedings Paper | -
dc.identifier.arxiv | 2407.11852 | -
dc.identifier.url | https://vldb.org/workshops/2024/proceedings/TaDA/TaDA.8.pdf | -
local.provider.type | ArXiv | -
local.bibliographicCitation.btitle | Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024 | -
local.uhasselt.international | no | -
item.contributor | PARCIAK, Marcel | -
item.contributor | VANDEVOORT, Brecht | -
item.contributor | NEVEN, Frank | -
item.contributor | PEETERS, Liesbet | -
item.contributor | VANSUMMEREN, Stijn | -
item.fullcitation | PARCIAK, Marcel; VANDEVOORT, Brecht; NEVEN, Frank; PEETERS, Liesbet & VANSUMMEREN, Stijn (2024) Schema Matching with Large Language Models: an Experimental Study. In: Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024, VLDB Endowment. | -
item.fulltext | With Fulltext | -
item.accessRights | Restricted Access | -
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format | Access
TaDA.8.pdf | Published version | 1.2 MB | Adobe PDF | Restricted Access