Schema Matching with Large Language Models: an Experimental Study

PARCIAK, Marcel; VANDEVOORT, Brecht; NEVEN, Frank; PEETERS, Liesbet; VANSUMMEREN, Stijn

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45095

Full metadata record

DC Field	Value	Language
dc.contributor.author	PARCIAK, Marcel	-
dc.contributor.author	VANDEVOORT, Brecht	-
dc.contributor.author	NEVEN, Frank	-
dc.contributor.author	PEETERS, Liesbet	-
dc.contributor.author	VANSUMMEREN, Stijn	-
dc.date.accessioned	2025-01-16T11:20:13Z	-
dc.date.available	2025-01-16T11:20:13Z	-
dc.date.issued	2024	-
dc.date.submitted	2025-01-07T10:21:30Z	-
dc.identifier.citation	Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024, VLDB Endowment	-
dc.identifier.issn	2150-8097	-
dc.identifier.uri	http://hdl.handle.net/1942/45095	-
dc.description.abstract	Large Language Models (LLMs) have shown useful applications in a variety of tasks, including data wrangling. In this paper, we investigate the use of an off-the-shelf LLM for schema matching. Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions. Using a newly created benchmark from the health domain, we propose different so-called task scopes. These are methods for prompting the LLM to do schema matching, which vary in the amount of context information contained in the prompt. Using these task scopes we compare LLM-based schema matching against a string similarity baseline, investigating matching quality, verification effort, decisiveness, and complementarity of the approaches. We find that matching quality suffers from a lack of context information, but also from providing too much context information. In general, using newer LLM versions increases decisiveness. We identify task scopes that have acceptable verification effort and succeed in identifying a significant number of true semantic matches. Our study shows that LLMs have potential in bootstrapping the schema matching process and are able to assist data engineers in speeding up this task solely based on schema element names and descriptions without the need for data instances.	-
dc.description.sponsorship	S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This research received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme. This work was supported by Research Foundation - Flanders (FWO) for ELIXIR Belgium (I002819N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government.	-
dc.language.iso	en	-
dc.publisher	VLDB Endowment	-
dc.rights	This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment. ISSN 2150-8097.	-
dc.subject	Computer Science - Databases	-
dc.subject	Computer Science - Artificial Intelligence	-
dc.subject.other	Computer Science - Databases	-
dc.subject.other	Computer Science - Artificial Intelligence	-
dc.title	Schema Matching with Large Language Models: an Experimental Study	-
dc.type	Proceedings Paper	-
local.bibliographicCitation.conferencedate	2024, August 26-30	-
local.bibliographicCitation.conferencename	International Conference on Very Large Data Bases, VLDB 2024	-
local.bibliographicCitation.conferenceplace	Guangzhou, China	-
local.format.pages	10	-
local.bibliographicCitation.jcat	C1	-
local.type.refereed	Refereed	-
local.type.specified	Proceedings Paper	-
dc.identifier.arxiv	2407.11852	-
dc.identifier.url	https://vldb.org/workshops/2024/proceedings/TaDA/TaDA.8.pdf	-
local.provider.type	ArXiv	-
local.bibliographicCitation.btitle	proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024	-
local.uhasselt.international	no	-
item.fulltext	With Fulltext	-
item.accessRights	Restricted Access	-
item.validation	vabb 2026	-
item.fullcitation	PARCIAK, Marcel; VANDEVOORT, Brecht; NEVEN, Frank; PEETERS, Liesbet & VANSUMMEREN, Stijn (2024) Schema Matching with Large Language Models: an Experimental Study. In: Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, VLDB 2024, VLDB Endowment.	-
item.contributor	PARCIAK, Marcel	-
item.contributor	VANDEVOORT, Brecht	-
item.contributor	NEVEN, Frank	-
item.contributor	PEETERS, Liesbet	-
item.contributor	VANSUMMEREN, Stijn	-
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
TaDA.8.pdf (Restricted Access)	Published version	1.2 MB	Adobe PDF
