Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/43437
Full metadata record
DC Field | Value | Language
dc.contributor.author | BYLOIS, Niels | -
dc.contributor.author | NEVEN, Frank | -
dc.contributor.author | VANSUMMEREN, Stijn | -
dc.date.accessioned | 2024-07-26T06:38:33Z | -
dc.date.available | 2024-07-26T06:38:33Z | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-07-26T06:05:54Z | -
dc.identifier.citation | Information systems frontiers (Print), | -
dc.identifier.uri | http://hdl.handle.net/1942/43437 | -
dc.description.abstract | We introduce an advanced method for validating data quality, which is crucial for ensuring reliable analytics insights. Traditional data quality validation relies on data unit tests, which use global metrics to determine if data quality falls within expected ranges. Unfortunately, these existing approaches suffer from two limitations. Firstly, they offer only coarse-grained assessments, missing fine-grained errors. Secondly, they fail to pinpoint the specific data causing test failures. To address these issues, we propose a novel approach using conditional metrics, enabling more detailed analysis than global metrics. Our method involves two stages: unit test discovery and monitoring/error identification. In the discovery phase, we derive conditional metric-based unit tests from historical data, focusing on stability to select appropriate metrics. The monitoring phase involves using these tests for new data batches, with conditional metrics helping us identify potential errors. We validate the effectiveness of this approach using two datasets and seven synthetic error scenarios, showing significant improvements over global metrics and promising results in fine-grained error detection for data ingestion validation. | -
dc.description.sponsorship | Funding: S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This work is partially funded by the Research Foundation - Flanders (FWO grant G055219N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government. Acknowledgements: We thank Kris Luyten for many helpful discussions on the material presented in this paper and Brecht Vandevoort for comments on a previous version of this paper. | -
dc.language.iso | en | -
dc.publisher | SPRINGER | -
dc.rights | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024 | -
dc.subject.other | Data monitoring | -
dc.subject.other | Data profiling | -
dc.subject.other | Dynamic data | -
dc.subject.other | Data unit tests | -
dc.title | Data Ingestion Validation Through Stable Conditional Metrics with Ranking and Filtering | -
dc.type | Journal Contribution | -
local.format.pages | 23 | -
local.bibliographicCitation.jcat | A1 | -
dc.description.notes | Bylois, N (corresponding author), Hasselt Univ, Data Sci Inst, Agoralaan Bldg D, B-3590 Diepenbeek, Belgium. | -
dc.description.notes | niels.bylois@uhasselt.be | -
local.publisher.place | VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
local.bibliographicCitation.status | Early view | -
dc.identifier.isi | 001262877700001 | -
local.provider.type | wosris | -
local.description.affiliation | [Bylois, Niels; Neven, Frank; Vansummeren, Stijn] Hasselt Univ, Data Sci Inst, Agoralaan Bldg D, B-3590 Diepenbeek, Belgium. | -
local.uhasselt.international | no | -
item.fulltext | With Fulltext | -
item.accessRights | Restricted Access | -
item.fullcitation | BYLOIS, Niels; NEVEN, Frank & VANSUMMEREN, Stijn (2024) Data Ingestion Validation Through Stable Conditional Metrics with Ranking and Filtering. In: Information systems frontiers (Print),. | -
item.contributor | BYLOIS, Niels | -
item.contributor | NEVEN, Frank | -
item.contributor | VANSUMMEREN, Stijn | -
crisitem.journal.issn | 1387-3326 | -
crisitem.journal.eissn | 1572-9419 | -
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
s10796-024-10504-y.pdf (Restricted Access) | Early view | 2.59 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.