Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/43437
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | BYLOIS, Niels | - |
dc.contributor.author | NEVEN, Frank | - |
dc.contributor.author | VANSUMMEREN, Stijn | - |
dc.date.accessioned | 2024-07-26T06:38:33Z | - |
dc.date.available | 2024-07-26T06:38:33Z | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-07-26T06:05:54Z | - |
dc.identifier.citation | Information Systems Frontiers (Print) | - |
dc.identifier.uri | http://hdl.handle.net/1942/43437 | - |
dc.description.abstract | We introduce an advanced method for validating data quality, which is crucial for ensuring reliable analytics insights. Traditional data quality validation relies on data unit tests, which use global metrics to determine if data quality falls within expected ranges. Unfortunately, these existing approaches suffer from two limitations. Firstly, they offer only coarse-grained assessments, missing fine-grained errors. Secondly, they fail to pinpoint the specific data causing test failures. To address these issues, we propose a novel approach using conditional metrics, enabling more detailed analysis than global metrics. Our method involves two stages: unit test discovery and monitoring/error identification. In the discovery phase, we derive conditional metric-based unit tests from historical data, focusing on stability to select appropriate metrics. The monitoring phase involves using these tests for new data batches, with conditional metrics helping us identify potential errors. We validate the effectiveness of this approach using two datasets and seven synthetic error scenarios, showing significant improvements over global metrics and promising results in fine-grained error detection for data ingestion validation. | - |
dc.description.sponsorship | Funding: S. Vansummeren was supported by the Bijzonder Onderzoeksfonds (BOF) of Hasselt University under Grant No. BOF20ZAP02. This work is partially funded by the Research Foundation - Flanders (FWO grant G055219N). The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government. Acknowledgements: We thank Kris Luyten for many helpful discussions on the material presented in this paper and Brecht Vandevoort for comments on a previous version of this paper. | - |
dc.language.iso | en | - |
dc.publisher | SPRINGER | - |
dc.rights | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024 | - |
dc.subject.other | Data monitoring | - |
dc.subject.other | Data profiling | - |
dc.subject.other | Dynamic data | - |
dc.subject.other | Data unit tests | - |
dc.title | Data Ingestion Validation Through Stable Conditional Metrics with Ranking and Filtering | - |
dc.type | Journal Contribution | - |
local.format.pages | 23 | - |
local.bibliographicCitation.jcat | A1 | - |
dc.description.notes | Bylois, N (corresponding author), Hasselt Univ, Data Sci Inst, Agoralaan Bldg D, B-3590 Diepenbeek, Belgium. | - |
dc.description.notes | niels.bylois@uhasselt.be | - |
local.publisher.place | VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS | - |
local.type.refereed | Refereed | - |
local.type.specified | Article | - |
local.bibliographicCitation.status | Early view | - |
dc.identifier.isi | 001262877700001 | - |
local.provider.type | wosris | - |
local.description.affiliation | [Bylois, Niels; Neven, Frank; Vansummeren, Stijn] Hasselt Univ, Data Sci Inst, Agoralaan Bldg D, B-3590 Diepenbeek, Belgium. | - |
local.uhasselt.international | no | - |
item.fulltext | With Fulltext | - |
item.accessRights | Restricted Access | - |
item.fullcitation | BYLOIS, Niels; NEVEN, Frank & VANSUMMEREN, Stijn (2024) Data Ingestion Validation Through Stable Conditional Metrics with Ranking and Filtering. In: Information Systems Frontiers (Print). | - |
item.contributor | BYLOIS, Niels | - |
item.contributor | NEVEN, Frank | - |
item.contributor | VANSUMMEREN, Stijn | - |
crisitem.journal.issn | 1387-3326 | - |
crisitem.journal.eissn | 1572-9419 | - |
Appears in Collections: | Research publications |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
s10796-024-10504-y.pdf (Restricted Access) | Early view | 2.59 MB | Adobe PDF |