Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/7912
Full metadata record
DC Field: Value
dc.contributor.author: Fan, Wenfei
dc.contributor.author: GEERTS, Floris
dc.contributor.author: Jia, Xibei
dc.date.accessioned: 2008-02-26T08:40:12Z
dc.date.available: 2008-02-26T08:40:12Z
dc.date.issued: 2007
dc.identifier.citation: Proceedings of the 33rd International Conference on Very Large Databases (VLDB). p. 315-326.
dc.identifier.isbn: 978-1-59593-649-3
dc.identifier.uri: http://hdl.handle.net/1942/7912
dc.description.abstract: Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs automated methods to make it consistent, i.e., find a repair D′ that satisfies the constraints and “minimally” differs from D. Equally important is to ensure that the automatically-generated repair D′ is accurate, or makes sense, i.e., D′ differs from the “correct” data within a predefined bound. This paper studies effective methods for improving both data consistency and accuracy. We employ a class of conditional functional dependencies (CFDs) to specify the consistency of the data, which are able to capture inconsistencies and errors beyond what their traditional counterparts can catch. To improve the consistency of the data, we propose two algorithms: one for automatically computing a repair D′ that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database. We show that both problems are intractable. Although our algorithms are necessarily heuristic, we experimentally verify that the methods are effective and efficient. Moreover, we develop a statistical method that guarantees that the repairs found by the algorithms are accurate above a predefined rate without incurring excessive user interaction.
dc.language.iso: en
dc.publisher: ACM
dc.title: Improving data quality: consistency and accuracy
dc.type: Proceedings Paper
local.bibliographicCitation.conferencedate: 2007
local.bibliographicCitation.conferencename: Conference on Very Large Databases (VLDB)
dc.bibliographicCitation.conferencenr: 33
local.bibliographicCitation.conferenceplace: Vienna, Austria
dc.identifier.epage: 326
dc.identifier.spage: 315
local.bibliographicCitation.jcat: C1
local.type.specified: Proceedings Paper
dc.bibliographicCitation.oldjcat: C2
dc.identifier.url: http://www.vldb.org/conf/2007/papers/research/p315-cong.pdf
local.bibliographicCitation.btitle: Proceedings of the 33rd International Conference on Very Large Databases (VLDB)
item.contributor: Fan, Wenfei
item.contributor: GEERTS, Floris
item.contributor: Jia, Xibei
item.fullcitation: Fan, Wenfei; GEERTS, Floris & Jia, Xibei (2007) Improving data quality: consistency and accuracy. In: Proceedings of the 33rd International Conference on Very Large Databases (VLDB). p. 315-326.
item.fulltext: With Fulltext
item.accessRights: Open Access
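
The abstract above mentions conditional functional dependencies (CFDs). As a rough illustration only (the schema, attribute names, and the specific constraint below are assumptions, not taken from the paper), here is a minimal Python sketch that flags violations of one hypothetical CFD, ([country = 'UK'], zip → city): for tuples whose country is 'UK', the zip code must determine the city, while other tuples are left unconstrained. This conditional pattern is what distinguishes a CFD from a traditional functional dependency, which would constrain every tuple.

```python
# Hedged sketch (not the paper's algorithm): detect violations of a single
# hypothetical CFD over an in-memory list of dictionaries.
from collections import defaultdict


def cfd_violations(rows, condition, lhs, rhs):
    """Return pairs of rows that violate the CFD (condition, lhs -> rhs)."""
    groups = defaultdict(list)
    for row in rows:
        # Only tuples matching the pattern (e.g. country == 'UK') are constrained.
        if all(row.get(attr) == val for attr, val in condition.items()):
            key = tuple(row.get(attr) for attr in lhs)
            groups[key].append(row)

    violations = []
    for group in groups.values():
        # Within each group sharing the same LHS values, the RHS must agree.
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if any(group[i].get(a) != group[j].get(a) for a in rhs):
                    violations.append((group[i], group[j]))
    return violations


if __name__ == "__main__":
    rows = [
        {"country": "UK", "zip": "EH8 9AB", "city": "Edinburgh"},
        {"country": "UK", "zip": "EH8 9AB", "city": "London"},    # violates the CFD
        {"country": "US", "zip": "EH8 9AB", "city": "Anything"},  # not constrained
    ]
    for a, b in cfd_violations(rows, {"country": "UK"}, ["zip"], ["city"]):
        print("CFD violation:", a, "vs", b)
```

A repair algorithm of the kind described in the abstract would go further and modify the offending tuples; this sketch only detects violations.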
Appears in Collections: Research publications
Files in This Item:
File: VLDB2007.pdf (Published version, 581.7 kB, Adobe PDF)