Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/7912
Full metadata record
DC Field: Value
dc.contributor.author: Fan, Wenfei
dc.contributor.author: GEERTS, Floris
dc.contributor.author: Jia, Xibei
dc.date.accessioned: 2008-02-26T08:40:12Z
dc.date.available: 2008-02-26T08:40:12Z
dc.date.issued: 2007
dc.identifier.citation: Proceedings of the 33rd International Conference on Very Large Databases (VLDB). p. 315-326.
dc.identifier.isbn: 978-1-59593-649-3
dc.identifier.uri: http://hdl.handle.net/1942/7912
dc.description.abstract: Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs automated methods to make it consistent, i.e., find a repair D′ that satisfies the constraints and “minimally” differs from D. Equally important is to ensure that the automatically-generated repair D′ is accurate, or makes sense, i.e., D′ differs from the “correct” data within a predefined bound. This paper studies effective methods for improving both data consistency and accuracy. We employ a class of conditional functional dependencies (CFDs) to specify the consistency of the data, which are able to capture inconsistencies and errors beyond what their traditional counterparts can catch. To improve the consistency of the data, we propose two algorithms: one for automatically computing a repair D′ that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database. We show that both problems are intractable. Although our algorithms are necessarily heuristic, we experimentally verify that the methods are effective and efficient. Moreover, we develop a statistical method that guarantees that the repairs found by the algorithms are accurate above a predefined rate without incurring excessive user interaction.
dc.language.iso: en
dc.publisher: ACM
dc.title: Improving data quality: consistency and accuracy
dc.type: Proceedings Paper
local.bibliographicCitation.conferencedate: 2007
local.bibliographicCitation.conferencename: Conference on Very Large Databases (VLDB)
dc.bibliographicCitation.conferencenr: 33
local.bibliographicCitation.conferenceplace: Vienna, Austria
dc.identifier.epage: 326
dc.identifier.spage: 315
local.bibliographicCitation.jcat: C1
local.type.specified: Proceedings Paper
dc.bibliographicCitation.oldjcat: C2
dc.identifier.url: http://www.vldb.org/conf/2007/papers/research/p315-cong.pdf
local.bibliographicCitation.btitle: Proceedings of the 33rd International Conference on Very Large Databases (VLDB)
item.contributor: Fan, Wenfei
item.contributor: GEERTS, Floris
item.contributor: Jia, Xibei
item.fullcitation: Fan, Wenfei; GEERTS, Floris & Jia, Xibei (2007) Improving data quality: consistency and accuracy. In: Proceedings of the 33rd International Conference on Very Large Databases (VLDB). p. 315-326.
item.fulltext: With Fulltext
item.accessRights: Open Access
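
The abstract above mentions conditional functional dependencies (CFDs). As a rough illustration only (the schema, attribute names, and the specific constraint below are assumptions, not taken from the paper), here is a minimal Python sketch that flags violations of one hypothetical CFD, ([country = 'UK'], zip → city): for tuples whose country is 'UK', the zip code must determine the city, while other tuples are left unconstrained. This conditional pattern is what distinguishes a CFD from a traditional functional dependency, which would constrain every tuple.

```python
# Hedged sketch (not the paper's algorithm): detect violations of a single
# hypothetical CFD over an in-memory list of dictionaries.
from collections import defaultdict


def cfd_violations(rows, condition, lhs, rhs):
    """Return pairs of rows that violate the CFD (condition, lhs -> rhs)."""
    groups = defaultdict(list)
    for row in rows:
        # Only tuples matching the pattern (e.g. country == 'UK') are constrained.
        if all(row.get(attr) == val for attr, val in condition.items()):
            key = tuple(row.get(attr) for attr in lhs)
            groups[key].append(row)

    violations = []
    for group in groups.values():
        # Within each group sharing the same LHS values, the RHS must agree.
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if any(group[i].get(a) != group[j].get(a) for a in rhs):
                    violations.append((group[i], group[j]))
    return violations


if __name__ == "__main__":
    rows = [
        {"country": "UK", "zip": "EH8 9AB", "city": "Edinburgh"},
        {"country": "UK", "zip": "EH8 9AB", "city": "London"},    # violates the CFD
        {"country": "US", "zip": "EH8 9AB", "city": "Anything"},  # not constrained
    ]
    for a, b in cfd_violations(rows, {"country": "UK"}, ["zip"], ["city"]):
        print("CFD violation:", a, "vs", b)
```

A repair algorithm of the kind described in the abstract would go further and modify the offending tuples; this sketch only detects violations.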
Appears in Collections: Research publications
Files in This Item:
File: VLDB2007.pdf (Published version, 581.7 kB, Adobe PDF)