Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/35321
Title: | Approximate functional dependencies: a comparison of measures and a relevance focused tool for discovery |
Authors: | WEYTJENS, Sebastiaan |
Advisors: | NEVEN, Frank |
Issue Date: | 2021 |
Publisher: | tUL |
Abstract: | Many companies nowadays use data to optimize their processes. However, the collected data can contain inconsistencies, for example due to typing errors. This forces a company to clean the data before deriving insights from it. One way to discover erroneous information is to find columns that determine other columns, known as Functional Dependencies (FDs). For example, two people who live in the same city must live in the same country. However, because FDs do not tolerate errors, we need a method for finding dependencies that hold only approximately in the relation, referred to as Approximate Functional Dependencies (AFDs). This thesis aims to design a relevance-focused tool that lets domain experts discover AFDs. We review the existing measures for the degree of approximation of an AFD by testing them on various theoretical examples. Based on the findings of these tests, we decide on a combination of measures that focuses on discovering relevant AFDs. Then, we integrate those measures and other AFD metadata into the c-metric, a score representing the confidence in a particular AFD. Our extensive experimental evaluation of the c-metric shows that it is significantly more suitable for relevant AFD discovery than the existing approximation measures. Finally, to assist domain experts in discovering relevant AFDs, we implement a tool that visualizes our c-metric and other AFD metadata, such as probability distributions. |
Notes: | master in computer science (master in de informatica) |
Document URI: | http://hdl.handle.net/1942/35321 |
Category: | T2 |
Type: | Theses and Dissertations |
Appears in Collections: | Master theses |
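The abstract's city/country example can be made concrete with a small sketch. This record does not state which approximation measures the thesis compares, so the code below uses the g3 error, a measure commonly used in the AFD literature, purely as an illustration: the minimum fraction of rows that must be removed for the dependency to hold exactly. The function name `g3_error` and the sample rows are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Minimum fraction of rows to remove so that lhs -> rhs holds exactly.

    Illustrative only: g3 is assumed here as one common AFD measure; it is
    not necessarily one of the measures selected in the thesis.
    """
    if not rows:
        return 0.0
    # For every distinct left-hand-side value, count the right-hand-side values.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # In each group, keep the most frequent right-hand-side value; drop the rest.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


# Hypothetical data: one typing error ("Belgum") breaks the exact dependency.
rows = [
    {"city": "Hasselt", "country": "Belgium"},
    {"city": "Hasselt", "country": "Belgium"},
    {"city": "Hasselt", "country": "Belgum"},
    {"city": "Maastricht", "country": "Netherlands"},
]

error = g3_error(rows, ["city"], "country")
print(f"g3 error for city -> country: {error}")
```

An error of 0 means the dependency holds exactly as an FD. A small positive error suggests an AFD whose violating rows may be worth inspecting as data-quality issues.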
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
845d31c7-7084-4c8b-a964-d10bad246e52.pdf | | 2.4 MB | Adobe PDF | View/Open |
Page view(s): 76 (checked on Sep 7, 2022)
Download(s): 136 (checked on Sep 7, 2022)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.