Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/41467
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2023-10-04T11:27:42Z | - |
dc.date.available | 2023-10-04T11:27:42Z | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-10-04T11:26:23Z | - |
dc.identifier.citation | Zenodo. 10.5281/zenodo.8098909 https://zenodo.org/record/8098909 | - |
dc.identifier.uri | http://hdl.handle.net/1942/41467 | - |
dc.description.abstract | Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery. This collection consists of ten open-access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), the collection includes three lists that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process in which two independent annotators consulted the schemas of the relations and identified column combinations where one column implies another based on its semantics. For example, in the claims.csv file, AirportCode implies AirportName, as each code should be unique for a given airport. The file ground_truth.csv is a comma-separated file containing approximate functional dependencies: the column table names the relation, and lhs and rhs reference two columns of that relation for which we found that lhs semantically implies rhs. The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded from or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small. Dataset references: adult.csv: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. claims.csv: TSA Claims Data 2002 to 2006, published by the U.S. Department of Homeland Security. dblp10k.csv: Frequency-aware Similarity Measures. Lange, Dustin; Naumann, Felix (2011). 243–248. Made available as DBLP Dataset 2. hospital.csv: Hospital dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection for that paper. t_biocase_... files: t_bioc_... files used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection for that paper. tax.csv: Tax dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection for that paper. | - |
dc.language.iso | en | - |
dc.publisher | Zenodo | - |
dc.subject.classification | Data models | - |
dc.subject.other | data science | - |
dc.subject.other | computer science | - |
dc.subject.other | databases | - |
dc.subject.other | approximate functional dependencies | - |
dc.subject.other | data management | - |
dc.subject.other | relational data | - |
dc.subject.other | dataset | - |
dc.subject.other | benchmark | - |
dc.title | Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery | - |
dc.type | Dataset | - |
local.bibliographicCitation.jcat | DS | - |
dc.description.version | 1.0 | - |
dc.rights.license | Creative Commons Attribution 4.0 International (CC-BY-4.0) | - |
dc.identifier.doi | 10.5281/zenodo.8098909 | - |
dc.identifier.url | https://zenodo.org/record/8098909 | - |
dc.description.other | You can cite all versions by using the DOI 10.5281/zenodo.8098908. This DOI represents all versions, and will always resolve to the latest one. | - |
local.provider.type | datacite | - |
local.uhasselt.international | no | - |
local.contributor.datacreator | PARCIAK, Marcel | - |
local.contributor.datacreator | VANSUMMEREN, Stijn | - |
local.contributor.datacreator | WEYTJENS, Sebastiaan | - |
local.contributor.datacreator | PEETERS, Liesbet | - |
local.contributor.datacreator | NEVEN, Frank | - |
local.contributor.datacreator | HENS, Niel | - |
local.contributor.datacurator | PARCIAK, Marcel | - |
local.contributor.rightsholder | PARCIAK, Marcel | - |
local.format.extent | 4.0 MB; 17.4 MB; 5.0 MB; 270 kB; 6 kB; 30.6 MB; 79.1 kB; 14.1 MB; 21.2 MB; 24.9 MB; 30.5 MB; 29.8 MB; 73.0 MB | - |
local.format.mimetype | Comma-separated values (CSV) | - |
local.contributororcid.datacreator | 0000-0002-6950-929X | - |
local.contributororcid.datacreator | 0000-0001-7793-9049 | - |
local.contributororcid.datacreator | 0000-0001-5892-508X | - |
local.contributororcid.datacreator | 0000-0002-6066-3899 | - |
local.contributororcid.datacreator | 0000-0002-7143-1903 | - |
local.contributororcid.datacreator | 0000-0003-1881-0637 | - |
local.contributororcid.datacurator | 0000-0002-6950-929X | - |
local.contributororcid.rightsholder | 0000-0002-6950-929X | - |
local.contributingorg.datacreator | Hasselt University | - |
local.contributingorg.datacurator | Hasselt University | - |
local.contributingorg.rightsholder | Hasselt University | - |
dc.rights.access | Open Access | - |
item.accessRights | Closed Access | - |
item.fullcitation | PARCIAK, Marcel; VANSUMMEREN, Stijn; WEYTJENS, Sebastiaan; PEETERS, Liesbet; NEVEN, Frank & HENS, Niel (2023) Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery. Zenodo. 10.5281/zenodo.8098909 https://zenodo.org/record/8098909. | - |
item.fulltext | No Fulltext | - |
item.contributor | PARCIAK, Marcel | - |
item.contributor | VANSUMMEREN, Stijn | - |
item.contributor | WEYTJENS, Sebastiaan | - |
item.contributor | PEETERS, Liesbet | - |
item.contributor | NEVEN, Frank | - |
item.contributor | HENS, Niel | - |
crisitem.discipline.code | 01020501 | - |
crisitem.discipline.name | Data models | - |
crisitem.discipline.path | Natural sciences > Information and computing sciences > Information systems > Data models | - |
crisitem.discipline.pathandcode | Natural sciences > Information and computing sciences > Information systems > Data models (01020501) | - |
crisitem.license.code | CC-BY-4.0 | - |
crisitem.license.name | Creative Commons Attribution 4.0 International (CC-BY-4.0) | - |
Appears in Collections: | Datasets |
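
The abstract describes dependencies of the form lhs implies rhs and an exclusion rule based on tuples where both attributes have a value. As a rough illustration only, the sketch below computes the classic g3 error of a candidate dependency: the smallest fraction of rows that would have to be removed for it to hold exactly. This is an assumption made for illustration; it is not necessarily the g3_prime measure the dataset authors used, the function name `g3_error` is hypothetical, and the rows are made-up toy values rather than data from claims.csv.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Classic g3 error of the candidate dependency lhs -> rhs.

    rows is an iterable of dicts (for example from csv.DictReader).
    Rows where either attribute is missing or empty are skipped.
    Returns None when no row has a value for both attributes.
    """
    groups = defaultdict(Counter)  # lhs value -> counts of rhs values
    total = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left in (None, "") or right in (None, ""):
            continue
        groups[left][right] += 1
        total += 1
    if total == 0:
        return None
    # For each lhs value, keep only the rows with its most frequent rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / total


# Made-up toy rows (not taken from claims.csv).
rows = [
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "JFK Intl"},
    {"AirportCode": "LAX", "AirportName": "Los Angeles International"},
    {"AirportCode": "", "AirportName": "Unknown"},  # skipped: no lhs value
]
error = g3_error(rows, "AirportCode", "AirportName")
print(error)  # 0.25: one of the four usable rows violates the dependency
```

To try this on the actual files, the rows of a relation could be read with `csv.DictReader` and the pairs to check taken from the lhs and rhs columns of ground_truth.csv, filtered by its table column.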