Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/35297
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | VALKENBORG, Dirk | |
dc.contributor.advisor | DYUBANKOVA, Natalia | |
dc.contributor.author | Van Eylen, Tim | |
dc.date.accessioned | 2021-09-13T13:06:28Z | - |
dc.date.available | 2021-09-13T13:06:28Z | - |
dc.date.issued | 2021 | |
dc.identifier.uri | http://hdl.handle.net/1942/35297 | - |
dc.description.abstract | MinHash Locality Sensitive Hashing (LSH) was used to find and remove near-duplicates from large chemical datasets to avoid data leakage during training and testing of AI models for forward prediction modelling. MinHash LSH is a nearest-neighbour algorithm that provides query times with O(n) time complexity, whereas pairwise comparisons require O(n²) time, making them intractable for large datasets. A recent attention-based neural network, the Molecular Transformer, was tested on the combination of three large datasets, with and without the removal of these near-duplicates, and compared against the literature. It was concluded that MinHash LSH provides an elegant approach to removing near-duplicates. Furthermore, the reported results of the Molecular Transformer were not generalizable to aggregated datasets, although the reduced accuracy of the model on a reduced dataset could be shown. | |
dc.format.mimetype | Application/pdf | |
dc.language | en | |
dc.publisher | tUL | |
dc.title | Applicability domain of chemical reaction modeling | |
dc.type | Theses and Dissertations | |
local.bibliographicCitation.jcat | T2 | |
dc.description.notes | Master of Statistics and Data Science-Biostatistics | |
local.type.specified | Master thesis | |
item.fulltext | With Fulltext | - |
item.contributor | Van Eylen, Tim | - |
item.fullcitation | Van Eylen, Tim (2021) Applicability domain of chemical reaction modeling. | - |
item.accessRights | Open Access | - |
Appears in Collections: | Master theses |
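
The near-duplicate removal described in the abstract can be illustrated with a minimal, standard-library sketch of MinHash with LSH banding. It is not the thesis implementation: the character 3-gram shingling of reaction strings, the salted BLAKE2b hash family, the 64 hash functions split into 16 bands, the 0.8 Jaccard threshold, and all function names are assumptions chosen for illustration only.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of character k-grams of `text` (assumed featurisation)."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of `token` under hash function number `seed`."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_perm=64):
    """MinHash signature: the minimum hash value under each hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_perm))


def near_duplicate_candidates(records, num_perm=64, bands=16):
    """Return index pairs whose signatures agree on at least one LSH band."""
    rows = num_perm // bands
    buckets = defaultdict(list)
    for idx, text in enumerate(records):
        sig = minhash_signature(shingles(text), num_perm)
        for b in range(bands):
            # Records sharing an identical band slice land in the same bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    pairs = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.add((members[i], members[j]))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def deduplicate(records, threshold=0.8, **lsh_kwargs):
    """Keep the first record of each near-duplicate pair, verified by exact Jaccard."""
    drop = set()
    for i, j in sorted(near_duplicate_candidates(records, **lsh_kwargs)):
        if i in drop:
            continue
        if jaccard(shingles(records[i]), shingles(records[j])) >= threshold:
            drop.add(j)
    return [r for k, r in enumerate(records) if k not in drop]
```

Calling `deduplicate` on a list of reaction strings returns the list with later near-duplicates removed. Only records that share a bucket are compared exactly, which is how the sketch avoids scoring every pair of records.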
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
aff78031-f687-49d5-a8bd-f6509442e578.pdf | | 1.61 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.