Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/39389
Title: Materials design through ensemble learning: When the average model knows best
Authors: VANPOUCKE, Danny E.P. 
Mehrkanoon, Siamak
Bernaerts, Katrien
Van Knippenberg, O. S. J.
Hermans, K
Issue Date: 2022
Source: Belgian Physical Society, Tabloo, Dessel, 18 May 2022
Abstract: Machine Learning plays an ever more important role in modern materials design and discovery, presenting a steady flow of new discoveries. Unfortunately, these achievements are generally rooted in large data sets. Although such big data sets are becoming more commonplace, they are generally not representative of the day-to-day work performed by materials researchers, where large numbers of samples are often infeasible due to production cost or time, or the availability of raw materials. In this work, we investigate the impact of very small data sets (<25 samples) on model quality and show how, even for these data sets, high-quality models can be constructed.

Machine Learning in small data sets

Due to the success of Machine Learning within the context of large data sets, there is a natural interest in applying these methods to small data sets and reaping their rewards there as well. The use of artificial intelligence and Machine Learning in these cases is generally aimed at improved design of experiments for materials optimisation, often in combination with robotic automation. Within this context the active learning approach comes naturally,[1] as it starts from a small data (sub)set, which is incrementally extended by adding the most useful data points from the master data set. Within the context of design of experiments, these would be newly created samples. Alternatively, several authors have focussed on (small) deep neural networks in combination with small data sets (50 to several hundred samples), showing models of reasonable quality.[2] These examples show that, even in the context of small data sets, Machine Learning can be successful for materials [...]

Figure 1: Modelling small data sets. (a) Schematic representation of the problem. (b) and (c) Heatmaps of ensembles of 1000 model instances for a linear and a non-linear data set of 20 data points.[3]
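
The abstract does not spell out how the ensembles of model instances are built, so the following is only an illustrative sketch of the general idea behind an "average model", not the authors' procedure. It assumes ordinary least-squares lines, random 80/20 train splits, synthetic linear data of 20 points, and simple averaging of the fitted coefficients; the function name `fit_linear_ensemble` and all parameter values are hypothetical.

```python
import numpy as np


def fit_linear_ensemble(x, y, n_models=1000, train_fraction=0.8, seed=0):
    """Fit n_models least-squares lines, each on a random subset of the data.

    Returns an array of shape (n_models, 2) holding [slope, intercept] per model.
    The split fraction and the model type are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    n_train = max(2, int(round(train_fraction * len(x))))
    coefs = np.empty((n_models, 2))
    for i in range(n_models):
        # Draw a random training subset without replacement.
        idx = rng.choice(len(x), size=n_train, replace=False)
        # np.polyfit with deg=1 returns [slope, intercept].
        coefs[i] = np.polyfit(x[idx], y[idx], deg=1)
    return coefs


# Synthetic small data set (assumed): 20 points on y = 2x + 1 with Gaussian noise.
data_rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + data_rng.normal(scale=0.1, size=x.size)

coefs = fit_linear_ensemble(x, y)

# For linear models, averaging the coefficients is equivalent to
# averaging the predictions of all model instances.
mean_slope, mean_intercept = coefs.mean(axis=0)


def average_model(x_new):
    """Prediction of the ensemble-averaged linear model."""
    return mean_slope * x_new + mean_intercept


# The spread of the individual instances indicates how strongly a single
# model depends on which samples happened to land in its training set.
slope_spread, intercept_spread = coefs.std(axis=0)
print(f"average model: y = {mean_slope:.3f} x + {mean_intercept:.3f}")
print(f"spread over instances: slope {slope_spread:.3f}, intercept {intercept_spread:.3f}")
```

In this sketch, each individual instance sees only part of the 20 samples, so its coefficients scatter around those of the averaged model; plotting the predictions of all instances as a density would give a picture loosely comparable to the heatmaps described for Figure 1.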
Other: Abstract to BPS 2022 conference; Oral presentation.
Keywords: polyester;poly(ethylene imine);structure-property relationships;machine learning
Document URI: http://hdl.handle.net/1942/39389
Category: C2
Type: Conference Material
Appears in Collections: Research publications

Files in This Item:
File: BPS2022_DannyEPVanpoucke.pdf (Restricted Access)
Description: Conference material
Size: 168.8 kB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.