Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/16866
Full metadata record
DC Field | Value | Language
dc.contributor.author | Van der Borght, Koen | -
dc.contributor.author | VERBEKE, Geert | -
dc.contributor.author | van Vlijmen, Herman | -
dc.date.accessioned | 2014-06-02T13:55:59Z | -
dc.date.available | 2014-06-02T13:55:59Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | BMC BIOINFORMATICS, 15 | -
dc.identifier.issn | 1471-2105 | -
dc.identifier.uri | http://hdl.handle.net/1942/16866 | -
dc.description.abstract | Background: Different high-dimensional regression methodologies exist for the selection of variables to predict a continuous variable. To improve variable selection when clustered observations are present in the training data, an extension towards mixed-effects modeling (MM) is needed, but it may not always be straightforward to implement. In this article, we developed such an MM extension (GA-MM-MMI) for automated variable selection by a linear regression based genetic algorithm (GA) using multi-model inference (MMI). We exemplify our approach by training a linear regression model for prediction of resistance to the integrase inhibitor Raltegravir (RAL) on a genotype-phenotype database, with many integrase mutations as candidate covariates. The genotype-phenotype pairs in this database were derived from a limited number of subjects, with multiple data points from the same subject, and with an intra-class correlation of 0.92. Results: In generating the RAL model, we took computational efficiency into account by optimizing the GA parameters one by one, and by using tournament selection. To derive the main GA parameters we used 3 times 5-fold cross-validation. The number of integrase mutations to be used as covariates in the mixed-effects models was 25 (chrom.size). A GA solution was found when R²_MM > 0.95 (goal.fitness). We tested three different MMI approaches to combine the results of 100 GA solutions into one GA-MM-MMI model. When evaluating the GA-MM-MMI performance on two unseen data sets, a more parsimonious and interpretable model was found (GA-MM-MMI TOP18: mixed-effects model containing the 18 most prevalent mutations in the GA solutions, refitted on the training data) with better predictive accuracy (R²) in comparison to GA-ordinary least squares (GA-OLS) and Least Absolute Shrinkage and Selection Operator (LASSO). Conclusions: We have demonstrated improved performance when using GA-MM-MMI for selection of mutations on a genotype-phenotype data set. As we largely automated setting the GA parameters, the method should be applicable to similar datasets with clustered observations. | -
dc.description.sponsorship | The authors would like to thank the anonymous reviewers for their constructive comments to improve the manuscript. Financial support from the IAP research network # P7/06 of the Belgian Government (Belgian Science Policy) is gratefully acknowledged. | -
dc.language.iso | en | -
dc.rights | © 2014 Van der Borght et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | -
dc.subject.other | variable selection; linear regression; genetic algorithm; mixed-effects model; multi-model inference | -
dc.title | Multi-model inference using mixed effects from a linear regression based genetic algorithm | -
dc.type | Journal Contribution | -
dc.identifier.volume | 15 | -
local.format.pages | 11 | -
local.bibliographicCitation.jcat | A1 | -
dc.description.notes | Van der Borght, K (reprint author), Janssen Infect Dis Diagnost BVBA, B-2340 Beerse, Belgium. kvdborgh@its.jnj.com | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
dc.identifier.doi | 10.1186/1471-2105-15-88 | -
dc.identifier.isi | 000334549600002 | -
item.fulltext | With Fulltext | -
item.accessRights | Open Access | -
item.validation | ecoom 2015 | -
item.fullcitation | Van der Borght, Koen; VERBEKE, Geert & van Vlijmen, Herman (2014) Multi-model inference using mixed effects from a linear regression based genetic algorithm. In: BMC BIOINFORMATICS, 15. | -
item.contributor | Van der Borght, Koen | -
item.contributor | VERBEKE, Geert | -
item.contributor | van Vlijmen, Herman | -
crisitem.journal.issn | 1471-2105 | -
crisitem.journal.eissn | 1471-2105 | -
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
van der borght 1.pdf | article | 722.12 kB | Adobe PDF
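
The abstract describes the GA-MM-MMI TOP18 model as containing the 18 most prevalent mutations across the 100 GA solutions. The selection step alone can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `top_k_prevalent`, the mutation labels, and the alphabetical tie-breaking rule are all assumptions, and the refit of the mixed-effects model on the training data is not shown.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def top_k_prevalent(solutions: Iterable[Sequence[str]], k: int) -> List[str]:
    """Return the k covariates that occur in the largest number of solutions.

    Hypothetical helper: the record does not say how ties are broken, so
    ties are resolved alphabetically here to keep the result deterministic.
    """
    counts: Counter = Counter()
    for solution in solutions:
        # Count each covariate at most once per solution.
        counts.update(set(solution))
    # Sort by descending prevalence, then by name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Toy input with made-up labels: three "GA solutions" of three covariates each.
toy_solutions = [
    ["mutA", "mutB", "mutC"],
    ["mutA", "mutB", "mutD"],
    ["mutA", "mutC", "mutE"],
]
top2 = top_k_prevalent(toy_solutions, k=2)
```

With the settings reported in the abstract, the input would be 100 solutions of 25 mutations each (chrom.size) and `k` would be 18.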

Scopus citations: 2 (checked on Sep 2, 2020)
Web of Science citations: 2 (checked on Jun 29, 2022)
Page views: 64 (checked on Jul 2, 2022)
Downloads: 102 (checked on Jul 2, 2022)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.