Can machine learning support survival model selection to inform economic evaluations? Exploring K-Fold cross validation based model selection in seven datasets

BERMEJO DELGADO, Inigo; Grimm, S.

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45933

Title:	Can machine learning support survival model selection to inform economic evaluations? Exploring K-Fold cross validation based model selection in seven datasets
Authors:	BERMEJO DELGADO, Inigo; Grimm, S.
Issue Date:	2024
Publisher:	ELSEVIER SCIENCE INC
Source:	Value in health, 27 (12) (Art N° MSR17)
Abstract:	Objectives: The selection of survival models for informing economic evaluations of innovative therapies with limited long-term data traditionally relies on metrics of statistical goodness of fit in the full trial data. However, models selected based on full trial data might underperform in the target population due to overfitting. K-fold cross validation (CV), commonly used in machine learning, splits the data, allowing better estimation of fit in unseen data. We explore whether k-fold CV improves model selection. Methods: We used seven publicly available long-term survival datasets covering a range of diseases. We simulated 100 artificial data locks by sampling 250 patients without replacement and right-censoring once median survival was reached. We fitted standard parametric and flexible survival models to each simulated dataset and selected the models with the lowest AIC/BIC, as estimated using 10-fold CV and using traditional methods. We then estimated the restricted mean survival time (RMST) error of the best-fitting models relative to the RMST calculated from the full dataset's Kaplan-Meier curve. Results: K-fold CV led to lower mean RMST errors than traditional model selection methods in six of the seven datasets when selecting models based on AIC, and in all seven datasets when selecting based on BIC. On average, the RMST error was 27% (AIC) and 40% (BIC) higher using traditional model selection than using CV-based model selection. CV never selected complex models (3+ parameters), whilst the traditional method selected complex models in 51% (AIC) and 12% (BIC) of simulations. Conclusions: In the first study exploring k-fold CV for survival model selection, we show that it can regularly outperform traditional methods. Notably, k-fold CV favors less complex models compared to traditional methods, which may hint at their better generalizability. We conclude that k-fold CV may be an important addition to the modeler's toolbox when performing survival analysis. Further research should explore whether these findings hold in additional settings.
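The k-fold CV model selection outlined in the survival-model Methods above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes only two candidate models (exponential and Weibull), scores each by the summed held-out log-likelihood rather than a CV-based AIC/BIC (the abstract does not say how those were computed), and mimics a data lock by censoring a simulated cohort at its theoretical median. All function names and parameter values are hypothetical.

```python
import math
import random


def fit_exponential(times, events):
    """MLE for right-censored exponential data: rate = events / total follow-up."""
    rate = sum(events) / sum(times)
    return {"shape": 1.0, "scale": 1.0 / rate}


def fit_weibull(times, events):
    """MLE for right-censored Weibull data, solving the shape score equation by bisection."""
    d = sum(events)
    mean_log_event = sum(math.log(t) for t, e in zip(times, events) if e) / d

    def score(k):
        # Increasing in k; its root is the maximum-likelihood shape.
        num = sum(t ** k * math.log(t) for t in times)
        den = sum(t ** k for t in times)
        return num / den - 1.0 / k - mean_log_event

    lo, hi = 0.01, 50.0  # assumed search range for the shape parameter
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = (sum(t ** k for t in times) / d) ** (1.0 / k)
    return {"shape": k, "scale": scale}


def log_lik(params, times, events):
    """Weibull log-likelihood with right censoring (shape = 1 gives the exponential)."""
    k, lam = params["shape"], params["scale"]
    total = 0.0
    for t, e in zip(times, events):
        if e:  # events contribute the log hazard
            total += math.log(k / lam) + (k - 1) * math.log(t / lam)
        total -= (t / lam) ** k  # everyone contributes the cumulative hazard
    return total


def cv_log_lik(fit, times, events, n_folds=10, seed=0):
    """Sum of held-out log-likelihoods over n_folds folds (higher is better)."""
    idx = list(range(len(times)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held = set(test)
        train = [i for i in idx if i not in held]
        params = fit([times[i] for i in train], [events[i] for i in train])
        total += log_lik(params, [times[i] for i in test], [events[i] for i in test])
    return total


def simulate_data_lock(n=250, shape=1.5, scale=10.0, seed=1):
    """Hypothetical data lock: Weibull survival times censored at the true median."""
    rng = random.Random(seed)
    cutoff = scale * math.log(2.0) ** (1.0 / shape)
    times, events = [], []
    for _ in range(n):
        t = scale * (-math.log(1.0 - rng.random())) ** (1.0 / shape)
        times.append(min(t, cutoff))
        events.append(1 if t <= cutoff else 0)
    return times, events


times, events = simulate_data_lock()
candidates = {"exponential": fit_exponential, "weibull": fit_weibull}
cv_scores = {name: cv_log_lik(fit, times, events) for name, fit in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
print(cv_scores, "selected:", best)
```

Because the held-out folds are not used for fitting, extra parameters are rewarded only if they improve fit on unseen patients, which is one plausible reason CV would favor simpler models.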
Document URI:	http://hdl.handle.net/1942/45933
ISSN:	1098-3015
e-ISSN:	1524-4733
ISI #:	001457486000270
Category:	M
Type:	Journal Contribution
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
xx.pdf	Published version	739.96 kB	Adobe PDF
