On the estimation and validation of biomarker-index’ accuracy

GARCIA BARRADO, Leandro

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/20559

Title:	On the estimation and validation of biomarker-index’ accuracy
Authors:	GARCIA BARRADO, Leandro
Advisors:	BURZYKOWSKI, Tomasz
Issue Date:	2015
Abstract:	Alzheimer’s Disease (AD) places an enormous burden on society, and this burden is projected only to increase. The need for a treatment for AD is growing, but at the same time advances in AD research are hindered by issues related to the diagnosis of the disease. The currently used clinical diagnosis of AD is known to be imperfect, while the perfect post-mortem diagnosis is expensive and useless from a diagnostic point of view. Therefore, the need for easily measurable biomarkers is high, but many fail to show statistically adequate diagnostic accuracy. One of the reasons may be biased estimation of biomarker accuracy caused by using the imperfect clinical diagnosis as a reference test without acknowledging its imperfection. The main goal of this dissertation was to develop methods that facilitate the development of biomarker-based diagnostic tests for AD. The first research question focuses on how to efficiently estimate the accuracy of a diagnostic biomarker-index. Because a gold-standard reference test is lacking, currently available methods, which assume that the true disease labels are known, would lead to biased accuracy estimates. Therefore, we propose the use of a Bayesian latent-class mixture model in Chapter 3. The model allows the information from an imperfect reference test to be included while accounting for its imperfection. Care has to be taken with the inclusion of prior information, since a combination of uninformative priors may lead to an extremely informative prior for the parameter of interest. Therefore, an alternative parametrisation is proposed that allows prior information to be placed directly on the accuracy of the diagnostic biomarker-index. We show that, when appropriate priors are chosen, this model provides unbiased estimates of the accuracy of the diagnostic biomarker-index.
Moreover, the results suggest that the reported disappointing diagnostic performance of the AD cerebrospinal fluid (CSF) biomarkers might be due in part to the fact that the clinical diagnosis was treated as a gold-standard (GS) reference test. The assumption that the considered biomarkers are independent of the reference test, conditionally on the true disease status, is untestable and can only be supported by heuristic arguments. Therefore, the proposed Bayesian latent-class model is extended in Chapter 4. By considering the imperfect reference test to be a dichotomized version of an underlying continuous latent tolerance variable, conditional dependence between the biomarkers and the reference test is modelled directly. Assuming that the continuous tolerance variable and the biomarkers are jointly normally distributed, their correlation can be estimated. As a result, the estimated accuracy of the diagnostic biomarker-index is corrected for any possible conditional dependence between the biomarkers and the reference test, without the need for any untestable heuristic argumentation. In the AD application, it is shown that, although statistically significant conditional dependence is observed, it has no significant impact on the accuracy estimate of the diagnostic biomarker-index. The second research question focuses on the validation of a developed diagnostic biomarker-index. Validation is rarely performed, because large sample sizes or expensive data are needed to reach adequate power in the validation study and an efficient statistical framework is lacking. In Chapter 5 we propose a Bayesian framework allowing efficient validation of a diagnostic biomarker-index. By assuming exchangeability of the parameters of the development and validation studies, accuracy information obtained in the development study can be included in the validation study.
In particular, an approximation to the posterior distribution of the accuracy parameter from the development study is carried over to the validation study as prior information. Validation is defined as a hypothesis test of whether a particular validation criterion value can be rejected. Before comparing the proposed analysis to a ‘traditional’ analysis in which the development-study information is ignored, the significance levels of the hypothesis test are adjusted to obtain comparable type-I error probabilities. We show that, even when the information from the development study is discounted by doubling its standard deviation, a large reduction of the required sample size is possible. In particular, in the considered setting, reaching a power of approximately 0.53 requires only about 20% of the sample size needed by a validation study that ignores the development-study accuracy information. The development and validation of a diagnostic AD CSF-biomarker cut-off for a particular commercially available assay does not imply that the cut-off is applicable to other assays measuring the same biomarker. Establishing a cut-off for each new assay would therefore require new time-consuming and expensive studies. The third research question therefore investigates the transfer of the cut-off value of an AD CSF-biomarker from a currently used assay to a new one, without having to conduct new development and validation studies. The validity of the currently applied linear-regression transfer method, and its effect on the clinical performance of the biomarker measured with a new assay, have never been investigated. In Chapter 6 we establish that if the underlying assumptions of the linear-regression-based transfer method are violated, the results are biased. This entails that the diagnostic biomarker has different operating characteristics depending on the assay on which it is measured.
Therefore, we propose a novel two-stage Bayesian approach which leads to unbiased and more precise estimates than the linear-regression-based transfer method. The first stage estimates the distributional characteristics of the diagnostic biomarker on the current assay, based on the results of a GS reference test. The resulting posterior information is then introduced in the second stage as prior information. In the second stage, the cut-off of the new assay is estimated from data measured on both assays side-by-side. Because the information on the current assay is introduced through the first stage, no further GS information is required to obtain unbiased estimates. Although the proposed Bayesian approach provides more precise cut-off estimates than the linear-regression-based transfer method, the limited sample sizes of currently considered development and validation studies mean that only imprecise cut-off estimates are available. This means that the currently used cut-offs have large uncertainty in terms of operating characteristics, which is rarely acknowledged.
Alzheimer’s disease has an enormous impact on today’s society, and this impact is predicted only to increase. The need for an effective treatment for Alzheimer’s disease is growing, while progress in research on the disease is hampered by difficulties with its diagnosis. The clinical diagnosis of Alzheimer’s disease currently in use is known to be imperfect. The post-mortem diagnosis, by contrast, is perfect, but expensive and useless from a diagnostic point of view. The need for easily measurable biomarkers is therefore high. Many biomarkers, however, fail to demonstrate statistically adequate diagnostic accuracy. One of the reasons could be that the incorrect assumption that the clinical diagnosis is a perfect reference test leads to a biased estimate of diagnostic accuracy.
The main goal of this dissertation was to develop methods that could support the development of biomarker-based diagnostic tests for Alzheimer’s disease. The first research question concerns the efficient estimation of the accuracy of a diagnostic index based on biomarkers. In the absence of a gold-standard reference test, current methods, which rely on the true underlying disease status, may lead to biased estimates of accuracy. For that reason, we propose a Bayesian latent-class mixture model in Chapter 3. This model allows the information contained in an imperfect reference test to be included in the estimation of diagnostic accuracy, while taking into account that the test is imperfect. Care must be taken in how prior information is added to the model, since a combination of non-informative priors can lead to a highly informative prior for the parameter of interest. To avoid this, we propose an alternative parametrisation that allows prior information to be specified directly at the level of the accuracy of the diagnostic biomarker-index. We show that, when appropriate priors are chosen, the proposed model can provide an unbiased estimate of the accuracy of the diagnostic biomarker-index. The results also suggest that earlier disappointing results regarding the accuracy of the Alzheimer biomarkers may be due to wrongly treating the clinical diagnosis as a perfect reference test. The assumption that the considered biomarkers are independent of the reference test, given the true disease status, is not testable and can only be made plausible through heuristic argumentation. For this reason, the Bayesian latent-class model is extended in Chapter 4.
By assuming that the imperfect reference test is a dichotomized version of an underlying continuous latent tolerance variable, the conditional dependence between the biomarkers and the reference test can be modelled directly. Under the assumption that the joint distribution of the continuous tolerance variable and the biomarkers is multivariate normal, the correlation of interest can be estimated. As a result, the estimated accuracy of the diagnostic biomarker-index is corrected for possible conditional dependence between the biomarkers and the reference test, without having to rely on heuristic argumentation. In an application to data from Alzheimer patients, we show that, although there is statistically significant conditional dependence, it has no effect on the estimate of the accuracy of the diagnostic biomarker-index. The second research question concerns the validation of a developed diagnostic biomarker-index. Reaching adequate power in a validation study currently requires such large sample sizes or such expensive data, combined with a lack of efficient statistical models, that validation is rarely carried out. In Chapter 5 we propose a Bayesian model that allows the accuracy of a diagnostic biomarker-index to be validated efficiently. By making use of the exchangeability assumption, accuracy information gathered in the development study can be introduced into the validation study. In particular, an approximation of the posterior distribution of the accuracy parameter, estimated in the development study, can be carried over as prior information for the validation study. Validation is defined as a hypothesis test that examines whether or not a particular validation criterion can be rejected.
When comparing the proposed analysis with the ‘traditional’ analysis, in which the accuracy information from the development study is disregarded, the significance levels of the hypothesis test are first adjusted so that comparable type-I error probabilities are obtained. We show that, even though the information from the development study is reduced by doubling its standard deviation, a substantial reduction of the required sample size is possible. In the considered example, the required sample size can be reduced to about 20% of the ‘traditionally’ required sample size while achieving the same power of approximately 0.53. The development and validation of the cut-off value of a diagnostic Alzheimer biomarker for a particular commercially available platform does not automatically imply that the cut-off can be transferred to another platform that measures the same biomarker. This means that new time-consuming and expensive studies have to be set up. To avoid having to conduct new development and validation studies, the third research question focuses on the transferability of biomarker cut-off values from a currently used platform to a new platform. The validity of the currently applied linear-regression transfer method, and its effect on the clinical accuracy of the biomarker measured on the new platform, have not been investigated to date. In Chapter 6 we establish that when the underlying assumptions of the current transfer method are violated, biased results are obtained. This finding implies that the clinical accuracy of the biomarker varies depending on the platform on which it is measured. We therefore propose a new two-stage Bayesian approach that leads to unbiased and more precise results than the current transfer method.
The proposed method first estimates the distributional characteristics of the diagnostic biomarker, measured on the current platform, in combination with the results of a gold-standard reference test. The posterior information is then introduced in the second stage as prior information. In the second stage, the cut-off value of the new platform is estimated using data measured on both platforms. Because the information about the current platform is brought in through the first stage, gold-standard reference-test information is no longer needed to arrive at unbiased estimates. Although the proposed Bayesian method leads to more precise estimates than the current transfer method, the estimates remain very uncertain because of the limited sample sizes of the development and validation studies. This means that there is currently great uncertainty about the cut-off values in use in terms of clinical accuracy, an uncertainty that is rarely acknowledged.
Document URI:	http://hdl.handle.net/1942/20559
Category:	T1
Type:	Theses and Dissertations
Appears in Collections:	PhD theses; Research publications

Files in This Item:

File	Description	Size	Format
9407 D-2015-2451-54 Leandro García Barrado.pdf		2.68 MB	Adobe PDF
