Semi-Parametric Mixture Models for Censored Data, with Applications in the Field of Antimicrobial Resistance

JASPERS, Stijn

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/22017

Title:	Semi-Parametric Mixture Models for Censored Data, with Applications in the Field of Antimicrobial Resistance
Authors:	JASPERS, Stijn
Advisors:	AERTS, Marc; VERBEKE, Geert
Issue Date:	2016
Abstract:	Ever since the accidental discovery and isolation of penicillin by Sir Alexander Fleming, different antibiotics and antimicrobials have been discovered, thereby changing the entire direction of approaches to treating infectious diseases. Unfortunately, over the last decades, there has been a decrease in the number of antimicrobials that are effective in treating infections, and antimicrobial resistance (AMR) has become one of the main public health concerns. Therefore, it is extremely important to study and monitor the emergence of isolates with a reduced susceptibility against antimicrobials. This monitoring is based on minimum inhibitory concentration (MIC) values, which are most commonly collected using dilution experiments. A direct consequence of this kind of experiment is that the data are censored, a data complexity that needs to be accounted for in the analysis. In this dissertation, we exploited the benefits of mixture models to estimate the MIC density of specific microorganisms that are tested for susceptibility against a specific antimicrobial. Indeed, mixture models are ideally suited to model unobserved population heterogeneity. In the AMR setting, the general population is divided into two main sub-populations, which can be referred to as the wild-type and non-wild-type populations, respectively. The wild-type population, typically located on the left of the MIC distribution, is assumed to have no acquired or mutational resistance. It commonly shows a uni-modal distribution reflecting a slight biological variability around a mode which is not altered by changing circumstances over time. On the other hand, the non-wild-type component can be much more complex as it is commonly composed of different sub-groups of non-wild-type isolates that have acquired different resistance mechanisms.
Therefore, the general mixture model considered throughout the dissertation is composed of two main components, reflecting the wild-type and non-wild-type populations. In Chapter 4, we adopted a local view and focused on the estimation of the wild-type component only. We developed the multinomial-based method, which is encompassed in the likelihood framework. Different parametric assumptions can be made regarding the density of the wild-type component and the assumptions can be compared using the Akaike information criterion (AIC). As an alternative, we also presented a model-averaged approach, which employs the Akaike weights to construct a single, averaged estimate based on all fitted models. In this way, the wild-type part of the MIC value density of interest can be quantified in case a representative sample of the desired antibiotic-bacterium combination is available. Once quantified, the estimated density can be employed to derive some specific characteristics of interest, such as the epidemiological cut-off (ECOFF) that distinguishes wild-type isolates from non-wild-type isolates. The developed model was compared to a non-linear least squares regression approach (Turnidge et al., 2006) through a simulation study and promising results were obtained. In Chapters 5, 6 and 7, several models are presented to estimate the entire MIC mixture density. While the wild-type component could be assumed to be of a fixed parametric form, less information is available on the non-wild-type component. Indeed, this second component is often multi-modal as it may be a mixture of distinct non-wild-type populations itself. In addition, the number of non-wild-type subpopulations, as well as their respective distributions, are unknown a priori. Therefore, in order to impose as few constraints as possible when estimating the non-wild-type distribution, we considered several semi-parametric density estimation routines. The penalised mixture approach was discussed in Chapter 5.
In this method, a finite but penalised mixture of Gaussian densities is used to estimate the unknown second component. The penalty is included with the aim of rendering a smooth estimate. In order to fit within the AMR setting, the approach was adjusted to cope with censored data and further extended to incorporate a parametric first component. As a result, we looked at two versions of the new semi-parametric mixture model. In the first model, we considered a fixed first component, with the values of its parameters equal to the estimates obtained in an initial stage using the multinomial-based method. Next, in the second stage of the procedure, the mixing weights of the component densities are estimated, resulting in the final density estimate. On the other hand, the second model under consideration jointly estimates all parameters and was found to be slightly more favourable than the former method. The procedure presented in Chapter 6 employs a basis of a generous number of Gaussian component densities to approximate the unknown second component of the MIC density. Nevertheless, in contrast to the penalised mixture approach, no penalty is imposed on the corresponding mixing weights. Rather, smoothness is obtained by fixing the standard deviations of the basis functions to a common value, which can be determined after a grid search. Based on the introduced back-fitting algorithm, optimal estimates were obtained for the weights of the distinct component densities and for the parameters related to the parametric first component. A key role within this method is played by the vertex exchange method (Böhning, 1986). In Chapter 7, we followed the non-parametric density estimation routine introduced by Lambert and Eilers (2009). Their Bayesian composite link model with roughness penalties was extended to incorporate a parametric first component, thereby fitting into the framework for modelling AMR data.
Based on the real-life data applications, a promising behaviour was observed when estimating the entire MIC density of interest, with a particular focus on the parameters of the first component and the prevalence of wild-type isolates. In summary, we introduced four different semi-parametric mixture models to estimate the full MIC density. Once this estimate is obtained, model-based classification can be performed as an alternative to using the ECOFF for classifying isolates into one of the two main sub-populations. In Chapter 8, a simulation study was performed to compare the performance of the introduced estimators. Two simulation scenarios were considered. The first simulation mixture was composed of three lognormal components and could therefore be considered to be a special case of two of our proposed models. In contrast, the second simulation mixture cannot be considered to be a special case of any of the fitted models as it is composed of a gamma and two skewed t-densities. Similar performance was observed in both scenarios. The back-fitting algorithm was found to perform slightly better than the Bayesian composite link model. Both procedures were to be preferred over the penalised mixture approach, which was observed to perform less well, especially in the region of overlap between wild-type and non-wild-type isolates. Based on the performed simulation studies and real-life data applications, we could conclude that the introduced methods provide a useful toolbox for modelling MIC data related to a single antibiotic-bacterium combination. We were able to estimate the univariate MIC density and to use this estimate to perform model-based classification and to derive the prevalence of wild-type isolates. Nevertheless, there still remain other topics of interest in the field of AMR. First of all, with the aim of monitoring the evolution of resistant isolates over time, it is of interest to make the mixing weights time-dependent.
In addition, other covariates could be of interest, including the sampling country and the animal or food source the isolates were collected from. Another subject of major interest in the field of AMR is the analysis of specific co-resistance patterns, which arise in case a specific bacterial isolate is tested for susceptibility against a range of different antimicrobials. The latter requires a multivariate mixture model to incorporate all available information. In this respect, Chapter 9 builds upon the Bayesian estimation routine for estimating multivariate normal mixtures as introduced by Komárek (2009). The original method was extended to allow for the mixing weights to depend on categorical covariates in a saturated way. Although there is still room for improvement, we believe that this new approach is a promising first step towards a monitoring tool to follow up on the evolution of isolates that might show resistance to one or more antimicrobials.
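
The abstract describes estimating mixing weights for a fixed wild-type component plus a basis of Gaussian densities with a common standard deviation, using interval-censored MIC data, and then classifying isolates from the fitted mixture. The Python sketch below illustrates that idea only under loud assumptions: it uses a plain EM update for the weights rather than the back-fitting algorithm or the vertex exchange method used in the thesis, it keeps the wild-type parameters fixed, and every number (counts, means, standard deviations) is hypothetical and not taken from the dissertation.

```python
import math

def norm_cdf(x, mu, sd):
    """Normal CDF, with explicit handling of the open-ended censoring bounds."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def interval_probs(intervals, components):
    """probs[i][k]: mass that component k puts on censoring interval i = (lo, hi]."""
    return [[norm_cdf(hi, mu, sd) - norm_cdf(lo, mu, sd) for mu, sd in components]
            for lo, hi in intervals]

def em_weights(counts, probs, n_iter=500):
    """EM for mixing weights only; component densities stay fixed (an assumption)."""
    k = len(probs[0])
    w = [1.0 / k] * k
    for _ in range(n_iter):
        new_w = [0.0] * k
        for c, row in zip(counts, probs):
            denom = sum(wj * pj for wj, pj in zip(w, row))
            if denom <= 0.0:
                continue  # interval carries no mass under the current fit
            for j in range(k):
                new_w[j] += c * w[j] * row[j] / denom
        total = sum(new_w)
        w = [v / total for v in new_w]
    return w

def wild_type_posterior(w, probs):
    """Posterior probability that an isolate in each interval is wild-type (component 0)."""
    out = []
    for row in probs:
        denom = sum(wj * pj for wj, pj in zip(w, row))
        out.append(w[0] * row[0] / denom if denom > 0.0 else float("nan"))
    return out

# Hypothetical dilution data on the log2(MIC) scale: each isolate is only known
# to fall in (lo, hi], with open-ended first and last intervals.
edges = [-math.inf, -3, -2, -1, 0, 1, 2, 3, 4, math.inf]
intervals = list(zip(edges[:-1], edges[1:]))
counts = [5, 40, 90, 45, 8, 4, 20, 30, 12]

# Component 0: wild-type with hypothetical fixed parameters.
# Components 1..5: Gaussian basis with a common (hypothetical) standard deviation.
common_sd = 0.8
components = [(-1.5, 0.7)] + [(m, common_sd) for m in (1.0, 2.0, 3.0, 4.0, 5.0)]

probs = interval_probs(intervals, components)
w = em_weights(counts, probs)
post = wild_type_posterior(w, probs)

print("estimated wild-type prevalence:", round(w[0], 3))
for (lo, hi), p in zip(intervals, post):
    print(f"({lo}, {hi}]: P(wild-type) = {p:.4f}")
```

In this sketch the first weight plays the role of the wild-type prevalence, and the per-interval posterior probabilities give a model-based alternative to a single cut-off. In practice the common standard deviation would be chosen by a grid search, as the abstract indicates, and the wild-type parameters would be estimated rather than fixed.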
Document URI:	http://hdl.handle.net/1942/22017
Category:	T1
Type:	Theses and Dissertations
Appears in Collections:	PhD theses; Research publications

Files in This Item:

File	Description	Size	Format
thesisGENERAL.pdf		35.57 MB	Adobe PDF
