Semiparametric and adaptive statistical methods for microbiome data analysis: towards increased reproducibility

KODALCI, Leyla

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/45325

Title:	Semiparametric and adaptive statistical methods for microbiome data analysis: towards increased reproducibility
Authors:	KODALCI, Leyla
Advisors:	Thas, Olivier
Issue Date:	2024
Abstract:	Differential abundance (DA) analysis is an essential tool in microbiome studies that enables the identification of microbial taxa linked to certain conditions or diseases. However, DA analysis faces significant reproducibility challenges due to the inherent features of amplicon-sequenced microbiome data, such as compositionality, sparsity, overdispersion and high dimensionality. Taking these challenges into account is critical for ensuring reliable and unbiased findings. This dissertation presents two novel methods for DA analysis, each developed with its own perspective on addressing the particular challenges arising from the complex nature of microbiome data. The first is a semiparametric DA method that uses simple sign transformations in combination with established statistical models to test for differential abundance. The major advantage of this approach is that the sign methods inherit the flexibility of these statistical models, meaning that they can adjust for covariates and confounders without relying on strong distributional assumptions. In Chapter 2, we show that this approach controls the false discovery rate (FDR) at a fixed nominal level while maintaining competitive sensitivity, and is robust under several conditions. The second method, ADAM (Adaptive Differential Abundance Method), adaptively selects the most appropriate DA method from a pre-defined set of DA methods in a data-driven way. By adjusting to the data at hand with its unique combination of characteristics, ADAM can contribute to more reproducible DA analysis. In Chapter 3, we demonstrate that this approach controls the FDR at a fixed nominal level while maintaining competitive sensitivity across a range of scenarios. Following the development of these DA methods, we explore the evaluation and benchmarking of DA methods.
The diversity among available DA methods, partly driven by the complex nature of microbiome data, leads to considerable heterogeneity in their evaluation and benchmarking, which contributes to the reproducibility crisis in microbiome research. In Chapter 4, we present 'Neutralise', an open-science, community-driven initiative for neutral comparisons of two-sample tests. By using the very simple two-sample problem, this chapter aims to provide a proof of concept of such an initiative by focusing on the framework's design and architecture while avoiding the added complexities associated with microbiome data. Building on this, in Chapter 5, we call for the development of a comprehensive open-science initiative for neutral comparisons of DA methods in microbiome research and address the specific challenges that arise when extending Neutralise to microbiome data and the benchmarking of DA methods. By developing statistical methodology and establishing robust benchmarking practices, this research contributes to more reproducible data analysis in microbiome studies.
Document URI:	http://hdl.handle.net/1942/45325
Category:	T1
Type:	Theses and Dissertations
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
PhD_Thesis_FINAL_LK.pdf (access restricted until 2029-11-30)	Published version	20.47 MB	Adobe PDF