Statistical Methods for Microarray-based Analysis of Gene-expression, Classification and Biomarker Validation

VAN SANDEN, Suzy

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/8832

Title:	Statistical Methods for Microarray-based Analysis of Gene-expression, Classification and Biomarker Validation
Authors:	VAN SANDEN, Suzy
Advisors:	BURZYKOWSKI, Tomasz; SHKEDY, Ziv
Issue Date:	2008
Publisher:	UHasselt Diepenbeek
Abstract:	To understand the complications involved in the analysis of microarray data, it is necessary to have a sufficient understanding of the different stages of a typical microarray experiment. Therefore, a brief introduction to microarray technology is offered in Chapter 2. Throughout the dissertation, a number of key experiments are used for demonstration purposes, or as a starting point to set up simulations. A full description of the case studies is given in Chapter 3. For cDNA microarrays, normalization procedures are necessary to make the signals from different channels and arrays comparable. One objective, which is the focus of the first part of the dissertation, is to remove curvature seen on plots of the log ratio versus the mean log intensity values of two channels. A selection of methods proposed for this purpose is described in Chapter 4. Some of them are based on the assumption of a shift between the measurements of the two channels. In Chapter 5, we explore the use of background measurements to estimate that shift and to correct for it. We compare our proposal to some well-known methods by applying them to a number of microarray studies. The second part of the dissertation focusses on gene selection and classification methods. Dudoit et al. (2002), Lee et al. (2005), and Statnikov et al. (2005) investigated the performance of several classification methods applied to real-life microarray data. Due to the limited availability of datasets, only a limited number of settings could be evaluated. Also, the true classification and the set of truly differentially expressed genes were unknown. In order to overcome these limitations, we conduct a simulation study, using a linear mixed effects model to simulate cDNA microarray data under different scenarios. Chapter 6 describes the simulation model, as well as the different gene selection and classification methods considered in the study.
In Chapter 7, we compare several classification methods with respect to their ability to discriminate between two classes of biological samples in various experimental settings. For the selection of genes, on which classification is based, one particular method is applied. Gene selection is, however, an important aspect of classification. We therefore extend the study in Chapter 8 by considering several gene selection methods. Furthermore, the stability of the methods with respect to distributional assumptions is examined by considering data simulated from symmetric and asymmetric Laplace distributions, in addition to normally distributed microarray data. In the third part of the dissertation, we discuss the use of ANOVA models to analyze microarray data, as proposed by Kerr et al. (2000). These models are often applied to the data under the assumption of normally distributed error terms. In many cases, this assumption may be problematic. Purdom and Holmes (2005) have investigated the distribution of gene-expression measurements observed in several real-life microarray experiments. They have concluded that the distribution can often be better approximated by a Laplace distribution than by a normal one. In Chapter 10, we consider the analysis of microarray data by using ANOVA models under the assumption of Laplace-distributed error terms. We explain the methodology and investigate problems related to fitting this type of model. We apply the models to several microarray experiments conducted on mice. In addition, in Chapter 11, we conduct a simulation study to investigate the different aspects of the models in more detail. Recently, microarray data have also been considered as a means to select genes that may be capable of serving as a biomarker for a primary response variable. Within this framework, one wants to assess the effect of a treatment on the response of interest by using information about the expression levels of a group of genes.
The fourth and final part of the dissertation is devoted to the discovery of biomarkers. In Chapter 12, we give an introduction to genomic biomarkers. We present the joint model for the gene-expression and the response variable proposed by Shkedy et al. (2008). The model allows differentially expressed genes to be detected and evaluated as biomarkers. The response of primary interest in the study discussed by Shkedy et al. (2008) is a continuous variable. In Chapter 13, we propose techniques for biomarker detection and evaluation for categorical response data. Two approaches are considered, one of which extends the joint model used by Shkedy et al. (2008) to the binary setting. Finally, Chapter 14 offers some concluding remarks on the different topics covered.
Document URI:	http://hdl.handle.net/1942/8832
Category:	T1
Type:	Theses and Dissertations
Appears in Collections:	PhD theses; Research publications

Files in This Item:

File	Description	Size	Format
Suzy Van Sanden.pdf		15.32 MB	Adobe PDF	View/Open
