Mathematical and statistical models for estimating infectious disease parameters based on serological and social contact data

TRAN, Mai Phuong Thao

Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/31930

Title:	Mathematical and statistical models for estimating infectious disease parameters based on serological and social contact data
Authors:	TRAN, Mai Phuong Thao
Advisors:	Hens, Niel; Faes, Christel
Issue Date:	2020
Abstract:	This dissertation focuses on the analysis of serological data from serological surveys or vaccine trials. Serological surveys collect data on antibody titres to a specific antigen from a national serum bank or from residual laboratory samples. Based on pre-defined thresholds, an individual is categorized as seropositive (previously infected with the pathogen) or seronegative (still susceptible to the infection). This type of data is also called current status data. Usually, antibody titres to several pathogens are tested at the same time, which yields multivariate current status data. While we work with the dichotomized version of antibody titres from serological surveys, antibody titres from different vaccine trials were analyzed on a continuous scale. These trials were conducted to evaluate the effectiveness of maternal vaccination against pertussis in infants. Usually, different types of antibodies were measured, leading to multivariate data. The work presented in this dissertation draws on a range of methods and techniques and employs both maximum likelihood estimation and Bayesian analysis. The first part focuses on modeling longitudinal data on antibody titres, more specifically anti-PT and anti-PRN titres in pregnant women and infants. Nonlinear mixed effects models (NLMMs) were employed in a Bayesian framework. A hypothesized dynamic model that reflects the evolution of antibody titres in subjects was the basis for these analyses. Results from different trials (in Chapters 2, 3, and 4) show that pre-pregnancy vaccination and maternal vaccination helped to increase antibody titres in the cord at birth in infants. However, it is worth noting that a blunting effect was present, at least until the booster dose in infants at around month 15. Based on a simulation study, we conclude that more than five observations per infant are needed for inference from the assumed model to be reasonably justified.
Depending on the study, the presence of a lower limit of detection (or quantification) can make the analysis more challenging. One can incorporate the idea of Tobit regression into the NLMM framework to achieve “unbiased” estimates. Conventional practices, such as complete case analysis or the substitution method, give biased estimates and should be used with care. While the first part uses antibody titre data on their own, Part II employs both antibody titre data and dichotomized serological survey data, with a focus on methods to capture association and account for heterogeneity. More specifically, in Chapter 5 we propose methods to measure association among censored antibody titre data. The proposed method uses a copula function to join the marginal distributions of the two variables. The copula function completely determines the dependence structure between the two variables. When fitting a linear regression, we assume that the marginal distribution of the censored covariate is known. This distribution is then used within the maximum likelihood estimation framework. We show via a simulation study that the proposed method performs well under different settings, provided that the underlying assumptions are satisfied. Moreover, conventional approaches should be avoided, since these methods give biased estimates, depending on the percentage of censoring present in the data. As in Chapter 5, in Chapter 6 a copula function is used to join the marginal distributions of the two frailty terms. The copula function imposes the dependence structure between the two frailty terms; hence, it defines the bivariate survival function. With this construction, our proposed copula-frailty model (referred to as the general frailty model) does not constrain the association parameter. We prove that the well-known additive gamma frailty model is a particular case of our general frailty model.
In a simulation study, our copula-frailty model outperforms the well-recognized additive gamma frailty model when a negative association between two event times is present. Lastly, Part III of the thesis is devoted to analyzing serological data from serological surveys in a Bayesian framework. Additionally, social contact data obtained from social contact surveys were incorporated via the Mass Action Principle to make inference about the age-dependent force of infection and proportion for airborne diseases. Chapter 7 presents the Bayesian analysis of serological data using different approaches, including the base, sequential, and joint analyses. In the base analysis, serological data were analyzed, with fitted contact rate matrices used as input. In the sequential analysis, both the contact rate matrices and the proportionality factor q were estimated at the same time, and the posterior distributions resulting from the analysis of social contact data alone were used as prior distributions for these parameters. Lastly, social contact data and serological data were pooled together in the joint analysis. Using the Bayesian framework, one can infer the uncertainty of all parameters of interest once the analysis is done. The sequential and joint analyses offer the possibility of combining variability from different sources of data in one analysis.
This thesis focuses on the analysis of serological data collected through serological surveys or clinical studies of vaccines (vaccine trials). Serological surveys collected data on antibodies specific to an antigen. The antibodies were collected from a national serum bank or from residual laboratory samples. Based on a pre-defined threshold, a person is categorized as seropositive (currently or previously infected with the pathogen) or seronegative (still susceptible to the disease).
This type of data is also called “current status data”. Often, antibodies against several pathogens are tested, so that multivariate current status data are available. While we work with the dichotomized version from serological surveys, antibodies from different vaccine trials were analyzed on a continuous scale. These trials were conducted to evaluate the effectiveness of maternal vaccination against pertussis in infants. Usually, different types of antibodies were measured, which led to multivariate data. The work presented in this thesis uses a range of methods and techniques and employs both maximum likelihood estimation and Bayesian analysis. The first part focuses on modeling longitudinal antibody data. In particular, anti-PT and anti-PRN antibodies were measured in pregnant women and newborn babies. Nonlinear mixed models (NLMMs) were used in a Bayesian framework. A hypothesized dynamic model reflecting the evolution of antibodies in a subject was the basis for these analyses. Results from different vaccine trials (in Chapters 2, 3, and 4) show that pre-pregnancy vaccination and vaccination during pregnancy helped to raise antibody levels in the cord at birth in newborn babies. However, the results also show that a blunting effect was present, at least until the time of the booster dose. From a simulation study we can conclude that the number of blood samples must be greater than five for inference from the NLMMs to be reasonably justified. Depending on the study, the presence of a lower limit of detection (or lower limit of quantification) can make the analyses more difficult. One way to deal with the censoring problem is to use Tobit regression within the NLMMs.
Conventional methods, for example complete case analysis and the substitution method, give biased estimates. Those methods are therefore not recommended. While the first part uses antibody data on their own, the second part analyzes both types of data: antibody data on their own and their dichotomized version. The focus of this part is on methods to capture association and heterogeneity. More specifically, in Chapter 5 we propose methods to measure the association between censored antibody data. The proposed method uses a copula function to link the two marginal distributions. The copula function fully determines the structure of the association between the two variables. When fitting a linear regression model, we assume that the marginal distribution of the censored covariate is known. This distribution is later used in maximum likelihood estimation (MLE). We show via a simulation study that the proposed method performs well in different scenarios. This conclusion holds provided that the underlying assumptions are satisfied. Moreover, it is advisable to avoid conventional approaches, since these methods yield biased estimates, depending on the percentage of censoring in the data. As in Chapter 5, in Chapter 6 a copula function is used to join the two marginal distributions of the two frailty terms. The copula function imposes the dependence structure between the two frailty terms; hence, it defines the bivariate survival function. Through this construction, our proposed copula-frailty model (known as the general frailty model) does not constrain the association parameter. We prove that the well-known additive gamma frailty model is a special case of our general frailty model.
In a simulation study, our copula-frailty model performs better than the widely recognized additive gamma frailty model when a negative association between the two event times is present. Finally, Part III of the thesis is devoted to analyzing serological data resulting from serological surveys in a Bayesian framework. In addition, social contact data obtained from social contact surveys were incorporated via the Mass Action Principle to draw conclusions about the age-dependent force of infection and the proportion for airborne diseases. Chapter 7 presented the Bayesian analysis of serological data using different approaches, including the base, sequential, and joint analyses. In the base analysis, serological data were analyzed, and fitted contact rate matrices were used in this analysis. In the sequential analysis, both the contact rate matrices and the proportionality factor q were estimated simultaneously, and the posterior distributions resulting from the analysis of the social contact data alone were used as prior distributions for these parameters. Finally, social contact data and serological data were combined in the joint analysis. Using the Bayesian framework, one can derive the uncertainties of all relevant parameters once the analysis is complete. The sequential and joint analyses offer the possibility of combining variability from different data sources in a single analysis.
Document URI:	http://hdl.handle.net/1942/31930
Category:	T1
Type:	Theses and Dissertations
Appears in Collections:	Research publications

Files in This Item:

File	Description	Size	Format
Thesis_TTMP_final.pdf		5.54 MB	Adobe PDF
