|Title:|A nonparametrical goodness-of-fit test|
|Authors:|MOONS, Elke|
|Issue Date:|2003|
|Source:|EIRASS Workshop, abstract proceedings.|
|Abstract:|Assessing the fit of a model is an important component of any modeling procedure. Goodness-of-fit tests evaluate how well the outcomes predicted by a model agree with the observed data. In transport modeling, logit models have become a standard. However, in precisely these logistic regression models, investigating goodness-of-fit is often problematic when continuous covariates are modeled, because the approximate chi-squared null distribution of the Pearson test statistic is then no longer valid. Categorization might solve this problem, but it is often unclear how the categories should be defined. Hosmer and Lemeshow (1980) were the first to propose a goodness-of-fit test that can be used for logistic regression models with continuous predictors. They suggested a Pearson-like chi-squared statistic in which the groups are formed according to deciles of risk. This solves the categorization problem, although it is well known to come at the cost of power. Many other methods and approaches have been examined, see e.g. Azzalini, Bowman and Härdle (1989), le Cessie and van Houwelingen (1991, 1995), and Aerts, Claessens and Hart (1999, 2000), among others, and most of them are based on nonparametric concepts. They have been shown to have good power characteristics; however, when many explanatory variables are modeled, most of these methods suffer from the curse of dimensionality. The test statistic that we propose here is similar in approach to the Hosmer and Lemeshow lack-of-fit statistic in that the observations are classified into distinct groups. However, in our proposed test the grouping is not based on the fitted probabilities under the null model. Instead, we let a recursive partitioning algorithm, as used in the classification trees of Breiman et al. (1984), divide the sample space into groups. This will affect the power characteristics of the test statistic. Classification trees are nonparametric in nature and can handle large and complex datasets with many explanatory variables, which is why they are frequently used in data mining applications. The distribution of the test statistic has been studied through simulations, and we will compare its performance to that of the Hosmer and Lemeshow test. An analysis of a large and complex real-world transportation dataset will be provided to illustrate the procedure.|
|Document URI:|http://hdl.handle.net/1942/11987|
|Category:|C2|
|Type:|Proceedings Paper|
|Appears in Collections:|Research publications|
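The abstract contrasts two ways of forming groups for a Pearson-like statistic: deciles of fitted risk (Hosmer and Lemeshow) versus the leaves of a recursive-partitioning tree. The Python sketch below is illustrative only and is not the author's implementation. It computes the grouped statistic, the sum over groups of (O_k - n_k p̄_k)² / (n_k p̄_k (1 - p̄_k)), for any grouping, and pairs it with both groupings. Several assumptions apply. All function names are hypothetical. The tree is grown greedily on the observed binary outcome using Gini impurity, with `max_depth` and `min_leaf` as hypothetical stopping rules; this is a simplified stand-in for the CART procedure of Breiman et al. (1984), since the abstract does not say what the tree is fitted to or how it is pruned. Finally, simulated true probabilities stand in for fitted logit probabilities.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson_group_statistic(y: Sequence[int], p: Sequence[float],
                            groups: Sequence[int]) -> float:
    """Sum over groups of (O_k - n_k*pbar_k)^2 / (n_k*pbar_k*(1 - pbar_k))."""
    totals: Dict[int, Tuple[int, int, float]] = {}
    for yi, pi, g in zip(y, p, groups):
        n, obs, psum = totals.get(g, (0, 0, 0.0))
        totals[g] = (n + 1, obs + yi, psum + pi)
    stat = 0.0
    for n, obs, psum in totals.values():
        pbar = psum / n  # mean fitted probability in the group, assumed strictly in (0, 1)
        stat += (obs - n * pbar) ** 2 / (n * pbar * (1.0 - pbar))
    return stat


def decile_groups(p: Sequence[float], g: int = 10) -> List[int]:
    """Hosmer-Lemeshow grouping: sort by fitted probability, cut into g near-equal groups."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    labels = [0] * len(p)
    for rank, i in enumerate(order):
        labels[i] = rank * g // len(p)
    return labels


def _gini(pos: int, n: int) -> float:
    """Gini impurity of a node with `pos` ones out of `n` observations."""
    q = pos / n
    return 2.0 * q * (1.0 - q)


def tree_groups(X: Sequence[Sequence[float]], y: Sequence[int],
                max_depth: int = 3, min_leaf: int = 20) -> List[int]:
    """Hypothetical CART-style grouping: greedy binary splits that minimize the
    Gini impurity of y. Each terminal node (leaf) becomes one group."""
    labels = [0] * len(y)
    next_label = 0

    def grow(idx: List[int], depth: int) -> None:
        nonlocal next_label
        parent = len(idx) * _gini(sum(y[i] for i in idx), len(idx))
        best = None  # (weighted child impurity, feature index, threshold)
        if depth < max_depth and len(idx) >= 2 * min_leaf:
            for j in range(len(X[idx[0]])):
                ordered = sorted(idx, key=lambda i: X[i][j])
                total_pos = sum(y[i] for i in ordered)
                left_pos = 0
                for cut in range(1, len(ordered)):
                    left_pos += y[ordered[cut - 1]]
                    n_left, n_right = cut, len(ordered) - cut
                    lo, hi = X[ordered[cut - 1]][j], X[ordered[cut]][j]
                    if n_left < min_leaf or n_right < min_leaf or lo == hi:
                        continue  # leaf too small, or tied values cannot be separated
                    score = (n_left * _gini(left_pos, n_left)
                             + n_right * _gini(total_pos - left_pos, n_right))
                    if best is None or score < best[0]:
                        best = (score, j, (lo + hi) / 2.0)
        if best is None or best[0] >= parent:
            for i in idx:  # no useful split: this node becomes a group
                labels[i] = next_label
            next_label += 1
            return
        _, j, t = best
        grow([i for i in idx if X[i][j] <= t], depth + 1)
        grow([i for i in idx if X[i][j] > t], depth + 1)

    grow(list(range(len(y))), 0)
    return labels


# Usage on simulated data; the true probabilities stand in for fitted logit probabilities.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
p = [1.0 / (1.0 + math.exp(-(0.5 + x1 - x2))) for x1, x2 in X]
y = [1 if rng.random() < pi else 0 for pi in p]
hl_stat = pearson_group_statistic(y, p, decile_groups(p))
tree_stat = pearson_group_statistic(y, p, tree_groups(X, y))
print(f"deciles of risk: {hl_stat:.3f}  tree leaves: {tree_stat:.3f}")
```

For the decile grouping, Hosmer and Lemeshow refer the statistic to a chi-squared distribution with g - 2 degrees of freedom. The sketch assumes no such reference distribution for the tree-based grouping. Because the tree is built from the same data, its null distribution is not assumed to be chi-squared; the abstract reports studying it through simulation.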
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.