Statistical Engineering Division Seminar
Analysis of Variance of Cross-Validation Estimators of the Generalization Error of Computer Algorithms
Dr. Marianthi Markatou

Abstract

We bring together methods from two disciplines, statistics and machine learning, to address the problem of estimating the variance of cross-validation estimators of the generalization error of computer algorithms. We treat this as a problem of approximating the moments of a statistic. The approximation illustrates the role of training and test sets in the performance of the algorithm. It provides a unifying approach to evaluating the various methods used to obtain training and test sets, and it takes into account the variability due to different training and test sets. For the simple problem of predicting the sample mean, and in the case of smooth loss functions, we show that the variance of the cross-validation estimator of the generalization error is a function of the moments of the random variables Y = Card(S_i ∩ S_j) and Y* = Card(S_i^c ∩ S_j^c), where S_i and S_j are two training sets and S_i^c and S_j^c are the corresponding test sets. We prove that the distribution of Y and Y* is hypergeometric, and we compare our estimator of variance with the estimator proposed by Nadeau and Bengio (2003). We then extend the results to multiple linear regression, kernel regression, and classification.

Remark: The talk is based on the paper: Markatou, M., Tian, H., Biswas, S. and Hripcsak, G. (2005). Analysis of Variance of Cross-Validation Estimators of the Generalization Error. Journal of Machine Learning Research, 6, 1127-1168.

NIST Contact: Charles Hagwood, (301) 975-2846.
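The hypergeometric claim in the abstract can be checked by brute force on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes that each training set is a uniformly random subset of fixed size n1 drawn from n observations, independently of the other, and the names `hypergeom_pmf` and `overlap_distribution` are hypothetical. By symmetry it fixes S_i, enumerates every possible S_j, and compares the exact distribution of Y = Card(S_i ∩ S_j) with the hypergeometric probability mass function.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def hypergeom_pmf(k, n, n1):
    """P(Y = k) for the overlap of two size-n1 subsets of n items.

    Assumption: S_j is drawn uniformly without replacement, so the n1
    members of S_i play the role of the 'marked' items.
    """
    return Fraction(comb(n1, k) * comb(n - n1, n1 - k), comb(n, n1))


def overlap_distribution(n, n1):
    """Exact distribution of Card(S_i ∩ S_j) by enumerating every S_j."""
    s_i = set(range(n1))  # by symmetry, any fixed training set will do
    counts = Counter(
        len(s_i & set(s_j)) for s_j in combinations(range(n), n1)
    )
    total = comb(n, n1)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


n, n1 = 8, 5  # small illustrative sizes, chosen so enumeration is cheap
exact = overlap_distribution(n, n1)
for k, p in exact.items():
    print(f"Y = {k}: enumerated {p}, hypergeometric {hypergeom_pmf(k, n, n1)}")
```

Under the same assumption, the test-set overlap needs no separate enumeration, because Card(S_i^c ∩ S_j^c) = n - 2*n1 + Card(S_i ∩ S_j), so Y* is a shifted copy of Y.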
Date created: 3/10/2006